Commit Graph

11816 Commits

Author SHA1 Message Date
Jonas Jenwald
37e8a8189b [TextLayer] Use an Array to build the total transform, rather than concatenating Strings, in expandTextDivs
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page.
2019-08-23 12:17:12 +02:00
Tim van der Meij
490deb1b65
Merge pull request #11086 from Snuffleupagus/textLayer-originalTransform
[TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled
2019-08-22 23:09:07 +02:00
Brendan Dahl
31f319301d
Merge pull request #11087 from brendandahl/disable-links
Add a way to disable external links.
2019-08-22 11:13:11 -07:00
Jonas Jenwald
a519ceffee [TextLayer] Use template strings when updating the font property in the _layoutText method 2019-08-22 14:47:44 +02:00
Jonas Jenwald
6afe3221b7 [TextLayer] Only cache the originalTransform when enhanceTextSelection is enabled
Given that this is completely unused in "regular" text-selection mode, there's no reason to unconditionally store one string for every `textDiv`.
2019-08-22 14:47:18 +02:00
Brendan Dahl
98e989116c Add a way to disable external links. 2019-08-21 11:20:41 -07:00
Tim van der Meij
52c6b3c138
Merge pull request #11079 from Snuffleupagus/textLayer-memory
[TextLayer] Only cache the current `textDiv` style when `enhanceTextSelection` is enabled and use template strings in `expandTextDivs``
2019-08-20 22:48:10 +02:00
Tim van der Meij
78f9ab53fc
Merge pull request #11081 from dhuang612/document-pdfjs-dist/webpack
added in information about pdfjs/webpack
2019-08-20 22:38:58 +02:00
dhuang612
d52d1e2d09 added in information about pdfjs/webpack
updated readme with corrections
2019-08-20 10:20:32 -04:00
Jonas Jenwald
431a264126 [TextLayer] Reduce the amount of intermediary strings in expandTextDivs
By using template strings, we can avoid some unnecessary string allocations (which is also helped by shortening a variable name).
2019-08-19 12:09:18 +02:00
Jonas Jenwald
45dfad8640 [TextLayer] Only cache the current textDiv style when enhanceTextSelection is enabled
This will help save a little bit of memory, by not storing one unused string for each `textDiv` in regular text-selection mode.
2019-08-19 11:02:56 +02:00
Tim van der Meij
852fc955bd
Merge pull request #11076 from Snuffleupagus/XRef-fetch-isRef/cache
Replace the `XRef.cache` Array with a Map instead
2019-08-18 14:44:31 +02:00
Jonas Jenwald
1cd9a28c81 Replace the XRef.cache Array with a Map instead
Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will *always* be more-or-less sparse.
Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty.

Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead.

This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch-series against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2736 |        2736 |   1 |  0.02 |
Firefox | Page Request |   200 |            2 |           2 |   0 | -8.26 |        faster
Firefox | Rendering    |   200 |         2733 |        2734 |   1 |  0.03 |
```
2019-08-18 12:07:18 +02:00
Jonas Jenwald
34a53b9f5d Inline the isRef checks in the various XRef.fetch related methods
The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there's been a lot of other cases where similiar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)
2019-08-18 11:57:48 +02:00
Tim van der Meij
1565d1849d
Merge pull request #11073 from brendandahl/code-point
Move polyfill for codePointAt to String prototype.
2019-08-17 13:26:35 +02:00
Brendan Dahl
c8129b8787 Move polyfill for codePointAt to String prototype.
This method belongs on the prototype not the String object.
2019-08-16 14:32:43 -07:00
Tim van der Meij
20181b65d4
Merge pull request #11070 from Snuffleupagus/Parser-getObj-rm-isString
Inline the `isString` check in the `Parser.getObj` method
2019-08-16 22:54:55 +02:00
Jonas Jenwald
40d3916f31 Reduce the number of temporary variables in the Parser.getObj method
This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.
2019-08-16 13:51:41 +02:00
Jonas Jenwald
7728a6630c Inline the isString check in the Parser.getObj method
For very large and complex PDF files this will help performance *slightly*, since `Parser.getObj` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2847 |        2830 | -17 | -0.60 |        faster
Firefox | Page Request |   200 |            2 |           2 |   0 | -7.14 |
Firefox | Rendering    |   200 |         2844 |        2827 | -17 | -0.60 |        faster
```
2019-08-16 10:34:24 +02:00
Tim van der Meij
02dcd20263
Merge pull request #11064 from Snuffleupagus/Util-class
Convert the `src/shared/util.js` file to ES6 syntax
2019-08-11 20:57:11 +02:00
Jonas Jenwald
7f456b3e2e Replace of all usages of var with let/const in the src/shared/util.js file
Also removes a couple of unnecessary (temporary) variable assigments in `arraysToBytes` and uses template strings in a few spots.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
f6c4a1f080 Convert Util to a class with static methods
Also replaces `var` with `const` in all the relevant code.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
7ee370a394 Remove the skipEmpty parameter from Util.intersect (PR 11059 follow-up)
Looking at this again, it struck me that added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code!
Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0,  0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter thus feels too tightly coupled to the `Page.view` getter.
2019-08-11 14:33:52 +02:00
Tim van der Meij
85acc9acac
Merge pull request #11062 from Snuffleupagus/misc-viewer-cleanup
Miscellaneous small clean-up of code in the `web/` folder
2019-08-11 13:48:35 +02:00
Jonas Jenwald
446ce88f81 Remove the non-PRODUCTION only 'disablebcmaps' hash parameter
This was added in PR 4470, but doesn't appear to have been used since.
While it's certainly easy to understand how this was helpful during development of that PR, actually providing this hash parameter isn't going to work anymore given that the original CMap files were also removed from the repository.

I suppose that the hash parameter *could* be useful if you'd attempt to update the BCMap files, however that hasn't been attempted even once in over *five* years time. Furthermore, at this point using the `AppOptions` directly in that situation should also work fine.

All in all, this seems like a piece of old and unused code which we can simply remove now.
2019-08-10 15:40:33 +02:00
Jonas Jenwald
04a3dc65e4 Move the sidebar toggleButton event listener into PDFSidebar
This is consistent with other functionality, such as e.g. `SecondaryToolbar` and `PDFFindBar`.
2019-08-10 15:38:33 +02:00
Tim van der Meij
fbe8c6127c
Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
2019-08-09 22:39:01 +02:00
Tim van der Meij
85b570adfb
Merge pull request #11057 from Snuffleupagus/issue-11052
Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
2019-08-09 22:02:34 +02:00
Jonas Jenwald
d637b25e36 Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".

The patch makes the following notable changes:
 - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
 - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
 - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
 - Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
 - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.

---

[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-09 10:18:13 +02:00
Jonas Jenwald
0f78fdb229 Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.
2019-08-08 10:37:41 +02:00
Tim van der Meij
aaef00ce5d
Merge pull request #11051 from Snuffleupagus/view-array-compare
Actually compare the `cropBox` and `mediaBox` correctly in the `Page.view` getter
2019-08-07 23:16:02 +02:00
Jonas Jenwald
e9b7996f2f Actually compare the cropBox and mediaBox correctly in the Page.view getter
The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the *same* underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.
2019-08-07 17:15:57 +02:00
Tim van der Meij
bb5e98195d
Merge pull request #11047 from Snuffleupagus/issue-11045
Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
2019-08-06 22:50:36 +02:00
Brendan Dahl
e1aed05c44
Merge pull request #11049 from brendandahl/telem-add-time
Add page rendered timestamp to telemetry.
2019-08-06 11:53:08 -07:00
Brendan Dahl
47077f8de9 Add page rendered timestamp to telemetry. 2019-08-06 09:46:33 -07:00
Jonas Jenwald
5ac9c7c384 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.
2019-08-06 14:33:05 +02:00
Tim van der Meij
a666f1ef00
Merge pull request #11048 from Snuffleupagus/linkService-compact-pagesRefCache
Use more compact keys in `PDFLinkService._pagesRefCache`
2019-08-05 22:44:00 +02:00
Jonas Jenwald
9acaaf5126 Use more compact keys in PDFLinkService._pagesRefCache
By using the same internal formatting here as in the `Ref.toString` method, in `src/core/primitives.js`, all cache-keys will become at least two bytes shorter (and most three bytes shorter).
Obviously this won't have a huge effect on memory since there's only one cache entry per page, but it nonetheless seems wasteful to use longer keys than strictly required.
2019-08-05 18:06:32 +02:00
Tim van der Meij
be70ee236d
Merge pull request #11013 from timvandermeij/annotations-quadpoints
[api-minor] Implement quadpoints for annotations in the core layer
2019-08-04 16:06:10 +02:00
Tim van der Meij
6c05a7653d
Merge pull request #11038 from Snuffleupagus/getStats-objects
[api-minor] Fix completely broken `getStats` method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
2019-08-02 22:08:37 +02:00
Jonas Jenwald
0276385e6e [api-minor] Fix completely broken getStats method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)

I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.

Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
2019-08-02 14:09:24 +02:00
Tim van der Meij
9c8fe3142a
Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException
Ensure that `ReadableStream`s are cancelled with actual Errors
2019-08-02 00:18:44 +02:00
Tim van der Meij
e0b38bed3c
Merge pull request #11029 from brendandahl/pdfjs-telemetry-update
[api-minor] Update telemetry to use 'categorical' histograms.
2019-08-02 00:11:02 +02:00
Tim van der Meij
2754b09888
Merge pull request #11033 from Snuffleupagus/viewer-close-updateLoadingIndicatorState
Ensure that the loading indicator, in the pageNumber input, is hidden when the viewer is closed
2019-08-01 23:36:55 +02:00
Brendan Dahl
31d71808e7 [api-minor] Update telemetry to use 'categorical' histograms.
Firefox telemetry supports using string labels now. Convert our integers
that we used for categories to just use strings.

The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
2019-08-01 09:51:02 -07:00
Jonas Jenwald
a3150166ec Ensure that ReadableStreams are cancelled with actual Errors
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
2019-08-01 16:40:46 +02:00
Jonas Jenwald
cb1394c13d Ensure that the loading indicator, in the pageNumber input, is hidden when the viewer is closed
Currently the indicator may remain visible even after the document has been closed, which seems weird given that no page is either visible nor rendering :-)
2019-08-01 16:30:33 +02:00
Tim van der Meij
d909b86b28
Merge pull request #11020 from Snuffleupagus/issue-11016
Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)
2019-07-31 23:33:34 +02:00
Tim van der Meij
c5d837d2fe
Merge pull request #11019 from wangsongyan/master
Decode URL encoded filenames from content disposition headers
2019-07-31 23:24:08 +02:00
wangsongyan
c61205d980 decode filename when match an urlencode filename from contentDispositionFilename 2019-07-31 09:33:56 +08:00