Commit Graph

11743 Commits

Author SHA1 Message Date
Tim van der Meij
85acc9acac
Merge pull request #11062 from Snuffleupagus/misc-viewer-cleanup
Miscellaneous small clean-up of code in the `web/` folder
2019-08-11 13:48:35 +02:00
Jonas Jenwald
446ce88f81 Remove the non-PRODUCTION only 'disablebcmaps' hash parameter
This was added in PR 4470, but doesn't appear to have been used since.
While it's certainly easy to understand how this was helpful during development of that PR, actually providing this hash parameter isn't going to work anymore given that the original CMap files were also removed from the repository.

I suppose that the hash parameter *could* be useful if you'd attempt to update the BCMap files, however that hasn't been attempted even once in over *five* years time. Furthermore, at this point using the `AppOptions` directly in that situation should also work fine.

All in all, this seems like a piece of old and unused code which we can simply remove now.
2019-08-10 15:40:33 +02:00
Jonas Jenwald
04a3dc65e4 Move the sidebar toggleButton event listener into PDFSidebar
This is consistent with other functionality, such as e.g. `SecondaryToolbar` and `PDFFindBar`.
2019-08-10 15:38:33 +02:00
Tim van der Meij
fbe8c6127c
Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
2019-08-09 22:39:01 +02:00
Tim van der Meij
85b570adfb
Merge pull request #11057 from Snuffleupagus/issue-11052
Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
2019-08-09 22:02:34 +02:00
Jonas Jenwald
d637b25e36 Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".

The patch makes the following notable changes:
 - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
 - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
 - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
 - Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
 - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.

---

[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-09 10:18:13 +02:00
Jonas Jenwald
0f78fdb229 Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.
2019-08-08 10:37:41 +02:00
Tim van der Meij
aaef00ce5d
Merge pull request #11051 from Snuffleupagus/view-array-compare
Actually compare the `cropBox` and `mediaBox` correctly in the `Page.view` getter
2019-08-07 23:16:02 +02:00
Jonas Jenwald
e9b7996f2f Actually compare the cropBox and mediaBox correctly in the Page.view getter
The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the *same* underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.
2019-08-07 17:15:57 +02:00
Tim van der Meij
bb5e98195d
Merge pull request #11047 from Snuffleupagus/issue-11045
Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
2019-08-06 22:50:36 +02:00
Brendan Dahl
e1aed05c44
Merge pull request #11049 from brendandahl/telem-add-time
Add page rendered timestamp to telemetry.
2019-08-06 11:53:08 -07:00
Brendan Dahl
47077f8de9 Add page rendered timestamp to telemetry. 2019-08-06 09:46:33 -07:00
Jonas Jenwald
5ac9c7c384 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.
2019-08-06 14:33:05 +02:00
Tim van der Meij
a666f1ef00
Merge pull request #11048 from Snuffleupagus/linkService-compact-pagesRefCache
Use more compact keys in `PDFLinkService._pagesRefCache`
2019-08-05 22:44:00 +02:00
Jonas Jenwald
9acaaf5126 Use more compact keys in PDFLinkService._pagesRefCache
By using the same internal formatting here as in the `Ref.toString` method, in `src/core/primitives.js`, all cache-keys will become at least two bytes shorter (and most three bytes shorter).
Obviously this won't have a huge effect on memory since there's only one cache entry per page, but it nonetheless seems wasteful to use longer keys than strictly required.
2019-08-05 18:06:32 +02:00
Tim van der Meij
be70ee236d
Merge pull request #11013 from timvandermeij/annotations-quadpoints
[api-minor] Implement quadpoints for annotations in the core layer
2019-08-04 16:06:10 +02:00
Tim van der Meij
6c05a7653d
Merge pull request #11038 from Snuffleupagus/getStats-objects
[api-minor] Fix completely broken `getStats` method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
2019-08-02 22:08:37 +02:00
Jonas Jenwald
0276385e6e [api-minor] Fix completely broken getStats method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)

I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.

Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
2019-08-02 14:09:24 +02:00
Tim van der Meij
9c8fe3142a
Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException
Ensure that `ReadableStream`s are cancelled with actual Errors
2019-08-02 00:18:44 +02:00
Tim van der Meij
e0b38bed3c
Merge pull request #11029 from brendandahl/pdfjs-telemetry-update
[api-minor] Update telemetry to use 'categorical' histograms.
2019-08-02 00:11:02 +02:00
Tim van der Meij
2754b09888
Merge pull request #11033 from Snuffleupagus/viewer-close-updateLoadingIndicatorState
Ensure that the loading indicator, in the pageNumber input, is hidden when the viewer is closed
2019-08-01 23:36:55 +02:00
Brendan Dahl
31d71808e7 [api-minor] Update telemetry to use 'categorical' histograms.
Firefox telemetry supports using string labels now. Convert our integers
that we used for categories to just use strings.

The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
2019-08-01 09:51:02 -07:00
Jonas Jenwald
a3150166ec Ensure that ReadableStreams are cancelled with actual Errors
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
2019-08-01 16:40:46 +02:00
Jonas Jenwald
cb1394c13d Ensure that the loading indicator, in the pageNumber input, is hidden when the viewer is closed
Currently the indicator may remain visible even after the document has been closed, which seems weird given that no page is either visible nor rendering :-)
2019-08-01 16:30:33 +02:00
Tim van der Meij
d909b86b28
Merge pull request #11020 from Snuffleupagus/issue-11016
Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)
2019-07-31 23:33:34 +02:00
Tim van der Meij
c5d837d2fe
Merge pull request #11019 from wangsongyan/master
Decode URL encoded filenames from content disposition headers
2019-07-31 23:24:08 +02:00
wangsongyan
c61205d980 decode filename when match an urlencode filename from contentDispositionFilename 2019-07-31 09:33:56 +08:00
Jonas Jenwald
9ad50521b1 Add a work-around, in glyphlist.js, for bad PDF generators which use a non-standard /f_f string in the Encoding dictionary when referring to the ff ligature (issue 11016)
This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare.

Furthermore, this patch purposely does *not* add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.
2019-07-30 17:06:58 +02:00
Tim van der Meij
323b2eabcf
Merge pull request #11012 from Snuffleupagus/EvaluatorPreprocessor-read-fewer-function-calls
Reduce the number of function calls in `EvaluatorPreprocessor.read`
2019-07-29 22:32:02 +02:00
Jonas Jenwald
38ccb43436 Reduce the number of function calls in EvaluatorPreprocessor.read
For very large and complex PDF files this will help performance slightly, since `EvaluatorPreprocessor.read` is called a lot during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         3402 |        3358 | -43 | -1.28 |        faster
Firefox | Page Request |   200 |            1 |           1 |   0 | 26.71 |
Firefox | Rendering    |   200 |         3401 |        3357 | -44 | -1.28 |        faster
```
2019-07-29 08:43:36 +02:00
Tim van der Meij
9114004d5b
[api-minor] Implement quadpoints for annotations in the core layer 2019-07-28 20:36:21 +02:00
Tim van der Meij
9b72089886
Merge pull request #11003 from Snuffleupagus/webViewerWheel-supportedKeys
Ensure that setting the `zoomDisabledTimeout` isn't skipped, regardless of the supported zoom keys, when handling mouse wheel events (PR 7097 follow-up)
2019-07-23 22:28:36 +02:00
Tim van der Meij
e066db47dc
Merge pull request #10996 from Snuffleupagus/app-conditional-findbar
Avoid creating a `PDFFindBar` instance, in the Firefox built-in viewer, when not actually necessary
2019-07-23 22:22:31 +02:00
Jonas Jenwald
1eed5b7235 Ensure that setting the zoomDisabledTimeout isn't skipped, regardless of the supported zoom keys, when handling mouse wheel events (PR 7097 follow-up)
*Possible follow-up:* It probably wouldn't hurt to try and shorten the `supportedMouseWheelZoomModifierKeys` name a bit, but I'm not attempting that here since it'd also require updating `PdfStreamConverter.jsm` in mozilla-central in order to be consistent.
2019-07-23 17:42:12 +02:00
Jonas Jenwald
46b61ff12e Avoid creating a PDFFindBar instance, in the Firefox built-in viewer, when not actually necessary
This is similar to how `PDFPresentationMode` isn't used when the Fullscreen API isn't supported.
2019-07-23 07:51:14 +02:00
Tim van der Meij
1f287ec486
Merge pull request #11001 from Snuffleupagus/Parser-shift-rm-isCmd
Inline the `isCmd` check in the `Parser.shift` method
2019-07-22 22:32:53 +02:00
Jonas Jenwald
ff90aa4323 Inline the isCmd check in the Parser.shift method
For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called *a lot* during parsing.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over *four million* `Parser.shift` calls for just the one page), using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 100,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   100 |         3386 |        3322 | -65 | -1.92 |        faster
Firefox | Page Request |   100 |            1 |           1 |   0 | -8.08 |
Firefox | Rendering    |   100 |         3385 |        3321 | -65 | -1.92 |        faster
```
2019-07-22 12:07:36 +02:00
Tim van der Meij
71d9f5f860
Merge pull request #10995 from Snuffleupagus/app-rm-one-setFileSize
Remove an unnecessary `PDFDocumentProperties.setFileSize` call, relevant for the Firefox built-in viewer, and use the "normal" code-path in `PDFViewerApplication.open` instead
2019-07-21 12:42:26 +02:00
Jonas Jenwald
53a854bb0a Remove an unnecessary PDFDocumentProperties.setFileSize call, relevant for the Firefox built-in viewer, and use the "normal" code-path in PDFViewerApplication.open instead
Since calling `getDocument` with a `PDFDataRangeTransport` argument will always unconditionally override a manually provided `length` argument, see a1a667809f/src/display/api.js (L390-L394), this patch should thus be safe.
2019-07-21 11:38:17 +02:00
Tim van der Meij
a1a667809f
Merge pull request #10993 from Snuffleupagus/AppOption-docBaseUrl
Add the `docBaseUrl` API parameter to `AppOptions` in the viewer
2019-07-20 13:57:52 +02:00
Jonas Jenwald
ba2c042a75 Add the docBaseUrl API parameter to AppOptions in the viewer
This unfortunately required a bit of special handling, to correctly deal with the various extension builds.
2019-07-20 13:39:34 +02:00
Tim van der Meij
0cc0789af3
Merge pull request #10986 from Snuffleupagus/inline-ensureByte-ensureRange
Attempt to significantly reduce the number of `ChunkedStream.{ensureByte, ensureRange}` calls by inlining the `this.progressiveDataLength` checks at the call-sites
2019-07-19 22:51:21 +02:00
Tim van der Meij
acef5bfd16
Merge pull request #10979 from Snuffleupagus/firefox-zoomreset
[Firefox] Re-factor the 'zoomreset' message handling in the viewer (PR 10652 follow-up)
2019-07-19 22:42:07 +02:00
Tim van der Meij
b964df53da
Merge pull request #10990 from Snuffleupagus/onBeforeDraw-onAfterDraw
Refactor the `onBeforeDraw`/`onAfterDraw` functionality used in `BaseViewer` and `PDFPageView`
2019-07-19 22:35:41 +02:00
Tim van der Meij
98c4c646cb
Merge pull request #10987 from mozilla/dependabot/npm_and_yarn/js-yaml-3.13.1
Bump js-yaml from 3.12.0 to 3.13.1
2019-07-19 22:25:13 +02:00
Jonas Jenwald
366eebeb0f Refactor the onBeforeDraw/onAfterDraw functionality used in BaseViewer and PDFPageView
This functionality is very old, and pre-dates e.g. the introduction of the `EventBus` by a number of years. Rather than attaching two callback functions to every single `PDFPageView` instance, it's thus now possible to utilize the `EventBus` such that you only need a grand total of two listeners to achieve the same result.

For the `onAfterDraw` callback the replacement is particularly simple, given that a 'pagerendered' event is already being dispatched in the appropriate spot. An added benefit here is the ability to remove the event listener, since we only ever care about *one* (arbitrary) page being rendered for the `BaseViewer.onePageRendered` promise.

For the `onBeforeDraw` callback, a new 'pagerender' event was thus added to replace the callback.
2019-07-19 12:57:14 +02:00
dependabot[bot]
808f7db586
Bump js-yaml from 3.12.0 to 3.13.1
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 3.12.0 to 3.13.1.
- [Release notes](https://github.com/nodeca/js-yaml/releases)
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/3.12.0...3.13.1)

Signed-off-by: dependabot[bot] <support@github.com>
2019-07-19 00:04:03 +00:00
Jonas Jenwald
b5254f2745 Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites
The number of in particular `ChunkedStream.ensureByte` calls is often absolutely *huge* (on the order of million calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.

The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.
Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).

This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea.
Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`.

This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):
```
[
    {  "id": "tracemonkey-eq",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 250,
       "type": "eq"
    }
]
```

I get the following complete results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
```

Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
```

Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).
There's obviously the question if this is *slightly* regressing simpler pages, but given just how short the times are in most cases it's not inconceivable that the page results above are simply caused be e.g. limited `Date.now()` and/or limited numerical precision.
2019-07-18 17:30:22 +02:00
Jonas Jenwald
d1af8bd196 Slightly more simplified dispatching of the 'findbarclose' events in firefoxcom.js 2019-07-18 14:28:49 +02:00
Jonas Jenwald
8e5aa484fb [Firefox] Re-factor the 'zoomreset' message handling in the viewer (PR 10652 follow-up)
Given that this special-case only matters for the Firefox PDF viewer, it's probably better to just move it into `firefoxcom.js` instead to reduce unnecessary confusion.
2019-07-18 14:27:43 +02:00