3845 Commits

Author SHA1 Message Date
Jonas Jenwald
a8fc306b6e Replace globalScope with the standard globalThis property instead
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility

Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues.

Finally, for older browsers a polyfill was added in `src/shared/compatibility.js`.
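
As a rough illustration of such a fallback, the following is a minimal sketch and not the polyfill that was actually added. The helper name `getGlobal` is hypothetical, and the order of the checks is an assumption.

```javascript
// Hypothetical helper: locate the global object without relying on `globalThis`.
function getGlobal() {
  if (typeof globalThis !== "undefined") {
    return globalThis; // Standard property, available in modern environments.
  }
  if (typeof self !== "undefined") {
    return self; // Browsers and web workers.
  }
  if (typeof window !== "undefined") {
    return window; // Older browsers.
  }
  if (typeof global !== "undefined") {
    return global; // Node.js.
  }
  throw new Error("Unable to locate the global object.");
}

const globalObject = getGlobal();
// Only define the property when the environment lacks it.
if (typeof globalObject.globalThis === "undefined") {
  globalObject.globalThis = globalObject;
}
```

After this runs, code elsewhere can refer to `globalThis` unconditionally.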
2019-12-08 20:19:02 +01:00
Jonas Jenwald
a02122e984 Ensure that PDFDocument.checkFirstPage waits for cleanup to complete (PR 10392 follow-up)
Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here.

(The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)
2019-12-07 12:31:41 +01:00
Jonas Jenwald
5c0336872e Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.
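
A minimal sketch of tolerating a truncated EOD marker, assuming the data is available as a string and that whitespace may sit between the two characters of the marker. The function name `findAscii85End` is hypothetical; this is not the PDF.js parser code.

```javascript
// Hypothetical helper: returns the number of characters up to and including
// the EOD marker, or -1 when the data contains no `~` at all.
function findAscii85End(data) {
  const tildePos = data.indexOf("~");
  if (tildePos === -1) {
    return -1;
  }
  let pos = tildePos + 1;
  // Assumption for this sketch: skip whitespace between `~` and `>`.
  while (pos < data.length && /\s/.test(data[pos])) {
    pos++;
  }
  if (data[pos] === ">") {
    return pos + 1; // Well-formed `~>` marker.
  }
  return tildePos + 1; // Truncated marker: stop right after the `~`.
}
```

With this approach a lone `~` still terminates the image data, rather than causing the scan to run on into the rest of the content stream.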
2019-12-05 15:53:18 +01:00
Jonas Jenwald
c3b1c8f857 Slightly simplify the XRef cache lookup in XRef.fetch
Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups.

Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called *a lot* in larger documents and this patch thus cannot hurt.
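
The idea can be sketched with a toy cache; all names below are hypothetical and this is not the `XRef` code. Since `undefined` is rejected at insertion time, a lookup needs only a single `get` call.

```javascript
// Toy stand-in for an object cache keyed by object number.
class ToyObjectCache {
  constructor(loadObject) {
    this._loadObject = loadObject; // Called only on a cache miss.
    this._cache = new Map();
  }

  _insert(num, obj) {
    // Guard once, at insertion time, so that lookups can rely on it.
    if (obj === undefined) {
      throw new Error('The cache must never hold "undefined" entries.');
    }
    this._cache.set(num, obj);
  }

  fetch(num) {
    // A single `get` is enough: `undefined` can only mean "not cached".
    const cached = this._cache.get(num);
    if (cached !== undefined) {
      return cached;
    }
    const obj = this._loadObject(num);
    this._insert(num, obj);
    return obj;
  }
}
```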
2019-11-30 22:41:53 +01:00
Jonas Jenwald
168c6aecae Stop caching Streams in XRef.fetchCompressed
I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1]

Note that in the `XRef.fetchUncompressed` method we're *not* caching Streams, and that for very good reasons too.

 - Streams, especially the `DecodeStream` ones, can become *very* large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so.

 - Attempting to read from the *same* Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position.

 - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Stream isn't going to work in the general case.

All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method.

---
[1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.
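
The point about reading the *same* Stream more than once can be illustrated with a toy stream. `ToyReadOnceStream` is hypothetical and much simpler than the real Stream classes.

```javascript
// Toy stream: reading always starts at the current data position.
class ToyReadOnceStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Returns everything from the current position onwards, and moves to the end.
  getBytes() {
    const chunk = this.bytes.subarray(this.pos);
    this.pos = this.bytes.length;
    return chunk;
  }

  reset() {
    this.pos = 0;
  }
}

const stream = new ToyReadOnceStream(new Uint8Array([1, 2, 3]));
const firstRead = stream.getBytes().length; // 3
const secondRead = stream.getBytes().length; // 0, nothing is left to read
stream.reset();
const thirdRead = stream.getBytes().length; // 3 again, but only after `reset`
```

A cached instance handed out to two callers would thus give the second caller empty data, unless both callers coordinate their `reset` calls.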
2019-11-30 10:21:08 +01:00
Jonas Jenwald
06412a557b Slightly re-factor XRef.fetchCompressed
 - Change all occurrences of `var` to `let`/`const`.

 - Initialize the (temporary) Arrays with the correct sizes upfront.

 - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.
2019-11-30 09:49:51 +01:00
Jonas Jenwald
725566cfea Remove the Number.isInteger checks from XRef.fetchUncompressed (PR 8857 follow-up)
Having run the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857.
Hence everything points to this being completely unnecessary now, and by removing this code there are thus fewer function calls being made in `XRef.fetchUncompressed`.
2019-11-28 23:25:39 +01:00
Jonas Jenwald
cc76132c24 Remove outdated, and misleading, JSDoc comment from the PDFDocument class
The contents of this comment haven't been correct for *years*, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.
2019-11-25 11:36:29 +01:00
Jonas Jenwald
a965662184 Enable the getter-return, no-dupe-else-if, and no-setter-return ESLint rules
All of these rules can help catch errors during development. Please note that only `getter-return` required a few changes, which were limited to disabling the rule in a couple of spots; please find additional details about these rules at:
 - https://eslint.org/docs/rules/getter-return
 - https://eslint.org/docs/rules/no-dupe-else-if
 - https://eslint.org/docs/rules/no-setter-return
2019-11-23 11:40:30 +01:00
Tim van der Meij
be02e67972
Merge pull request #11335 from Snuffleupagus/issue-11330
Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330)
2019-11-16 13:56:01 +01:00
Jonas Jenwald
9199b02a42 Subtract stream.start when getting the startXRef property for documents with a Linearization dictionary (issue 11330)
For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`).
Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.
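
The offset arithmetic can be shown with a made-up "raw file" string. The contents and variable names are assumptions for illustration only, and `streamStart` merely plays the role of `stream.start`.

```javascript
// A raw file where some junk bytes precede the actual PDF header.
const rawFile = "JUNKDATA\n%PDF-1.7\n1 0 obj\nendobj\nxref\n0 1\ntrailer\n";

// Offset of the PDF document inside the raw file.
const streamStart = rawFile.indexOf("%PDF-");
// The document as the parser sees it, i.e. starting at `%PDF-`.
const pdfDocument = rawFile.substring(streamStart);

// Position computed relative to the *raw file*.
const rawStartXRef = rawFile.indexOf("xref");
// Position relative to the PDF document, after subtracting the offset.
const startXRef = rawStartXRef - streamStart;
```

Here `pdfDocument.startsWith("xref", startXRef)` holds, whereas using `rawStartXRef` directly would point past the table.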
2019-11-16 09:29:10 +01:00
Jonas Jenwald
688d15526e Use getBytes, rather than looping over getByte, in FlateStream.prototype.readBlock
*Please note:* A similar change was attempted in PR 5005, but it was subsequently backed out (in PR 5069) since other parts of the patch caused issues.

With these changes, it's possible to replace repeated function calls within a loop with just a single function call and subsequent assignment instead.
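
A sketch of the two approaches using a toy stream; the class and function names are hypothetical and this is not the `FlateStream` code.

```javascript
// Toy stream; `getByte` returns -1 once the data is exhausted.
class ToyByteStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }

  getBytes(length) {
    const end = Math.min(this.pos + length, this.bytes.length);
    const chunk = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return chunk;
  }
}

// One function call per byte.
function copyWithLoop(stream, buffer, offset, length) {
  let copied = 0;
  while (copied < length) {
    const byte = stream.getByte();
    if (byte === -1) {
      break;
    }
    buffer[offset + copied++] = byte;
  }
  return copied;
}

// One function call for the whole block, followed by a single assignment.
function copyWithGetBytes(stream, buffer, offset, length) {
  const chunk = stream.getBytes(length);
  buffer.set(chunk, offset);
  return chunk.length;
}
```

Both functions leave the same bytes in `buffer`; the second one simply avoids the per-byte call overhead.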
2019-11-15 15:45:31 +01:00
Jonas Jenwald
878432784c [PDFHistory] Move the IE11 pushState/replaceState work-around to src/shared/compatibility.js (PR 10461 follow-up)
I've always disliked the solution in PR 10461, since it required changes to the `PDFHistory` code itself to deal with a bug in IE11.
Now that IE11 support is limited, it seems reasonable to remove these `pushState`/`replaceState` hacks from the main code-base and simply use polyfills instead.
2019-11-11 17:48:04 +01:00
Jonas Jenwald
74e00ed93c Change isNodeJS from a function to a constant
Given that this shouldn't change after the `pdf.js`/`pdf.worker.js` files have been loaded, it doesn't seem necessary to keep this as a function.
2019-11-10 16:44:29 +01:00
Jonas Jenwald
2817121bc1 Convert globalScope and isNodeJS to proper modules
Slightly unrelated to the rest of the patch, but this also removes an out-of-place `globals` definition from the `web/viewer.js` file.
2019-11-10 16:44:29 +01:00
Tim van der Meij
6763e16804
Merge pull request #11313 from Snuffleupagus/issue-11122
Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)
2019-11-10 13:31:51 +01:00
Jonas Jenwald
0233fc07b6
Revert "Convert Catalog.getPageDict to an async method" 2019-11-09 22:36:23 +01:00
Jonas Jenwald
536a52e981 Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)
For Popup annotation trigger elements consisting of an arbitrary polyline, you need to ensure that the 'stroke-width' is always non-zero since otherwise it's impossible to actually open/close the popup.

Unfortunately I don't believe that any of the test-suites can be used to test this, hence why no tests are included in the patch.
2019-11-09 13:35:59 +01:00
Jonas Jenwald
79d7c002de Inline a couple of isRef/isDict checks in the getPageDict method
As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).

With these changes we also avoid potentially two back-to-back `isDict` checks when evaluating possible Page nodes, and can also no longer accidentally pick a dictionary with an incorrect /Type.
2019-11-08 17:53:00 +01:00
Jonas Jenwald
0d89006bf1 Convert Catalog.getPageDict to an async method
This makes it possible to remove the internal `next` helper function, and also gets rid of the need to manually resolve/reject a `PromiseCapability`.
2019-11-08 17:45:28 +01:00
Jonas Jenwald
98f570c103 Prevent browser exceptions from incorrectly triggering the assert in PDFPageProxy._abortOperatorList (PR 11069 follow-up)
For certain canvas-related errors (and probably others), the browser rendering exceptions may be propagated "as-is" to the PDF.js code. In this case, the exceptions are of the somewhat cryptic `NS_ERROR_FAILURE` type.
Unfortunately these aren't actual `Error`s, which thus ends up unintentionally triggering the `assert` in `PDFPageProxy._abortOperatorList`; sorry about that!
2019-11-07 11:37:48 +01:00
Jonas Jenwald
80342e2fdc Support UTF-16 little-endian strings in the stringToPDFString helper function (bug 1593902)
The bug report seems to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we *actually* do as evidenced by both the code and a unit-test.
The issue at play here is rather that we previously only supported big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a *little-endian* UTF-16 BOM instead.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902
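
A minimal sketch of BOM-based decoding for both byte orders, assuming the input is a "binary string" where each character code holds one byte. The function name `decodeUtf16WithBom` is hypothetical and this is not the `stringToPDFString` implementation.

```javascript
// Returns the decoded string, or `null` when no UTF-16 BOM is present.
function decodeUtf16WithBom(str) {
  const b0 = str.charCodeAt(0);
  const b1 = str.charCodeAt(1);
  let littleEndian;
  if (b0 === 0xfe && b1 === 0xff) {
    littleEndian = false; // Big-endian BOM.
  } else if (b0 === 0xff && b1 === 0xfe) {
    littleEndian = true; // Little-endian BOM.
  } else {
    return null;
  }
  let result = "";
  // Combine each pair of bytes, after the BOM, into one UTF-16 code unit.
  for (let i = 2; i + 1 < str.length; i += 2) {
    const first = str.charCodeAt(i);
    const second = str.charCodeAt(i + 1);
    const codeUnit = littleEndian
      ? (second << 8) | first
      : (first << 8) | second;
    result += String.fromCharCode(codeUnit);
  }
  return result;
}
```

The only difference between the two byte orders is which byte of each pair is treated as the high byte.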
2019-11-05 12:43:17 +01:00
Jonas Jenwald
04497bcb3c Re-factor the ObjectLoader._walk method to be properly asynchronous
Rather than having to store a `PromiseCapability` on the `ObjectLoader` instances, we can simply convert `_walk` to be `async` and thus have the same functionality with native JavaScript instead.
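
The general pattern can be sketched as follows; the helper names are hypothetical and this is not the `ObjectLoader` code. The first variant resolves a hand-rolled capability object manually, while the second one gets the same behaviour from `async`/`await`.

```javascript
// Hand-rolled capability: a Promise plus its `resolve`/`reject` functions.
function createCapability() {
  const capability = {};
  capability.promise = new Promise((resolve, reject) => {
    capability.resolve = resolve;
    capability.reject = reject;
  });
  return capability;
}

// Manual variant: the capability has to be resolved/rejected by hand.
function loadManually(fetchNode) {
  const capability = createCapability();
  fetchNode().then(node => {
    capability.resolve(node.value);
  }, capability.reject);
  return capability.promise;
}

// Native variant: returning resolves, and throwing rejects, automatically.
async function loadAsync(fetchNode) {
  const node = await fetchNode();
  return node.value;
}

// Toy data source, used purely for illustration.
const fetchToyNode = () => Promise.resolve({ value: 42 });
```

Both `loadManually(fetchToyNode)` and `loadAsync(fetchToyNode)` return a Promise that resolves with the same value.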
2019-11-03 15:04:20 +01:00
Jonas Jenwald
fec1f02b2a Slightly re-factor setting of the link target in addLinkAttributes
I happened to look at this code and the way that the link target is set seems unnecessarily convoluted, since we're using `Object.values` and `Array.prototype.includes` for *every* link being parsed.
Given that the number of link targets is so small, the easiest solution honestly seems to be to just use a `switch` statement to do the link target mapping.
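
A sketch of such a mapping. The enumeration values and the function name `getTargetName` are assumptions made for this example, rather than a copy of the patch.

```javascript
// Assumed enumeration of the possible link targets.
const LinkTarget = { NONE: 0, SELF: 1, BLANK: 2, PARENT: 3, TOP: 4 };

// Maps an enumeration value to the string used for the `target` attribute.
function getTargetName(target) {
  switch (target) {
    case LinkTarget.NONE:
      return "";
    case LinkTarget.SELF:
      return "_self";
    case LinkTarget.BLANK:
      return "_blank";
    case LinkTarget.PARENT:
      return "_parent";
    case LinkTarget.TOP:
      return "_top";
  }
  return ""; // Unknown values fall back to "no explicit target".
}
```

Compared to a lookup based on `Object.values` and `Array.prototype.includes`, no temporary array is created for each link.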
2019-11-02 14:01:31 +01:00
Tim van der Meij
bbd2386bd9
Merge pull request #11296 from Snuffleupagus/parseColorSpace-stopAtErrors
Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)
2019-11-01 22:47:50 +01:00
Jonas Jenwald
829d6ba2dc Ensure that the peekByte methods, on the various Streams, handle end of data correctly (PR 5286 follow-up)
When the end of data has already been reached for the various Streams, the `getByte` methods will return `-1` to signal that to the caller. Note however that the current position obviously won't be incremented in this case, meaning that the `peekByte` methods will then *incorrectly* decrement the position.

Thankfully the corresponding `peekBytes` shouldn't be affected by this bug, since they decrement the current position with the *actually* returned number of bytes.

I'm not aware of any bugs caused by this blatant oversight, but that doesn't mean this shouldn't be fixed :-)
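
The bug, and one way of avoiding it, can be sketched with a toy stream. The class is hypothetical, and the two `peekByte` variants sit side by side only for comparison.

```javascript
class ToyPeekStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Returns -1 at the end of the data, *without* advancing the position.
  getByte() {
    if (this.pos >= this.bytes.length) {
      return -1;
    }
    return this.bytes[this.pos++];
  }

  // Buggy variant: always steps back, even when nothing was consumed.
  peekByteBuggy() {
    const byte = this.getByte();
    this.pos--;
    return byte;
  }

  // Fixed variant: only steps back when a byte was actually read.
  peekByte() {
    const byte = this.getByte();
    if (byte !== -1) {
      this.pos--;
    }
    return byte;
  }
}
```

Calling `peekByteBuggy` at the end of the data moves the position backwards, so a subsequent `getByte` would return the last byte a second time.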
2019-11-01 18:22:33 +01:00
Jonas Jenwald
835d8c2be5 Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)
This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.
2019-11-01 09:01:24 +01:00
Tim van der Meij
30ef05c161
Merge pull request #11290 from Snuffleupagus/MessageHandler-rm-in
[MessageHandler] Re-factor and convert the code to a proper `class`
2019-10-31 23:57:52 +01:00
Jonas Jenwald
eedd449cb4 Remove some unused require statements, used when loading fake workers, in non-PRODUCTION mode
The code in question is *only* relevant in non-`PRODUCTION` mode, i.e. the *development* version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added.
I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.
2019-10-31 12:08:07 +01:00
Jonas Jenwald
0293222b96 [MessageHandler] Convert the code to a proper class 2019-10-30 23:22:59 +01:00
Jonas Jenwald
5d5733c0a7 [MessageHandler] Convert all instances of var to const in the code 2019-10-30 23:22:59 +01:00
Jonas Jenwald
f61fb3e0f9 [MessageHandler] Re-factor the _onComObjOnMessage function to use early returns
When `ReadableStream` support was added to the `MessageHandler`, the `_onComObjOnMessage` function became more complex than previously.
All of the nested `if`/`else if`/`else` branches are now, at least in my opinion, making some of this code a bit difficult to follow. Hence this patch, which attempts to help readability by making use of early `return`s and `Error`s.

The patch also changes a couple of `var`/`let` occurrences to `const`.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
62f28e11a3 [MessageHandler] Remove unnecessary usage of in from the code
Note that using `in` leads to stringification of the properties, which seems completely unnecessary here. To avoid future problems from these changes the `MessageHandler.on` method will now assert, in non-`PRODUCTION`/`TESTING` builds, that it's always called with a function as expected.

This patch also renames `callbacksCapabilities` to `callbackCapabilities`, note the removed "s", since using a double plural format looks a bit strange.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
3e46e800a0 [MessageHandler] Replace the internal isReply property, as sent when Promise callbacks are used, with enumeration values
Given that the `isReply` property is an internal implementation detail, changing its type shouldn't be a problem. Note that by directly indicating if either data or an Error is sent, it's no longer necessary to use `in` when handling the callback.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
2d35a49dd8 Inline a couple of isRef/isDict checks in the ObjectLoader code
As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).
2019-10-29 23:20:10 +01:00
Jonas Jenwald
1133dbac33 Make the ObjectLoader use more efficient methods when determining if data needs to be loaded
Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded.

When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which *specific* chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question.
Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over *all* chunks.

All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessarily "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a *single* simple check, which seems like a perfect fit for the `ObjectLoader` use cases.
In particular, once the *entire* PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading a local file) which this patch should help with.

Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)
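
The difference between the two checks can be sketched with a toy class. Its internals are assumptions for illustration and do not mirror the real `ChunkedStream`.

```javascript
class ToyChunkedStream {
  constructor(numChunks) {
    this.numChunks = numChunks;
    this.loadedChunks = new Set();
  }

  markLoaded(chunk) {
    this.loadedChunks.add(chunk);
  }

  // Visits *every* chunk, and builds a list of the ones not yet loaded.
  getMissingChunks() {
    const missing = [];
    for (let chunk = 0; chunk < this.numChunks; chunk++) {
      if (!this.loadedChunks.has(chunk)) {
        missing.push(chunk);
      }
    }
    return missing;
  }

  // A single comparison, which is enough for a yes/no answer.
  allChunksLoaded() {
    return this.loadedChunks.size === this.numChunks;
  }
}
```

When only the boolean is needed, `allChunksLoaded()` gives the same answer as `getMissingChunks().length === 0` without the iteration or the temporary array.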
2019-10-29 23:20:09 +01:00
Jonas Jenwald
0496ea61f5 Ensure that PartialEvaluator.hasBlendModes handles Blend Modes in Arrays (PR 11281 follow-up)
I completely overlooked this in PR 11281, but you obviously need to make similar changes in `PartialEvaluator.hasBlendModes` since it will otherwise ignore valid Blend Modes.
2019-10-28 11:37:05 +01:00
Jonas Jenwald
5c266f0e8c Support Blend Modes which are specified in an Array of Names (issue 11279)
According to the specification, the first *supported* Blend Mode should be chosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607
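
A sketch of picking the first supported entry. The set of supported names is only an illustrative subset, and both the `pickBlendMode` name and the fallback to `Normal` are assumptions made for this example.

```javascript
// Illustrative subset of Blend Mode names treated as "supported" here.
const SUPPORTED_BLEND_MODES = new Set([
  "Normal",
  "Multiply",
  "Screen",
  "Overlay",
  "Darken",
  "Lighten",
]);

// Accepts either a single name or an Array of names.
function pickBlendMode(value) {
  const candidates = Array.isArray(value) ? value : [value];
  for (const name of candidates) {
    if (SUPPORTED_BLEND_MODES.has(name)) {
      return name; // The first supported entry wins.
    }
  }
  return "Normal"; // Assumed fallback when nothing is supported.
}
```

For example, `pickBlendMode(["Fancy", "Multiply", "Screen"])` skips the unknown first entry and selects `Multiply`.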
2019-10-26 14:24:31 +02:00
Tim van der Meij
4a5a4328f4
Merge pull request #11273 from Snuffleupagus/getViewport-offsets
[api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone`
2019-10-24 00:08:40 +02:00
Jonas Jenwald
681bc9d70e [api-minor] Support custom offsetX/offsetY values in PDFPageProxy.getViewport and PageViewport.clone
There's no good reason, as far as I can tell, to not also support `offsetX`/`offsetY` in addition to e.g. `dontFlip`.
2019-10-23 20:48:14 +02:00
Jonas Jenwald
6f7f8257bc Slightly re-factor the String handling in StatTimer
This uses template strings in a couple of spots, and a buffer in the `toString` method.
2019-10-23 14:45:18 +02:00
Jonas Jenwald
8e5d3836d6 Remove the enable argument from the StatTimer constructor
This argument is a left-over from older API code, where we unconditionally initialized `StatTimer` instances for every page. For quite some time that's only been done when `pdfBug` is set, hence it seems unnecessary to keep this functionality.
2019-10-23 14:45:18 +02:00
Jonas Jenwald
9fc40f8b84 Remove DummyStatTimer since it's unused now
Since this isn't part of the API surface, removing it now that it's unused shouldn't cause any problems.
2019-10-23 14:45:16 +02:00
Jonas Jenwald
860da8b840 Stop using the DummyStatTimer in the API, and check if this._stats exists when trying to report statistics
Even though the current situation only results in six unnecessary function calls per page, it nonetheless seems completely unnecessary to call dummy functions when `pdfBug` is *not* set (i.e. the default behaviour).
2019-10-23 13:23:41 +02:00
Jonas Jenwald
df0e1edab5 Re-factor sending of various Exceptions from the worker to the API
As can be seen in the API, there's a number of document loading Exception handlers which are both really simple and highly similar. Hence these are changed such that all the relevant Exceptions are sent via *one* message instead.

Furthermore, the patch also avoids unnecessarily re-creating `UnknownErrorException`s at the worker side and removes an unnecessary `bind` call.
2019-10-19 12:54:54 +02:00
Tim van der Meij
11f3851a97
Merge pull request #11243 from Snuffleupagus/issue-11242
Add a fallback for non-embedded *composite* Verdana fonts (issue 11242)
2019-10-18 23:56:46 +02:00
Tim van der Meij
c54bb222ca
Merge pull request #11231 from Snuffleupagus/indexObjects-entries-gen
Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)
2019-10-17 23:56:26 +02:00
Jonas Jenwald
2fcb5afc7b Add a fallback for non-embedded *composite* Verdana fonts (issue 11242)
Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.
2019-10-17 17:00:55 +02:00
Pedro Luiz Cabral Salomon Prado
4d0c759b7f Change variable assignment (#11247)
Remove unused variable assignment in `src/core/fonts.js`
2019-10-16 00:39:25 +02:00
Jonas Jenwald
ffc847eaa5 Allow over-writing entries, in XRef.indexObjects, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)
This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1]

Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break.
The only reason that I'm even submitting this patch, is because of the number of open issues that it would address.

Generally speaking though, the best course of action would probably be if `XRef.indexObjects` was re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them.

---
[1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.
2019-10-14 22:10:04 +02:00
Tim van der Meij
ec6a99d781
Bundle all API documentation in a module
This commit allows JSDoc to generate all API documentation in the
`pdfjsLib` module (namespace) so the documentation becomes easier to
navigate.
2019-10-13 21:23:00 +02:00
Tim van der Meij
9f4d45ddf4
Don't include private methods in the PDFPageProxy API documentation 2019-10-13 21:23:00 +02:00
Tim van der Meij
36c01c2c2a
Deduplicate the documentation for PDFDocumentLoadingTask and PDFWorker
Both classes live inside a closure with the same name, which confuses
JSDoc. Move the documentation to the inner class to deduplicate them.
2019-10-13 21:23:00 +02:00
Tim van der Meij
ca3a58f93a
Consistently use @returns for returned data types in JSDoc comments
Sometimes we also used `@return`, but `@returns` is what the JSDoc
documentation recommends. Even though `@return` works as an alias, it's
good to use the recommended syntax and to be consistent within the
project.
2019-10-13 13:58:17 +02:00
Tim van der Meij
8b4ae6f3eb
Consistently use @type for getter data types in JSDoc comments
Sometimes we also used `@return` or `@returns`, but `@type` is what
the JSDoc documentation recommends. This also improves the documentation
because before this commit the types were not shown and now they are.
2019-10-13 13:58:17 +02:00
Tim van der Meij
f4daafc077
Consistently use square brackets for optional parameters in JSDoc comments
Square brackets are recommended to indicate optional parameters. Using
them helps for automatically generating correct documentation.
2019-10-13 13:58:17 +02:00
Tim van der Meij
efd331daa1
Consistently use string for string data types in JSDoc comments
Sometimes we also used `String`, but `string` is what the JSDoc
documentation recommends.
2019-10-13 13:58:17 +02:00
Tim van der Meij
e75991b49e
Consistently use number for numeric data types in JSDoc comments
Sometimes we also used `Number` and `integer`, but `number` is what
the JSDoc documentation recommends.
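
Taken together, the JSDoc conventions from this series of commits (lower-case `string`/`number`, square brackets for optional parameters, `@type` for getters, and `@returns` for methods) look roughly like this. The class itself is a made-up example and not part of the API.

```javascript
class ToyPageLabel {
  /**
   * @param {string} prefix - Text placed before the page number.
   * @param {number} [start] - First page number; the default value is 1.
   */
  constructor(prefix, start = 1) {
    this._prefix = prefix;
    this._start = start;
  }

  /**
   * @type {number} The first page number.
   */
  get start() {
    return this._start;
  }

  /**
   * @param {number} pageIndex - Zero-based index of the page.
   * @returns {string} The label for the given page.
   */
  format(pageIndex) {
    return `${this._prefix}${this._start + pageIndex}`;
  }
}
```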
2019-10-13 13:58:13 +02:00
Jonas Jenwald
03387ebaa8 Update src/shared/compatibility.js to only run with SKIP_BABEL = false set
Rather than specifying certain build targets manually, it seems much more appropriate (and future-proof) to use the `SKIP_BABEL` build target instead.

Also, the patch adds a missing `/* eslint no-var: error */` line since I'm touching the file anyway and no code-changes were necessary for it.
2019-10-13 11:33:41 +02:00
Jonas Jenwald
bfcbf2d78d Cache processed 'ExtGState's in PartialEvaluator.hasBlendModes to avoid unnecessary parsing/lookups
This simply extends the already existing caching of processed resources to avoid duplicated parsing of 'ExtGState's, which should help with badly generated PDF documents.

This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf, with the following manifest file:
```
[
    {  "id": "issue6961",
       "file": "../web/pdfs/issue6961.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following *overall* results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   400 |         1063 |        1051 | -12 | -1.17 |        faster
Firefox | Page Request |   400 |          552 |         543 |  -9 | -1.69 |        faster
Firefox | Rendering    |   400 |          511 |         508 |  -3 | -0.61 |
```

and the following *page-specific* results:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
0    | Overall      |   200 |         1122 |        1110 | -12 | -1.03 |
0    | Page Request |   200 |          552 |         544 |  -8 | -1.48 |        faster
0    | Rendering    |   200 |          570 |         566 |  -4 | -0.62 |
1    | Overall      |   200 |         1005 |         992 | -13 | -1.33 |        faster
1    | Page Request |   200 |          552 |         542 | -11 | -1.91 |        faster
1    | Rendering    |   200 |          452 |         450 |  -3 | -0.61 |
```
2019-10-12 12:35:42 +02:00
Jonas Jenwald
af71f9b40a Inline all the possible type checks in PartialEvaluator.hasBlendModes to avoid unnecessary function calls
For badly generated PDF documents, with issue 6961 being one example, there's well over one hundred thousand function calls being made in total for just the *two* pages.
2019-10-12 11:24:37 +02:00
huzjakd
94171d9d72 Attempt to fallback to a default font, for non-available ones, in PartialEvaluator.loadFont
This handles the two different ways that fonts can be loaded, either by Name (which is the common case) or by Reference.
Furthermore, this also takes the `ignoreErrors` option into account when deciding whether to fall back or throw an Error.
Finally, by creating a minimal but valid Font dictionary, there's no special-cases necessary in any of the font parsing code.

Co-authored-by: huzjakd <huzjakd@gmail.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
2019-10-10 16:49:46 +02:00
Jonas Jenwald
ea729ec55c [api-minor] Replace all deprecated calls with throwing of actual Errors
All of these methods have been marked as `deprecated` in *three* releases now, and I'd thus like to (slowly) move towards complete removal.

However rather than just removing the methods right away, which would cause somewhat cryptic failures, this patch tries to implement a hopefully reasonable middle ground by throwing `Error`s with (essentially) the same information as the previous warnings.

While the previous `deprecated` messages could perhaps be seen as optional, with these changes API consumers will now be forced to actually migrate their code.
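
The transition can be sketched as follows; the classes, property names, and message texts are all hypothetical. The first class only warns, whereas the second one fails loudly while still telling the caller what to use instead.

```javascript
const warnings = [];

// Earlier behaviour (sketch): record a warning, then keep working.
class ToyApiBefore {
  get count() {
    return 3;
  }

  get oldCount() {
    warnings.push("Deprecated API usage: `oldCount`, use `count` instead.");
    return this.count;
  }
}

// Later behaviour (sketch): throw, with essentially the same information.
class ToyApiAfter {
  get count() {
    return 3;
  }

  get oldCount() {
    throw new Error("Removed API: `oldCount`, use `count` instead.");
  }
}
```

Accessing `oldCount` on `ToyApiAfter` produces an error that names the replacement, as opposed to a generic "undefined" failure after outright removal.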
2019-10-09 09:21:15 +02:00
Takashi Tamura
d5ee083050 * use square brackets for optional properties in the JSDoc comments of src/display/api.js 2019-10-08 20:34:17 +09:00
Tim van der Meij
cead77ef3a
Merge pull request #11186 from Snuffleupagus/issue-9655
Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
2019-10-06 19:50:43 +02:00
Jonas Jenwald
eabedab38e [MessageHandler] Add a non-PRODUCTION/TESTING check to ensure that wrapReason is called with a valid reason
There shouldn't be any situation where `reason` isn't either an `Error`, or a cloned "Error" sent via `postMessage`.
2019-10-06 14:15:13 +02:00
Jonas Jenwald
9201c8dad4 [MessageHandler] Convert the deleteStreamController helper function to a "private" method instead 2019-10-06 14:15:02 +02:00
Jonas Jenwald
f5be2d62a3 Improve the heuristics, in PartialEvaluator._buildSimpleFontToUnicode, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
*Please note:* I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code.
However, it occurred to me that we're essentially trying to add a heuristic *on top* of another heuristic, and that it shouldn't matter how efficient the code is as long as it works.

In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.
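
One possible way of detecting this situation, sketched under the assumption that a single name with hexadecimal-only digits is enough to reinterpret *all* names as base 16. The function name `decodeGlyphCodes` is hypothetical, and this is not the code from the patch.

```javascript
// Takes glyph names such as "C20" or "c4A", and returns their numeric codes.
function decodeGlyphCodes(glyphNames) {
  const digitStrings = glyphNames.map(name => name.substring(1));
  // Any name that fails as base 10, but is valid base 16, flips the radix.
  const needsHex = digitStrings.some(
    digits => !/^\d+$/.test(digits) && /^[0-9A-Fa-f]+$/.test(digits)
  );
  const radix = needsHex ? 16 : 10;
  return digitStrings.map(digits => parseInt(digits, radix));
}
```

Given `["C20", "C41"]` the names are read as base 10, whereas `["C20", "C4A"]` causes both names to be read as base 16.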
2019-10-06 10:47:29 +02:00
Jonas Jenwald
572abdcb4a Convert the various image decoder ...Errors to classes extending BaseException (PR 11185 follow-up)
Somehow I missed these in PR 11185, but there's no good reason not to convert them as well.
2019-10-01 13:10:14 +02:00
Tim van der Meij
8c4f4b5eec
Merge pull request #11182 from Snuffleupagus/disableWorker-disable-Dict-postMessage
Forbid sending of `Dict`s and `Stream`s, with `postMessage`, when workers are disabled
2019-09-29 15:09:42 +02:00
Jonas Jenwald
5d93fda4f2 Convert the various ...Exceptions to proper classes, to reduce code duplication
By utilizing a base "class", things become significantly simpler. Unfortunately the new `BaseException` cannot be a proper ES6 class and just extend `Error`, since the SystemJS dependency doesn't seem to play well with that.
Note also that we (generally) need to keep the `name` property on the actual `...Exception` object, rather than on its prototype, since the property will otherwise be dropped during the structured cloning used with `postMessage`.
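
A sketch of this approach with hypothetical names; the real `BaseException` may differ in its details. The base is written as a plain function whose prototype chain includes `Error`, and `name` is assigned on each instance so that it's an *own* property.

```javascript
// Function-style base "class", rather than `class ... extends Error`.
function ToyBaseException(message) {
  this.message = message;
  // Stored on the instance itself, not on the prototype.
  this.name = this.constructor.name;
}
ToyBaseException.prototype = new Error();
ToyBaseException.prototype.constructor = ToyBaseException;

// Each concrete exception only adds what is specific to it.
class ToyPasswordException extends ToyBaseException {
  constructor(message, code) {
    super(message);
    this.code = code;
  }
}

const exception = new ToyPasswordException("Incorrect password", 2);
// Copying only the own properties keeps `name`, `message`, and `code`.
const ownProperties = { ...exception };
```

Had `name` been defined on the prototype instead, it would be absent from `ownProperties`, much like a property that is lost when only own properties are copied across threads.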
2019-09-29 10:16:20 +02:00
Jonas Jenwald
3f8fee371b Forbid sending of Dicts and Streams, with postMessage, when workers are disabled
By default, i.e. with workers enabled, it's *purposely* not possible to send `Dict`s and `Stream`s from the worker-thread. This is achieved by defining a `function` on every `Dict` instance, since that ensures that [the structured clone algorithm](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm) will throw an Error on `postMessage`.

However, with workers *disabled* we fall back to the `LoopbackPort` implementation which just ignores any `function`s, thus incorrectly allowing sending of data which *should* be unclonable.
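
The two behaviours can be contrasted with a toy cloning routine. The function, its `rejectFunctions` flag, and the marker property name are all hypothetical; this is neither the `LoopbackPort` code nor the browser's structured clone algorithm.

```javascript
// Toy cloning routine; `rejectFunctions` switches between the two behaviours.
function toyClone(value, rejectFunctions) {
  if (typeof value === "function") {
    if (rejectFunctions) {
      throw new Error("Functions cannot be cloned.");
    }
    return undefined;
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  if (Array.isArray(value)) {
    return value.map(item => toyClone(item, rejectFunctions));
  }
  const result = {};
  for (const key of Object.keys(value)) {
    if (typeof value[key] === "function" && !rejectFunctions) {
      continue; // Lenient mode: silently drop the function.
    }
    result[key] = toyClone(value[key], rejectFunctions);
  }
  return result;
}

// An object carrying a function purely as an "unclonable" marker.
const toyDict = { map: { Type: "Page" }, unclonableMarker: function () {} };
```

In lenient mode `toyClone(toyDict, false)` quietly returns a copy without the marker, while `toyClone(toyDict, true)` throws, which is the behaviour wanted for data that must never be sent.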
2019-09-26 16:16:13 +02:00
Tim van der Meij
cd909c531f
Merge pull request #11169 from Snuffleupagus/Dict-inline-Ref-checks
Reduce the number of function calls in the `Dict` class
2019-09-24 23:33:37 +02:00
Tim van der Meij
f762d59ad2
Merge pull request #11173 from Snuffleupagus/ReadableStream-polyfill
Replace the bundled `ReadableStream` polyfill with the `web-streams-polyfill` npm package (issue 11157)
2019-09-24 23:22:17 +02:00
Jonas Jenwald
2cac68467f Reduce the number of function calls in the Dict class
The following changes were made:
 - Remove unnecessary `typeof` checks in the `get`/`getAsync` methods.
 - Reduce unnecessary code duplication in the `get`/`getAsync` methods.
 - Inline the `Ref` checks in the `get`/`getAsync`/`getArray` methods, since it helps avoid many unnecessary function calls. I.e. this way it's possible to directly call `XRef.{fetch, fetchAsync}` only when necessary, rather than always having to call `XRef.{fetchIfRef, fetchIfRefAsync}`.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```
This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2821 |        2790 | -32 | -1.12 |        faster
Firefox | Page Request |   250 |            2 |           2 |   0 |  6.68 |
Firefox | Rendering    |   250 |         2820 |        2788 | -32 | -1.13 |        faster
```
2019-09-24 08:31:39 +02:00
Jonas Jenwald
0ee373f9cc Replace the bundled ReadableStream polyfill with the web-streams-polyfill npm package (issue 11157)
Compared to the recently replaced `URL` polyfill, the new `ReadableStream` polyfill isn't being exported globally for two reasons:
 - We're currently checking for the existence of a global `ReadableStream` implementation when determining if the Fetch API will be used; please see `isFetchSupported` in the src/display/display_utils.js file.
 - Given that it's much newer functionality (compared to `URL`) and that not all browsers may implement all parts of the specification yet, not exposing the `ReadableStream` globally seems safer for now.
2019-09-23 22:16:59 +02:00
Jonas Jenwald
7f18c57c12 Fix the inconsistent return types for Dict.{get, getAsync}
Having these methods fall back to returning `null` in only *one* particular case seems outright wrong, since a "falsy" value will thus be handled incorrectly.
The only reason that this hasn't caused issues in practice is that there's only one call-site passing in three keys, and in that case we're trying to read a font file where falling back to `null` isn't a problem.
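To illustrate why "falsy" values matter here, the sketch below falls through to the next key only when a lookup is `undefined`, so that stored values such as `0` or `false` are returned unchanged. `getWithFallback` is a hypothetical helper operating on a plain object, not the actual `Dict.get` code.

```javascript
// Hypothetical helper: look up `key1`, then optionally `key2` and `key3`.
// Only `undefined` triggers the fallback, so falsy values are preserved.
function getWithFallback(map, key1, key2, key3) {
  let value = map[key1];
  if (value === undefined && key2 !== undefined) {
    value = map[key2];
    if (value === undefined && key3 !== undefined) {
      value = map[key3];
    }
  }
  return value; // consistently `undefined` when nothing was found
}

console.log(getWithFallback({ A: 0 }, "A", "B"));
console.log(getWithFallback({ C: false }, "A", "B", "C"));
console.log(getWithFallback({}, "A", "B", "C"));
```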
2019-09-23 11:41:19 +02:00
Tim van der Meij
1f5ebfbf0c
Replace our URL polyfill with the one from core-js
`core-js` polyfills have proven to be of good quality and using them
prevents us from having to maintain them ourselves.
2019-09-19 14:09:51 +02:00
Tim van der Meij
c71a291317
Upgrade core-js to version 3.2.1
This only required changing the import paths. The `es` folder contains
all polyfills we need now. If we want to import everything, we need to
explicitly require the `index` file.
2019-09-19 13:58:36 +02:00
Tim van der Meij
3da680cdfc
Merge pull request #11158 from janpe2/gradient-stops
Avoid floating point inaccuracy in gradient color stops
2019-09-19 13:15:11 +02:00
Tim van der Meij
58e5f36666
Merge pull request #11159 from Snuffleupagus/issue-11150
For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)
2019-09-19 13:14:27 +02:00
Jonas Jenwald
af22dc9b0c For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)
Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.
2019-09-18 10:15:09 +02:00
Jani Pehkonen
911df237f3 Avoid floating point inaccuracy in gradient color stops 2019-09-17 21:01:17 +03:00
Jonas Jenwald
4bd79ec4b3 Inline the resolveOrReject helper function at its call-sites in MessageHandler, and rename an error key to reason
Given that there's only a couple of call-sites, and that the helper function is really simple, it doesn't seem entirely necessary to keep it around. While fewer function calls is always a good thing, in this case the performance impact is small enough to be unmeasurable.

With *one* single exception the code in `MessageHandler` is using `reason` when passing around various Errors, hence this patch also renames an `error` key for consistency.
2019-09-17 14:22:24 +02:00
Jonas Jenwald
0617984b59 Remove unnecessary data.streamId accesses in MessageHandler._processStreamMessage, and use a constant object shape in MessageHandler.sendWithStream
The `streamId` short-hand in `MessageHandler._processStreamMessage` was only used partially throughout the method, which seemed kind of strange, hence that's fixed in this patch.
Furthermore, always giving the `streamController` object a constant shape in `MessageHandler.sendWithStream` cannot hurt either.
2019-09-17 14:18:57 +02:00
Jonas Jenwald
281ed33e43 Abort, with a small delay, getOperatorList on the worker-thread when rendering is cancelled (PR 11069 follow-up)
With this patch we're finally able to abort worker-thread parsing of the `OperatorList`, rather than *only* aborting the main-thread rendering itself, when the `RenderTask.cancel` method is being called.
This will help improve perceived performance in the default viewer, especially when reading longer and more complex documents, since pages that've been scrolled out-of-view (and thus evicted from the cache) will no longer compete for parsing resources on the worker-thread.

*Please note:* With the implementation in this patch we're *not* aborting worker-thread parsing immediately on `RenderTask.cancel`, since that would lead to *worse* performance in many cases. For example: When zoom/rotation occurs in the viewer, while parsing/rendering is still ongoing, a `cancel` call will usually be (almost) immediately followed by a new `PDFPageProxy.render` call. In that case you obviously don't want to abort parsing on the worker-thread, since that would risk throwing away a partially parsed `OperatorList` and thus force unnecessary re-parsing which will regress perceived performance (especially for more complex documents).

When choosing a reasonable delay, before cancelling `getOperatorList` on the worker-thread when `RenderTask.cancel` is called, two different positions need to be considered:
 1. The delay needs to be short enough, since a timeout in the multiple seconds range would essentially make this entire functionality meaningless (by always allowing most/all pages enough time to finish parsing).

 2. The delay cannot be *too* short, since that would actually *reduce* performance in the zoom/rotation case outlined above. Furthermore, the time between `RenderTask.cancel` and `PDFPageProxy.render` calls will obviously be affected by both general computer performance and current CPU load.

It's certainly possible that the timeout may require some further tweaks, however the value settled on in this patch was easily *one order* of magnitude larger than the delta between cancel/render in my tests.
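The delayed-abort idea outlined above can be sketched roughly as follows. `DelayedAbort`, its method names and the injectable `timers` object are assumptions made for this example; the actual patch and its chosen delay are not reproduced here.

```javascript
// Hypothetical helper; the names and any delay value are illustrative only.
class DelayedAbort {
  constructor(abortFn, delayMs, timers) {
    this.abortFn = abortFn;
    this.delayMs = delayMs;
    // Timer functions are injectable so the behaviour is easy to test.
    this.timers = timers || {
      set: (fn, ms) => setTimeout(fn, ms),
      clear: (id) => clearTimeout(id),
    };
    this.timer = null;
  }

  // Called when rendering is cancelled: schedule the abort, don't run it yet.
  onCancel() {
    if (this.timer !== null) {
      return;
    }
    this.timer = this.timers.set(() => {
      this.timer = null;
      this.abortFn();
    }, this.delayMs);
  }

  // Called when a new render starts: keep the partially parsed data.
  onRender() {
    if (this.timer !== null) {
      this.timers.clear(this.timer);
      this.timer = null;
    }
  }
}
```

If `onRender` arrives before the delay has elapsed, as in the zoom/rotation case, the pending abort is simply dropped; otherwise `abortFn` runs once the timer fires.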
2019-09-14 11:30:32 +02:00
Jonas Jenwald
00efff532c Ensure that addLinkAttributes is always called with a valid url parameter
There's no good reason for calling this helper function without a `url` parameter, and this way we can prevent that from happening.
Note how the `PDFOutlineViewer` call-site was already doing the right thing here, and only the `LinkAnnotationElement` call-site needed a small adjustment to make it work.
2019-09-11 13:24:04 +02:00
Jonas Jenwald
12e1c91f73 Don't enqueue unused properties when sending 'GetOperatorList' data from the worker-thread (PR 11069 follow-up)
With the changes made in PR 11069, it's no longer necessary to include the `pageIndex`/`intent` parameters when sending 'GetOperatorList' data. In the previous implementation these properties were used to associate the `OperatorList` with the correct `RenderTask`, however now that `ReadableStream`s are used that's handled automatically and it's thus dead code at this point.
2019-09-09 17:41:26 +02:00
Tim van der Meij
37d5b80ba8
Merge pull request #11118 from Snuffleupagus/FetchBuiltInCMap-sendWithStream
Transfer, rather than copy, CMap data to the worker-thread
2019-09-06 22:56:14 +02:00
Jonas Jenwald
7dea3f9389 [api-minor] Remove the postMessageTransfers parameter, and thus the ability to manually disable transferring of data, from the API
By transferring, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having *one* copy of the same data.
Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory.
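The difference between copying and transferring can be observed without a worker, assuming an environment that provides the global `structuredClone` function with its `transfer` option; `postMessage` transfer lists are based on the same mechanism. The variable names below are illustrative.

```javascript
const original = new Uint8Array([1, 2, 3, 4]).buffer;

// Copying: both buffers hold the data, so memory use doubles.
const copied = structuredClone(original);
const lengthAfterCopy = original.byteLength; // still 4

// Transferring: ownership moves and the source buffer becomes detached.
const moved = structuredClone(original, { transfer: [original] });
const lengthAfterTransfer = original.byteLength; // 0, the buffer is detached

console.log(copied.byteLength, lengthAfterCopy);
console.log(moved.byteLength, lengthAfterTransfer);
```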

Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality.
Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side.

---
[1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance.

[2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility
2019-09-05 13:09:54 +02:00
Jonas Jenwald
f0534b9b51 Adjust the values sent, with the 'test' message, by the WorkerMessageHandler.setup method
Note how the sent values have inconsistent types, with a boolean in one case and an object in the other (normal) case.
Furthermore, explicitly sending a `supportTypedArray: true` property seems superfluous at least to me.
2019-09-05 11:27:27 +02:00
Jonas Jenwald
7212ff4eea Stop checking for the response property, on XMLHttpRequest, when setting up the WorkerMessageHandler
This check was added in PR 2445, however it's no longer necessary since all data[1] is now loaded on the main-thread (and then transferred to the worker-thread).
Furthermore, by default the Fetch API is now (usually) used rather than `XMLHttpRequest`.

All in all, while these checks *were* necessary at one point that's no longer the case and they can thus be removed.

---
[1] This includes both the actual PDF data, as well as the CMap data.
2019-09-05 11:27:22 +02:00
Jonas Jenwald
f11a4ba750 Transfer, rather than copy, CMap data to the worker-thread
It recently occurred to me that the CMap data should be an excellent candidate for transferring.
This will help reduce peak memory usage for PDF documents using CMaps, since transferring of data avoids duplicating it on both the main- and worker-threads.

Unfortunately it's not possible to actually transfer data when *returning* data through `sendWithPromise`, and another solution had to be used.
Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it *really* easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)

Please note that the patch *purposely* only changes the API to Worker communication, and not the API *itself* since changing the interface of `CMapReaderFactory` would be a breaking change.
Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.
2019-09-04 11:46:04 +02:00
Jonas Jenwald
74f5a59f43 Ensure that the cancel/error methods on Streams are always called with valid reason arguments 2019-09-02 23:31:07 +02:00
Jonas Jenwald
02bdacef42 Ensure that Errors are handled correctly when using postMessage with Streams in MessageHandler
Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).
2019-09-02 23:31:07 +02:00
Tim van der Meij
e59b11860d
Merge pull request #11108 from timvandermeij/es6-annotations
Use more ES6 syntax in the annotation code
2019-09-02 23:13:24 +02:00
Tim van der Meij
2866c8a39e
Use more ES6 syntax in src/core/annotation.js
`let` is converted to `const` where possible.
2019-09-02 22:37:27 +02:00
Tim van der Meij
c37a2c0408
Merge pull request #11112 from Snuffleupagus/TESTING-rm-version-warn
Remove the API/Worker version warning message in `TESTING` mode
2019-09-02 22:22:33 +02:00
Jonas Jenwald
229f6f34d1 Remove the API/Worker version warning message in TESTING mode
The warning messages turn out to be more annoying than helpful when looking at the `console` during tests, so let's just remove them.
2019-09-01 16:47:26 +02:00
Jonas Jenwald
cd82b81bc7 Inline the resolveCall helper function at its call-sites in MessageHandler
There's only three call-sites and one of them doesn't even need the complete functionality of `resolveCall`, hence it seems reasonable to just inline this code.
An additional benefit of this is that the `Function.prototype.apply()` instance can also be converted into "normal" function calls, which should be a tiny bit more efficient.

The patch also replaces a number of unnecessary arrow functions, in relevant parts of the `MessageHandler` code, with "normal" functions instead.
Finally, all `Promise.resolve().then(...)` calls are replaced with `new Promise(...)` instead since the latter is a tiny bit more efficient. This also explains the test failures on the Linux bot, with a prior version of the patch, since the `Promise.resolve().then(...)` format essentially creates two Promises thus causing additional delay.
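The timing difference mentioned above can be seen in a small ordering experiment: a `new Promise(...)` executor runs synchronously, whereas a `Promise.resolve().then(...)` callback is deferred to a later microtask. `runOrderingDemo` is a hypothetical name used only for this sketch.

```javascript
// Records the order in which the different pieces of code run.
function runOrderingDemo() {
  const order = [];

  // The executor is invoked synchronously, inside the constructor call.
  new Promise((resolve) => {
    order.push("executor");
    resolve();
  });

  // The callback is queued and only runs after the current code finishes.
  Promise.resolve().then(() => {
    order.push("then callback");
  });

  order.push("end of synchronous code");
  return order;
}

const order = runOrderingDemo();
console.log(order.slice()); // snapshot taken before the deferred callback has run
```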
2019-09-01 13:40:19 +02:00
Jonas Jenwald
055f03938b Remove support for the scope parameter in the MessageHandler.on method
At this point in time it's easy to convert the `MessageHandler.on` call-sites to use arrow functions, and thus let the JavaScript engine handle scopes for us, rather than having to manually keep references to the relevant scopes in `MessageHandler`.[1]
An additional benefit of this is that a couple of `Function.prototype.call()` instances can now be converted into "normal" function calls, which should be a tiny bit more efficient.

All in all, I don't see any compelling reason why it'd be necessary to keep supporting custom `scope`s in the `MessageHandler` implementation.

---
[1] In the event that a custom scope is ever needed, simply using `bind` on the handler function when calling `MessageHandler.on` ought to work as well.
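A minimal sketch of both approaches, using a hypothetical `TinyHandler` registry rather than the real `MessageHandler`: an arrow function captures `this` lexically, and `bind` covers the case where a named method is registered instead.

```javascript
// Hypothetical, much simplified registry without any `scope` parameter.
class TinyHandler {
  constructor() {
    this.actions = Object.create(null);
  }
  on(name, fn) {
    this.actions[name] = fn;
  }
  dispatch(name, data) {
    return this.actions[name](data); // a "normal" call, no `Function.prototype.call`
  }
}

class Consumer {
  constructor(handler) {
    this.prefix = "got:";
    // Arrow function: `this` refers to the Consumer instance.
    handler.on("arrow", (data) => this.prefix + data);
    // Named method: `bind` supplies the scope explicitly.
    handler.on("bound", this.handleBound.bind(this));
  }
  handleBound(data) {
    return this.prefix + data.toUpperCase();
  }
}

const handler = new TinyHandler();
new Consumer(handler);
console.log(handler.dispatch("arrow", "x"));
console.log(handler.dispatch("bound", "x"));
```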
2019-09-01 09:24:15 +02:00
Tim van der Meij
49018482dc
Use more ES6 syntax in src/display/annotation_layer.js
`let` is converted to `const` where possible, `var` usage is disabled
and template strings are used where possible.
2019-08-31 16:40:39 +02:00
Jonas Jenwald
f71ea2de0e Remove the makeReasonSerializable helper function, and use wrapReason instead, in src/shared/message_handler.js
Since `wrapReason` and `makeReasonSerializable` are essentially functionally equivalent it doesn't seem necessary to keep both of them around, especially when `makeReasonSerializable` only has a *single* call-site.
2019-08-30 19:36:10 +02:00
Jonas Jenwald
4e6a9b54c7 Change the internal stream property, as sent when Streams are used, from a String to a Number
Given that the `stream` property is an internal implementation detail, changing its type shouldn't be a problem. By using Numbers instead, we can avoid unnecessary String allocations when creating/processing Streams.
2019-08-30 13:27:18 +02:00
Jonas Jenwald
252a3e35fb Reduce the amount of unnecessary function calls and object allocations, in MessageHandler, when using Streams
With PR 11069 we're now using Streams for OperatorList parsing (in addition to just TextContent parsing), which brings the nice benefit of being able to easily abort parsing on the worker-thread thus saving resources.

However, since we're now creating many more `ReadableStream`s there appears to be a tiny bit more overhead because of it (giving ~1% slower runtime of `browsertest` on the bots). In this case we're just going to have to accept such a small regression, since the benefits of using Streams clearly outweigh it.

What we *can* do here, is to try and make the Streams part of the `MessageHandler` implementation slightly more efficient by e.g. removing unnecessary function calls (which has been helpful in other parts of the code-base). To that end, this patch makes the following changes:

 - Actually support `transfers` in `MessageHandler.sendWithStream`, since the parameter was being ignored.

 - Inline the `sendStreamRequest`/`sendStreamResponse` helper functions at their respective call-sites. Obviously this causes some amount of code duplication, however I still think this change seems reasonable since for each call-site:
   - It avoids making one unnecessary function call.
   - It avoids allocating one temporary object.
   - It avoids sending, and thus structured cloning of, various undefined object properties.

 - Inline objects in the `MessageHandler.{send, sendWithPromise}` methods.

 - Finally, directly call `comObj.postMessage` in various methods when `transfers` are *not* present, rather than calling `MessageHandler.postMessage`, to further reduce the amount of function calls.
2019-08-30 12:32:20 +02:00
Jonas Jenwald
ae0d9e8c2a Replace some instances of implicit function.bind(this) usage, in src/display/api.js, with arrow functions instead 2019-08-30 11:35:05 +02:00
Jonas Jenwald
667e548e5f [TextLayer] Remove setAttribute usage in appendText (issue 8066)
One of the motivations for using `setAttribute` in the first place was to support more efficient DOM updates in the `expandTextDivs` method, since performance of the `enhanceTextSelection` mode can be somewhat bad when there's a lot of `textDivs` on the page.

With recent `TextLayer` changes/optimizations it's no longer necessary to store a complete `style`-string for every `textDiv`, and we can thus re-visit the `setAttribute` usage.
Note that with the current code, in `appendText`, there's only *one* string per `textDiv` which avoids a bunch of temporary strings. While the changes in this patch mean that there are now *three* strings per `textDiv` instead, the total length of these strings is now quite a bit shorter (42 characters to be exact).
2019-08-28 16:52:09 +02:00
Jonas Jenwald
106b239c5d [TextLayer] Avoid unnecessary font updates in _layoutText (PR 11097 follow-up)
*This should obviously have been done in PR 11097, but for some reason I completely overlooked it; sorry about that.*

There's no good reason to update the font unless you're actually going to measure the width of the textContent. This can reduce unnecessary font switching a fair bit, even for documents which are somewhat simple/short (in e.g. the `tracemonkey.pdf` file this cuts the amount of font switches almost in half).
2019-08-28 16:08:06 +02:00
Jonas Jenwald
a1398048e5 [TextLayer] Simplify building of the *expanded* transform in expandTextDivs
Rather than essentially re-computing the `originalTransform` every time, we can simply use it directly instead.
2019-08-25 13:09:04 +02:00
Jonas Jenwald
b68f7bb404 [TextLayer] Only measure the width of the text, in _layoutText, for multi-char text divs
For performance reasons single-char text divs aren't being scaled, as outlined in a comment in `appendText`. Hence it doesn't seem necessary, or even a good idea, to unconditionally measure the width of the text in `_layoutText`.
2019-08-25 12:32:49 +02:00
Jonas Jenwald
711040ecc5 Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in src/core/worker.js
These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).
2019-08-24 15:56:41 +02:00
Yury Delendik
66e0dd1b06 Use streams for OperatorList chunking (issue 10023)
*Please note:* The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`.

By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources.
With this patch worker-thread parsing will *only* be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this:

 - The API currently expects the *entire* OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen.
 - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously *not* want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance.
 - This patch is already *somewhat* risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well).

Time permitting, once this has landed and been in Nightly for a while, I'll try to work on the remaining points outlined above.

Co-Authored-By: Yury Delendik <ydelendik@mozilla.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
2019-08-24 15:56:40 +02:00
Jonas Jenwald
29a2516e4c [TextLayer] Use an Array to build the total padding, rather than concatenating Strings, in expandTextDivs
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact.
Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`.

Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.
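The approach can be sketched as below, assuming a hypothetical `buildPadding` helper and a `padding` object with optional `top`/`right`/`bottom`/`left` numbers; the real `expandTextDivs` code is structured differently.

```javascript
// One buffer, shared by all text divs on the page.
const paddingBuf = [];

// Hypothetical helper: builds a CSS `padding` value in the required
// `top right bottom left` order, re-using `buf` between calls.
function buildPadding(buf, padding) {
  buf.length = 0; // reset instead of allocating a new Array
  for (const side of ["top", "right", "bottom", "left"]) {
    const value = padding[side] || 0;
    // Missing or non-positive values are written as a compact "0".
    buf.push(value > 0 ? `${value}px` : "0");
  }
  return buf.join(" ");
}

console.log(buildPadding(paddingBuf, { top: 1, bottom: 2.5 }));
console.log(buildPadding(paddingBuf, { left: 3 }));
```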
2019-08-24 01:13:59 +02:00
Tim van der Meij
edbebb8bf7
Merge pull request #11090 from Snuffleupagus/textLayer-expandTextDivs-transform
[TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs`
2019-08-23 23:12:42 +02:00
Jonas Jenwald
932fcacff8 [TextLayer] Only handle positive padding values in expandTextDivs
Given that browsers will reject padding values smaller than zero (which may be caused by limited numerical precision during calculations in the `expand` code), it makes no sense to include those when expanding the `textDiv`s.
2019-08-23 13:16:20 +02:00
Jonas Jenwald
37e8a8189b [TextLayer] Use an Array to build the total transform, rather than concatenating Strings, in expandTextDivs
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page.
2019-08-23 12:17:12 +02:00
Tim van der Meij
490deb1b65
Merge pull request #11086 from Snuffleupagus/textLayer-originalTransform
[TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled
2019-08-22 23:09:07 +02:00
Brendan Dahl
31f319301d
Merge pull request #11087 from brendandahl/disable-links
Add a way to disable external links.
2019-08-22 11:13:11 -07:00
Jonas Jenwald
a519ceffee [TextLayer] Use template strings when updating the font property in the _layoutText method 2019-08-22 14:47:44 +02:00
Jonas Jenwald
6afe3221b7 [TextLayer] Only cache the originalTransform when enhanceTextSelection is enabled
Given that this is completely unused in "regular" text-selection mode, there's no reason to unconditionally store one string for every `textDiv`.
2019-08-22 14:47:18 +02:00
Brendan Dahl
98e989116c Add a way to disable external links. 2019-08-21 11:20:41 -07:00
Jonas Jenwald
431a264126 [TextLayer] Reduce the amount of intermediary strings in expandTextDivs
By using template strings, we can avoid some unnecessary string allocations (which is also helped by shortening a variable name).
2019-08-19 12:09:18 +02:00
Jonas Jenwald
45dfad8640 [TextLayer] Only cache the current textDiv style when enhanceTextSelection is enabled
This will help save a little bit of memory, by not storing one unused string for each `textDiv` in regular text-selection mode.
2019-08-19 11:02:56 +02:00
Jonas Jenwald
1cd9a28c81 Replace the XRef.cache Array with a Map instead
Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will *always* be more-or-less sparse.
Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty.

Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead.
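The sparseness argument can be illustrated with a toy example; the object number and cached value below are made up.

```javascript
// Array-based cache: a single high index forces a very large `length`.
const arrayCache = [];
arrayCache[300000] = "parsed object";

// Map-based cache: only the entries actually stored are tracked.
const mapCache = new Map();
mapCache.set(300000, "parsed object");

console.log(arrayCache.length); // 300001, almost entirely empty slots
console.log(mapCache.size); // 1
console.log(mapCache.get(300000) === arrayCache[300000]); // true
```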

This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch-series against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2736 |        2736 |   1 |  0.02 |
Firefox | Page Request |   200 |            2 |           2 |   0 | -8.26 |        faster
Firefox | Rendering    |   200 |         2733 |        2734 |   1 |  0.03 |
```
2019-08-18 12:07:18 +02:00
Jonas Jenwald
34a53b9f5d Inline the isRef checks in the various XRef.fetch related methods
The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there have been a lot of other cases where similar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)
2019-08-18 11:57:48 +02:00
Tim van der Meij
1565d1849d
Merge pull request #11073 from brendandahl/code-point
Move polyfill for codePointAt to String prototype.
2019-08-17 13:26:35 +02:00
Brendan Dahl
c8129b8787 Move polyfill for codePointAt to String prototype.
This method belongs on the prototype not the String object.
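For reference, the distinction looks like this in an environment with native support: `codePointAt` is called on string *instances*, and the `String` constructor itself has no such method (unlike the static `String.fromCodePoint`).

```javascript
// Instance method, found via `String.prototype`.
const viaPrototype = "\u{1F600}".codePointAt(0);

// There is no static `String.codePointAt`.
const onConstructor = typeof String.codePointAt;

console.log(viaPrototype.toString(16)); // "1f600"
console.log(onConstructor); // "undefined"
console.log(typeof String.prototype.codePointAt); // "function"
```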
2019-08-16 14:32:43 -07:00
Jonas Jenwald
40d3916f31 Reduce the number of temporary variables in the Parser.getObj method
This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.
2019-08-16 13:51:41 +02:00
Jonas Jenwald
7728a6630c Inline the isString check in the Parser.getObj method
For very large and complex PDF files this will help performance *slightly*, since `Parser.getObj` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2847 |        2830 | -17 | -0.60 |        faster
Firefox | Page Request |   200 |            2 |           2 |   0 | -7.14 |
Firefox | Rendering    |   200 |         2844 |        2827 | -17 | -0.60 |        faster
```
2019-08-16 10:34:24 +02:00
Jonas Jenwald
7f456b3e2e Replace all usages of var with let/const in the src/shared/util.js file
Also removes a couple of unnecessary (temporary) variable assignments in `arraysToBytes` and uses template strings in a few spots.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
f6c4a1f080 Convert Util to a class with static methods
Also replaces `var` with `const` in all the relevant code.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
7ee370a394 Remove the skipEmpty parameter from Util.intersect (PR 11059 follow-up)
Looking at this again, it struck me that added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code!
Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0,  0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter thus feels too tightly coupled to the `Page.view` getter.
2019-08-11 14:33:52 +02:00
Tim van der Meij
fbe8c6127c
Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
2019-08-09 22:39:01 +02:00
Jonas Jenwald
d637b25e36 Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sorts of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".

The patch makes the following notable changes:
 - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes is added to the API.)
 - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
 - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
 - Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
 - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seems strange.
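The last point, about consistent return types, can be sketched with a simplified rectangle intersection that returns either an Array or `null`. This `intersect` function is an illustrative re-implementation for `[x1, y1, x2, y2]` rectangles, not the code from `Util.intersect`.

```javascript
// Simplified sketch: returns the intersection as [x1, y1, x2, y2],
// or `null` (never a Boolean) when the rectangles don't overlap.
function intersect(rect1, rect2) {
  const xLow = Math.max(Math.min(rect1[0], rect1[2]), Math.min(rect2[0], rect2[2]));
  const xHigh = Math.min(Math.max(rect1[0], rect1[2]), Math.max(rect2[0], rect2[2]));
  if (xLow > xHigh) {
    return null;
  }
  const yLow = Math.max(Math.min(rect1[1], rect1[3]), Math.min(rect2[1], rect2[3]));
  const yHigh = Math.min(Math.max(rect1[1], rect1[3]), Math.max(rect2[1], rect2[3]));
  if (yLow > yHigh) {
    return null;
  }
  return [xLow, yLow, xHigh, yHigh];
}

console.log(intersect([0, 0, 10, 10], [5, 5, 20, 20])); // [5, 5, 10, 10]
console.log(intersect([0, 0, 1, 1], [2, 2, 3, 3])); // null
```

Callers can then test the result with a single `=== null` check, instead of distinguishing between an Array and `false`.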

---

[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-09 10:18:13 +02:00
Jonas Jenwald
0f78fdb229 Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.
2019-08-08 10:37:41 +02:00
Jonas Jenwald
e9b7996f2f Actually compare the cropBox and mediaBox correctly in the Page.view getter
The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the *same* underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.
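The pitfall is easy to reproduce: `===` on Arrays compares identity, not contents. The element-wise helper below is a hypothetical sketch, and the box values are made up.

```javascript
// Hypothetical helper: true when both Arrays hold the same values in order.
function isArrayEqual(arr1, arr2) {
  if (arr1.length !== arr2.length) {
    return false;
  }
  return arr1.every((value, i) => value === arr2[i]);
}

const mediaBox = [0, 0, 612, 792];
const cropBox = [0, 0, 612, 792]; // same values, but a separate Array

console.log(mediaBox === cropBox); // false, different objects
console.log(isArrayEqual(mediaBox, cropBox)); // true
```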
2019-08-07 17:15:57 +02:00
Jonas Jenwald
5ac9c7c384 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.
2019-08-06 14:33:05 +02:00
Tim van der Meij
be70ee236d
Merge pull request #11013 from timvandermeij/annotations-quadpoints
[api-minor] Implement quadpoints for annotations in the core layer
2019-08-04 16:06:10 +02:00
Jonas Jenwald
0276385e6e [api-minor] Fix completely broken getStats method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)
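A small sketch of the underlying JavaScript behaviour: assigning to a String key on an Array sets an ordinary property rather than adding an element, so the Array still looks empty. The `"FLATE"` label is used purely as an illustrative key.

```javascript
// Array indexed with a String: no element is added.
const statsAsArray = [];
statsAsArray["FLATE"] = true;

// Plain object: String keys are ordinary entries.
const statsAsObject = Object.create(null);
statsAsObject["FLATE"] = true;

console.log(statsAsArray.length); // 0
console.log(JSON.stringify(statsAsArray)); // "[]"
console.log(Object.keys(statsAsObject)); // ["FLATE"]
```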

I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.

Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
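
The Array/String problem mentioned above can be illustrated roughly as follows. This is only a sketch; the `"FLATE"` key is a made-up label and not necessarily one of the real `StreamType` values:

```javascript
// Using a String as an "index" on an Array only adds a plain property;
// it does not add an element, so the Array still looks empty.
const asArray = [];
asArray["FLATE"] = true; // hypothetical label

// An Object (here without a prototype) is the natural container for String keys.
const asObject = Object.create(null);
asObject["FLATE"] = true;

// Array iteration helpers only visit real elements, of which there are none.
let visited = 0;
asArray.forEach(() => visited++);

const arrayLength = asArray.length;
const objectKeys = Object.keys(asObject);

console.log(arrayLength, visited, objectKeys);
```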
2019-08-02 14:09:24 +02:00
Tim van der Meij
9c8fe3142a
Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException
Ensure that `ReadableStream`s are cancelled with actual Errors
2019-08-02 00:18:44 +02:00
Tim van der Meij
e0b38bed3c
Merge pull request #11029 from brendandahl/pdfjs-telemetry-update
[api-minor] Update telemetry to use 'categorical' histograms.
2019-08-02 00:11:02 +02:00
Brendan Dahl
31d71808e7 [api-minor] Update telemetry to use 'categorical' histograms.
Firefox telemetry supports using string labels now. Convert our integers
that we used for categories to just use strings.

The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
2019-08-01 09:51:02 -07:00
Jonas Jenwald
a3150166ec Ensure that ReadableStreams are cancelled with actual Errors
There are a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
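
To illustrate why the cancel `reason` matters, here is a small sketch. The `AbortException` class below is a local stand-in defined only for this example, and `describeReason` is a hypothetical helper, not part of the code base:

```javascript
// Local stand-in for an Error subclass used as a cancel reason.
class AbortException extends Error {
  constructor(msg) {
    super(msg);
    this.name = "AbortException";
  }
}

// Hypothetical helper: what a consumer inspecting the cancel reason would see.
function describeReason(reason) {
  if (reason instanceof Error) {
    return `${reason.name}: ${reason.message}`;
  }
  return `non-Error (${typeof reason})`;
}

// The three cases mentioned above: no reason, a plain String, an actual Error.
const implicitReason = describeReason(undefined);
const stringReason = describeReason("abort");
const errorReason = describeReason(new AbortException("Cancelled."));

console.log(implicitReason, stringReason, errorReason);
```

Only the last variant gives the consumer something with a `name`, a `message` and a stack trace to work with.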
2019-08-01 16:40:46 +02:00
Tim van der Meij
d909b86b28
Merge pull request #11020 from Snuffleupagus/issue-11016
Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)
2019-07-31 23:33:34 +02:00
wangsongyan
c61205d980 Decode the filename when the contentDispositionFilename matches a URL-encoded filename 2019-07-31 09:33:56 +08:00
Jonas Jenwald
9ad50521b1 Add a work-around, in glyphlist.js, for bad PDF generators which use a non-standard /f_f string in the Encoding dictionary when referring to the ff ligature (issue 11016)
This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare.

Furthermore, this patch purposely does *not* add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there are real-world examples of PDF files where that's actually necessary.
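
Conceptually, the work-around amounts to one extra alias in a name-to-code-point table, roughly as in the sketch below. The tiny `glyphs` table and the `glyphToUnicode` function are assumptions for illustration only; the real glyph list is far longer and structured differently:

```javascript
// Tiny excerpt-like table mapping glyph names to Unicode code points.
const glyphs = {
  ff: 0xfb00, // LATIN SMALL LIGATURE FF
  fi: 0xfb01, // LATIN SMALL LIGATURE FI
  fl: 0xfb02, // LATIN SMALL LIGATURE FL
};

// The work-around: accept the non-standard name as an alias for the ff ligature.
glyphs["f_f"] = 0xfb00;

// Hypothetical lookup helper; returns undefined for unknown glyph names.
function glyphToUnicode(name) {
  const code = glyphs[name];
  return code === undefined ? undefined : String.fromCharCode(code);
}

console.log(glyphToUnicode("ff") === glyphToUnicode("f_f"));
```

Note that no `f_i` or `f_l` aliases are added, in line with the reasoning above.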
2019-07-30 17:06:58 +02:00
Jonas Jenwald
38ccb43436 Reduce the number of function calls in EvaluatorPreprocessor.read
For very large and complex PDF files this will help performance slightly, since `EvaluatorPreprocessor.read` is called a lot during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         3402 |        3358 | -43 | -1.28 |        faster
Firefox | Page Request |   200 |            1 |           1 |   0 | 26.71 |
Firefox | Rendering    |   200 |         3401 |        3357 | -44 | -1.28 |        faster
```
2019-07-29 08:43:36 +02:00
Tim van der Meij
9114004d5b
[api-minor] Implement quadpoints for annotations in the core layer 2019-07-28 20:36:21 +02:00
Jonas Jenwald
ff90aa4323 Inline the isCmd check in the Parser.shift method
For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called *a lot* during parsing.
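
The general idea of inlining such a check can be sketched as follows. The `Cmd` class, the `isCmd` helper and the `"ID"` operator used here are simplified assumptions for the example and may not match the real code exactly:

```javascript
// Simplified stand-in for a parsed PDF command/operator.
class Cmd {
  constructor(cmd) {
    this.cmd = cmd;
  }
}

// Generic helper: one extra function call for every check.
function isCmd(value, cmd) {
  return value instanceof Cmd && (cmd === undefined || value.cmd === cmd);
}

// Before: the hot path goes through the helper.
function checkViaHelper(token) {
  return isCmd(token, "ID");
}

// After: the same test written out at the call-site, avoiding the call.
function checkInlined(token) {
  return token instanceof Cmd && token.cmd === "ID";
}

const samples = [new Cmd("ID"), new Cmd("BI"), "ID", 42, null];
const sameResults = samples.every(
  (token) => checkViaHelper(token) === checkInlined(token)
);
console.log(sameResults);
```

Both variants give the same answers; the inlined one simply trades a little readability for fewer calls in a very hot code-path.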

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over *four million* `Parser.shift` calls for just the one page), using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 100,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   100 |         3386 |        3322 | -65 | -1.92 |        faster
Firefox | Page Request |   100 |            1 |           1 |   0 | -8.08 |
Firefox | Rendering    |   100 |         3385 |        3321 | -65 | -1.92 |        faster
```
2019-07-22 12:07:36 +02:00
Jonas Jenwald
b5254f2745 Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites
The number of `ChunkedStream.ensureByte` calls in particular is often absolutely *huge* (on the order of millions of calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.

The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.
Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).
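
A rough sketch of the approach, using a made-up `SketchStream` class rather than the real `ChunkedStream` (the method names `getByteBefore`/`getByteAfter` and the call counter exist only for this illustration):

```javascript
// Hypothetical stand-in for a streaming-aware byte stream.
class SketchStream {
  constructor(bytes, progressiveDataLength) {
    this.bytes = bytes;
    this.pos = 0;
    // Number of leading bytes already received through streaming.
    this.progressiveDataLength = progressiveDataLength;
    this.ensureByteCalls = 0;
  }

  ensureByte(pos) {
    this.ensureByteCalls++;
    if (pos < this.progressiveDataLength) {
      return; // Already streamed in, nothing more to check.
    }
    // A real implementation would verify that the relevant chunk is loaded.
  }

  // Before: `ensureByte` is called unconditionally for every byte read.
  getByteBefore() {
    const pos = this.pos;
    if (pos >= this.bytes.length) {
      return -1;
    }
    this.ensureByte(pos);
    return this.bytes[this.pos++];
  }

  // After: the inverted check is inlined, so the call is usually skipped.
  getByteAfter() {
    const pos = this.pos;
    if (pos >= this.bytes.length) {
      return -1;
    }
    if (pos >= this.progressiveDataLength) {
      this.ensureByte(pos);
    }
    return this.bytes[this.pos++];
  }
}

// Reads an 8-byte stream, of which the first 6 bytes were streamed in.
function countEnsureByteCalls(methodName) {
  const stream = new SketchStream(new Uint8Array(8), 6);
  while (stream[methodName]() !== -1) {
    // Keep reading until the end of the data.
  }
  return stream.ensureByteCalls;
}

const callsBefore = countEnsureByteCalls("getByteBefore");
const callsAfter = countEnsureByteCalls("getByteAfter");
console.log(callsBefore, callsAfter);
```

In this toy setup only the reads beyond the streamed data still reach `ensureByte`; when all data has been streamed in, the call is never made at all.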

This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea.
Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`.

This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):
```
[
    {  "id": "tracemonkey-eq",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 250,
       "type": "eq"
    }
]
```

I get the following complete results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
```

Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
```

Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).
There's obviously the question of whether this is *slightly* regressing simpler pages, but given just how short the times are in most cases it's not inconceivable that the page results above are simply caused by e.g. limited `Date.now()` precision and/or limited numerical precision.
2019-07-18 17:30:22 +02:00