Commit Graph

4114 Commits

Author SHA1 Message Date
Jonas Jenwald
94f084958a Update the year in the license_header files 2020-01-05 12:14:03 +01:00
Jonas Jenwald
36881e3770 Ensure that all import and require statements, in the entire code-base, have a .js file extension
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
2020-01-04 13:01:43 +01:00
Jonas Jenwald
f8ab8c4d3a Move the SegoeUISymbol font to the getNonStdFontMap (PR 8698 follow-up)
For reasons that I now cannot even begin to understand, the non-standard SegoeUISymbol font was placed in the `getStdFontMap`. That honestly makes no sense, hence this patch which does what I *should* have done from the start.
2019-12-28 11:02:49 +01:00
Jonas Jenwald
a63f7ad486 Fix the linting errors, from the Prettier auto-formatting, that ESLint --fix couldn't handle
This patch makes the follow changes:
 - Remove no longer necessary inline `// eslint-disable-...` comments.
 - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
 - Concatenate strings which now fit on just one line.
 - Fix comments that are now too long.
 - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.
2019-12-26 12:35:12 +01:00
Jonas Jenwald
de36b2aaba Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).

Prettier is being used for a couple of reasons:

 - To be consistent with `mozilla-central`, where Prettier is already in use across the tree.

 - To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.

Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.

*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.

(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-26 12:34:24 +01:00
Jonas Jenwald
8ec1dfde49 Add // prettier-ignore comments to prevent re-formatting of certain data structures
There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments.

*Please note:* It may be a good idea to look through these cases individually, and possibly re-write some of the them (especially the `String` ones) to reduce the need for all of these ignore commands.
2019-12-26 00:14:03 +01:00
Jonas Jenwald
70e3345cb4 Support OpenAction dictionaries without Type entries when parsing Print actions (issue 11442)
The PDF generator didn't bother including the `Type` entry in the OpenAction dictionary, hence we skipped parsing the `Print` action.
2019-12-24 10:41:33 +01:00
Wojciech Maj
d40d33682b
Extract & use createHeaders helper in src/display/fetch_stream.js 2019-12-23 08:08:17 +01:00
Jonas Jenwald
d370037618 [api-minor] Tweak the Node.js fake worker loader to prevent Critical dependency: ... warnings from Webpack
Since bundlers, such as Webpack, cannot be told to leave `require` statements alone we are thus forced to jump through hoops in order to prevent these warnings in third-party deployments of the PDF.js library; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).

*Please note:* This is based on the assumption that code running in Node.js won't ever be affected by e.g. Content Security Policies that prevent use of `eval`. If that ever occurs, we should revert to a normal `require` statement and simply document the Webpack warnings instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
8519f87efb Re-factor the setupFakeWorkerGlobal function (in src/display/api.js), and the loadFakeWorker function (in web/app.js)
This patch reduces some duplication, by moving *all* fake worker loader code into the `setupFakeWorkerGlobal` function. Furthermore, the functions are simplified further by using `async`/`await` where appropriate.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
a5485e1ef7 [api-minor] Support loading the fake worker from GlobalWorkerOptions.workerSrc in Node.js
There's no particularily good reason, as far as I can tell, to not support a custom worker path in Node.js environments (even if workers aren't supported). This patch thus make the Node.js fake worker loader code-path consistent with the fallback code-path used with *browser* fake worker loader.

Finally, this patch also deprecates[1] the `fallbackWorkerSrc` functionality, except in Node.js, since the user should *always* provide correct worker options since the fallback is nothing more than a best-effort solution.

---
[1] Although it probably shouldn't be removed until the next major version.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
591e754831 Move the fake worker loader code into the PDFWorkerClosure
Given that this code isn't needed "globally" in the file, it seems reasonable to move it to where it's actually used instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
aab0f91740 [api-minor] Simplify the *fallback* fake worker loader code in src/display/api.js
For performance reasons, and to avoid hanging the browser UI, the PDF.js library should *always* be used with web workers enabled.
At this point in time all of the supported browsers should have proper worker support, and Node.js is thus the only environment where workers aren't supported. Hence it no longer seems relevant/necessary to provide, by default, fake worker loaders for various JS builders/bundlers/frameworks in the PDF.js code itself.[1]

In order to simplify things, the fake worker loader code is thus simplified to now *only* support Node.js usage respectively "normal" browser usage out-of-the-box.[2]

*Please note:* The officially intended way of using the PDF.js library is with workers enabled, which can be done by setting `GlobalWorkerOptions.workerSrc`, `GlobalWorkerOptions.workerPort`, or manually providing a `PDFWorker` instance when calling `getDocument`.

---
[1] Note that it's still possible to *manually* disable workers, simply my manually loading the built `pdf.worker.js` file into the (current) global scope, however this's mostly intended for testing/debugging purposes.

[2] Unfortunately some bundlers such as Webpack, when used with third-party deployments of the PDF.js library, will start to print `Critical dependency: ...` warnings when run against the built `pdf.js` file from this patch. The reason is that despite the `require` calls being protected by *runtime* `isNodeJS` checks, it's not possible to simply tell Webpack to just ignore the `require`; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).
2019-12-20 17:36:08 +01:00
Jonas Jenwald
dbb82f05fc Re-factor the find helper function, in src/core/document.js, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.

The main benefits here are:
 - No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
 - In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-14 13:43:26 +01:00
Jonas Jenwald
e24050fa13 [api-minor] Move the ReadableStream polyfill to the global scope
Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility

By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead.

The only change here which *could* possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded).
However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it *could* just as well have been a globally registered polyfill from an application embedding the PDF.js library.
2019-12-11 19:02:37 +01:00
Jonas Jenwald
b00835f589 Attempt to improve the PDFDocument error message for empty files (issue 5887)
Given that the error in question is surfaced on the API-side, this patch makes the following changes:
 - Updates the wording such that it'll hopefully be slightly easier for users to understand.
 - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling.
 - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).
2019-12-09 15:45:50 +01:00
Tim van der Meij
a6db045789
Merge pull request #11387 from Snuffleupagus/issue-11385
Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
2019-12-08 20:27:46 +01:00
Tim van der Meij
16778118f6
Merge pull request #11391 from Snuffleupagus/globalThis
Replace `globalScope` with the standard `globalThis` property instead
2019-12-08 20:23:19 +01:00
Jonas Jenwald
71d61e4c6f Re-factor getMainThreadWorkerMessageHandler to support arbitrary global scopes, rather than only window 2019-12-08 20:19:04 +01:00
Jonas Jenwald
a8fc306b6e Replace globalScope with the standard globalThis property instead
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility

Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues.

Finally, for older browsers a polyfill was added in `ssrc/shared/compatibility.js`.
2019-12-08 20:19:02 +01:00
Jonas Jenwald
a02122e984 Ensure that PDFDocument.checkFirstPage waits for cleanup to complete (PR 10392 follow-up)
Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here.

(The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)
2019-12-07 12:31:41 +01:00
Jonas Jenwald
5c0336872e Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.
2019-12-05 15:53:18 +01:00
Jonas Jenwald
c3b1c8f857 Slightly simplify the XRef cache lookup in XRef.fetch
Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups.

Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called *a lot* in larger documents and this patch thus cannot hurt.
2019-11-30 22:41:53 +01:00
Jonas Jenwald
168c6aecae Stop caching Streams in XRef.fetchCompressed
I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1]

Note that in the `XRef.fetchUncompressed` method we're *not* caching Streams, and that for very good reasons too.

 - Streams, especially the `DecodeStream` ones, can become *very* large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so.

 - Attempting to read from the *same* Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position.

 - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Streams isn't going to work in the general case.

All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method.

---
[1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.
2019-11-30 10:21:08 +01:00
Jonas Jenwald
06412a557b Slighthly re-factor XRef.fetchCompressed
- Change all occurences of `var` to `let`/`const`.

 - Initialize the (temporary) Arrays with the correct sizes upfront.

 - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.
2019-11-30 09:49:51 +01:00
Jonas Jenwald
725566cfea Remove the Number.isInteger checks from XRef.fetchUncompressed (PR 8857 follow-up)
Having ran the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857.
Hence everything points to this being completely unnecessary now, and by removing this code there's thus fewer function calls being made in `XRef.fetchUncompressed`.
2019-11-28 23:25:39 +01:00
Jonas Jenwald
cc76132c24 Remove outdated, and misleading, JSDoc comment from the PDFDocument class
The contents of this comment hasn't been correct for *years*, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.
2019-11-25 11:36:29 +01:00
Jonas Jenwald
a965662184 Enable the getter-return, no-dupe-else-if, and no-setter-return ESLint rules
All of these rules can help catch errors during development. Please note that only `getter-return` required a few changes, which was limited to disabling the rule in a couple of spots; please find additional details about these rules at:
 - https://eslint.org/docs/rules/getter-return
 - https://eslint.org/docs/rules/no-dupe-else-if
 - https://eslint.org/docs/rules/no-setter-return
2019-11-23 11:40:30 +01:00
Tim van der Meij
be02e67972
Merge pull request #11335 from Snuffleupagus/issue-11330
Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330)
2019-11-16 13:56:01 +01:00
Jonas Jenwald
9199b02a42 Subtract stream.start when getting the startXRef property for documents with a Linearization dictionary (issue 11330)
For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`).
Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.
2019-11-16 09:29:10 +01:00
Jonas Jenwald
688d15526e Use getBytes, rather than looping over getByte, in FlateStream.prototype.readBlock
*Please note:* A a similar change was attempted in PR 5005, but it was subsequently backed out (in PR 5069) since other parts of the patch caused issues.

With these changes, it's possible to replace repeated function calls within a loop with just a single function call and subsequent assignment instead.
2019-11-15 15:45:31 +01:00
Jonas Jenwald
878432784c [PDFHistory] Move the IE11 pushState/replaceState work-around to src/shared/compatibility.js (PR 10461 follow-up)
I've always disliked the solution in PR 10461, since it required changes to the `PDFHistory` code itself to deal with a bug in IE11.
Now that IE11 support is limited, it seems reasonable to remove these `pushState`/`replaceState` hacks from the main code-base and simply use polyfills instead.
2019-11-11 17:48:04 +01:00
Jonas Jenwald
74e00ed93c Change isNodeJS from a function to a constant
Given that this shouldn't change after the `pdf.js`/`pdf.worker.js` files have been loaded, it doesn't seems necessary to keep this as a function.
2019-11-10 16:44:29 +01:00
Jonas Jenwald
2817121bc1 Convert globalScope and isNodeJS to proper modules
Slightly unrelated to the rest of the patch, but this also removes an out-of-place `globals` definition from the `web/viewer.js` file.
2019-11-10 16:44:29 +01:00
Tim van der Meij
6763e16804
Merge pull request #11313 from Snuffleupagus/issue-11122
Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)
2019-11-10 13:31:51 +01:00
Jonas Jenwald
0233fc07b6
Revert "Convert Catalog.getPageDict to an async method" 2019-11-09 22:36:23 +01:00
Jonas Jenwald
536a52e981 Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)
For Popup annotation trigger elements consisting of an arbitrary polyline, you need to ensure that the 'stroke-width' is always non-zero since otherwise it's impossible to actually open/close the popup.

Unfortunately I don't believe that any of the test-suites can be used to test this, hence why no tests are included in the patch.
2019-11-09 13:35:59 +01:00
Jonas Jenwald
79d7c002de Inline a couple of isRef/isDict checks in the getPageDict method
As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).

With these changes we also avoid potentially two back-to-back `isDict` checks when evaluating possible Page nodes, and can also no longer accidentally pick a dictionary with an incorrect /Type.
2019-11-08 17:53:00 +01:00
Jonas Jenwald
0d89006bf1 Convert Catalog.getPageDict to an async method
This makes it possible to remove the internal `next` helper function, and also gets rid of the need to manually resolve/reject a `PromiseCapability`.
2019-11-08 17:45:28 +01:00
Jonas Jenwald
98f570c103 Prevent browser exceptions from incorrectly triggering the assert in PDFPageProxy._abortOperatorList (PR 11069 follow-up)
For certain canvas-related errors (and probably others), the browser rendering exceptions may be propagated "as-is" to the PDF.js code. In this case, the exceptions are of the somewhat cryptic `NS_ERROR_FAILURE` type.
Unfortunately these aren't actual `Error`s, which thus ends up unintentionally triggering the `assert` in `PDFPageProxy._abortOperatorList`; sorry about that!
2019-11-07 11:37:48 +01:00
Jonas Jenwald
80342e2fdc Support UTF-16 little-endian strings in the stringToPDFString helper function (bug 1593902)
The bug report seem to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we *actually* do as evident by both the code and a unit-test.
The issue at play here is rather that we previously only supported big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a *little-endian* UTF-16 BOM instead.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902
2019-11-05 12:43:17 +01:00
Jonas Jenwald
04497bcb3c Re-factor the ObjectLoader._walk method to be properly asynchronous
Rather than having to store a `PromiseCapability` on the `ObjectLoader` instances, we can simply convert `_walk` to be `async` and thus have the same functionality with native JavaScript instead.
2019-11-03 15:04:20 +01:00
Jonas Jenwald
fec1f02b2a Slightly re-factor setting of the link target in addLinkAttributes
I happened to look at this code and the way that the link target is set seems unecessarily convoluted, since we're using `Object.values` and `Array.prototype.includes` for *every* link being parsed.
Given that the number of link targets are so few, the easist solution honestly seem to be to just use a `switch` statement to do the link target mapping.
2019-11-02 14:01:31 +01:00
Tim van der Meij
bbd2386bd9
Merge pull request #11296 from Snuffleupagus/parseColorSpace-stopAtErrors
Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)
2019-11-01 22:47:50 +01:00
Jonas Jenwald
829d6ba2dc Ensure that the peekByte methods, on the various Streams, handles end of data correctly (PR 5286 follow-up)
When the end of data has already been reached for the various Streams, the `getByte` methods will return `-1` to signal that to the caller. Note however that the current position obviously won't be incremented in this case, meaning that the `peekByte` methods will in this case *incorrectly* decrement the position.

Thankfully the corresponding `peekBytes` shouldn't be affected by this bug, since they decrement the current position with the *actually* returned number of bytes.

I'm not aware of any bugs caused by this blatant oversight, but that doesn't mean this shouldn't be fixed :-)
2019-11-01 18:22:33 +01:00
Jonas Jenwald
835d8c2be5 Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)
This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.
2019-11-01 09:01:24 +01:00
Tim van der Meij
30ef05c161
Merge pull request #11290 from Snuffleupagus/MessageHandler-rm-in
[MessageHandler] Re-factor and convert the code to a proper `class`
2019-10-31 23:57:52 +01:00
Jonas Jenwald
eedd449cb4 Remove some unused require statements, used when loading fake workers, in non-PRODUCTION mode
The code in question is *only* relevant in non-`PRODUCTION` mode, i.e. the *development* version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added.
I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.
2019-10-31 12:08:07 +01:00
Jonas Jenwald
0293222b96 [MessageHandler] Convert the code to a proper class 2019-10-30 23:22:59 +01:00
Jonas Jenwald
5d5733c0a7 [MessageHandler] Convert all instances of var to const in the code 2019-10-30 23:22:59 +01:00
Jonas Jenwald
f61fb3e0f9 [MessageHandler] Re-factor the _onComObjOnMessage function to use early returns
When `ReadableStream` support was added to the `MessageHandler`, the `_onComObjOnMessage` function became more complex than previously.
All of the nested `if`/`else if`/`else` branches are now, at least in my opinion, making some of this code a bit difficult to follow. Hence this patch, which attempts to help readability by making use of early `return`s and `Error`s.

The patch also changes a couple of `var`/`let` occurences to `const`.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
62f28e11a3 [MessageHandler] Remove unnecessary usage of in from the code
Note that using `in` leads to unnecessary stringification of the properties, which seems completely unnecessary here. To avoid future problems from these changes the `MessageHandler.on` method will now assert, in non-`PRODUCTION`/`TESTING` builds, that it's always called with a function as expected.

This patch also renames `callbacksCapabilities` to `callbackCapabilities`, note the removed "s", since using a double plural format looks a bit strange.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
3e46e800a0 [MessageHandler] Replace the internal isReply property, as sent when Promise callbacks are used, with enumeration values
Given that the `isReply` property is an internal implementation detail, changing its type shouldn't be a problem. Note that by directly indicating if either data or an Error is sent, it's no longer necessary to use `in` when handling the callback.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
2d35a49dd8 Inline a couple of isRef/isDict checks in the ObjectLoader code
As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).
2019-10-29 23:20:10 +01:00
Jonas Jenwald
1133dbac33 Make the ObjectLoader use more efficient methods when determining if data needs to be loaded
Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded.

When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which *specific* chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question.
Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over *all* chunks.

All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessary "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a *single* simple check, which seems like a perfect fit for the `ObjectLoader` use cases.
In particular, once the *entire* PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading a local files) which this patch should help with.

Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)
2019-10-29 23:20:09 +01:00
Jonas Jenwald
0496ea61f5 Ensure that PartialEvaluator.hasBlendModes handles Blend Modes in Arrays (PR 11281 follow-up)
I completely overlooked this in PR 11281, but you obviously need to make similar changes in `PartialEvaluator.hasBlendModes` since it will otherwise ignore valid Blend Modes.
2019-10-28 11:37:05 +01:00
Jonas Jenwald
5c266f0e8c Support Blend Modes which are specified in an Array of Names (issue 11279)
According to the specification, the first *supported* Blend Mode should be choosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607
2019-10-26 14:24:31 +02:00
Tim van der Meij
4a5a4328f4
Merge pull request #11273 from Snuffleupagus/getViewport-offsets
[api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone`
2019-10-24 00:08:40 +02:00
Jonas Jenwald
681bc9d70e [api-minor] Support custom offsetX/offsetY values in PDFPageProxy.getViewport and PageViewport.clone
There's no good reason, as far as I can tell, to not also support `offsetX`/`offsetY` in addition to e.g. `dontFlip`.
2019-10-23 20:48:14 +02:00
Jonas Jenwald
6f7f8257bc Slightly re-factor the String handling in StatTimer
This uses template strings in a couple of spots, and a buffer in the `toString` method.
2019-10-23 14:45:18 +02:00
Jonas Jenwald
8e5d3836d6 Remove the enable argument from the StatTimer constructor
This argument is a left-over from older API code, where we unconditionally initialized `StatTimer` instances for every page. For quite some time that's only been done when `pdfBug` is set, hence it seems unnecessary to keep this functionality.
2019-10-23 14:45:18 +02:00
Jonas Jenwald
9fc40f8b84 Remove DummyStatTimer since it's unused now
Since this isn't part of the API surface, removing it now that it's unused shouldn't cause any problems.
2019-10-23 14:45:16 +02:00
Jonas Jenwald
860da8b840 Stop using the DummyStatTimer in the API, and check if this._stats exists when trying to report statistics
Even though the currect situation only results in six unnecessary function calls per page, it nonetheless seems completely unnecessary to call dummy functions when `pdfBug` is *not* set (i.e. the default behaviour).
2019-10-23 13:23:41 +02:00
Jonas Jenwald
df0e1edab5 Re-factor sending of various Exceptions from the worker to the API
As can be seen in the API, there's a number of document loading Exception handlers which are both really simple and highly similar. Hence these are changed such that all the relevant Exceptions are sent via *one* message instead.

Furthermore, the patch also avoids unnecessarily re-creating `UnknownErrorException`s at the worker side and removes an unnecessary `bind` call.
2019-10-19 12:54:54 +02:00
Tim van der Meij
11f3851a97
Merge pull request #11243 from Snuffleupagus/issue-11242
Add a fallback for non-embedded *composite* Verdana fonts (issue 11242)
2019-10-18 23:56:46 +02:00
Tim van der Meij
c54bb222ca
Merge pull request #11231 from Snuffleupagus/indexObjects-entries-gen
Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)
2019-10-17 23:56:26 +02:00
Jonas Jenwald
2fcb5afc7b Add a fallback for non-embedded *composite* Verdana fonts (issue 11242)
Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.
2019-10-17 17:00:55 +02:00
Pedro Luiz Cabral Salomon Prado
4d0c759b7f Change variable assignment (#11247)
Remove unused variable assignment in `src/core/fonts.js`
2019-10-16 00:39:25 +02:00
Jonas Jenwald
ffc847eaa5 Allow over-writing entries, in XRef.indexObjects, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)
This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1]

Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break.
The only reason that I'm even submitting this patch, is because of the number of open issues that it would address.

Generally speaking though, the best course of action would probably be if `XRef.indexObjects` was re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them.

---
[1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.
2019-10-14 22:10:04 +02:00
Tim van der Meij
ec6a99d781
Bundle all API documentation in a module
This commit allows JSDoc to generate all API documentation in the
`pdfjsLib` module (namespace) so the documentation becomes easier to
navigate.
2019-10-13 21:23:00 +02:00
Tim van der Meij
9f4d45ddf4
Don't include private methods in the the PDFPageProxy API documentation 2019-10-13 21:23:00 +02:00
Tim van der Meij
36c01c2c2a
Deduplicate the documentation for PDFDocumentLoadingTask and PDFWorker
Both classes live inside a closure with the same name, which confuses
JSDoc. Move the documentation to the inner class to deduplicate them.
2019-10-13 21:23:00 +02:00
Tim van der Meij
ca3a58f93a
Consistently use @returns for returned data types in JSDoc comments
Sometimes we also used `@return`, but `@returns` is what the JSDoc
documentation recommends. Even though `@return` works as an alias, it's
good to use the recommended syntax and to be consistent within the
project.
2019-10-13 13:58:17 +02:00
Tim van der Meij
8b4ae6f3eb
Consistently use @type for getter data types in JSDoc comments
Sometimes we also used `@return` or `@returns`, but `@type` is what
the JSDoc documentation recommends. This also improves the documentation
because before this commit the types were not shown and now they are.
2019-10-13 13:58:17 +02:00
Tim van der Meij
f4daafc077
Consistently use square brackets for optional parameters in JSDoc comments
Square brackets are recommended to indicate optional parameters. Using
them helps for automatically generating correct documentation.
2019-10-13 13:58:17 +02:00
Tim van der Meij
efd331daa1
Consistently use string for string data types in JSDoc comments
Sometimes we also used `String`, but `string` is the what the JSDoc
documentation recommends.
2019-10-13 13:58:17 +02:00
Tim van der Meij
e75991b49e
Consistently use number for numeric data types in JSDoc comments
Sometimes we also used `Number` and `integer`, but `number` is what
the JSDoc documentation recommends.
2019-10-13 13:58:13 +02:00
Jonas Jenwald
03387ebaa8 Update src/shared/compatibility.js to only run with SKIP_BABEL = false set
Rather than specifying certain build targets manually, it seems much more appropriate (and future-proof) to use the `SKIP_BABEL` build target instead.

Also, the patch adds a missing `/* eslint no-var: error */` line since I'm touch the file anyway and no code-changes were necessary for it.
2019-10-13 11:33:41 +02:00
Jonas Jenwald
bfcbf2d78d Cache processed 'ExtGState's in PartialEvaluator.hasBlendModes to avoid unnecessary parsing/lookups
This simply extends the already existing caching of processed resources to avoid duplicated parsing of 'ExtGState's, which should help with badly generated PDF documents.

This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf, with the following manifest file:
```
[
    {  "id": "issue6961",
       "file": "../web/pdfs/issue6961.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following *overall* results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   400 |         1063 |        1051 | -12 | -1.17 |        faster
Firefox | Page Request |   400 |          552 |         543 |  -9 | -1.69 |        faster
Firefox | Rendering    |   400 |          511 |         508 |  -3 | -0.61 |
```

and the following *page-specific* results:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
0    | Overall      |   200 |         1122 |        1110 | -12 | -1.03 |
0    | Page Request |   200 |          552 |         544 |  -8 | -1.48 |        faster
0    | Rendering    |   200 |          570 |         566 |  -4 | -0.62 |
1    | Overall      |   200 |         1005 |         992 | -13 | -1.33 |        faster
1    | Page Request |   200 |          552 |         542 | -11 | -1.91 |        faster
1    | Rendering    |   200 |          452 |         450 |  -3 | -0.61 |
```
2019-10-12 12:35:42 +02:00
Jonas Jenwald
af71f9b40a Inline all the possible type checks in PartialEvaluator.hasBlendModes to avoid unnecessary function calls
For badly generated PDF documents, with issue 6961 being one example, there's well over one hundred thousand function calls being made in total for just the *two* pages.
2019-10-12 11:24:37 +02:00
huzjakd
94171d9d72 Attempt to fallback to a default font, for non-available ones, in PartialEvaluator.loadFont
This handles the two different ways that fonts can be loaded, either by Name (which is the common case) or by Reference.
Furthermore, this also takes the `ignoreErrors` option into account when deciding whether to fallback or Error.
Finally, by creating a minimal but valid Font dictionary, there's no special-cases necessary in any of the font parsing code.

Co-authored-by: huzjakd <huzjakd@gmail.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
2019-10-10 16:49:46 +02:00
Jonas Jenwald
ea729ec55c [api-minor] Replace all deprecated calls with throwing of actual Errors
All of these methods have been marked as `deprecated` in *three* releases now, and I'd thus like to (slowly) move towards complete removal.

However rather than just removing the methods right away, which would cause somewhat cryptic failures, this patch tries to implement a hopefully reasonable middle ground by throwing `Error`s with (essentially) the same information as the previous warnings.

While the previous `deprecated` messages could perhaps be seen as optional, with these changes API consumers will now be forced to actually migrate their code.
2019-10-09 09:21:15 +02:00
Takashi Tamura
d5ee083050 * use square brackets for optional properties in the JSDoc comments of src/display/api.js 2019-10-08 20:34:17 +09:00
Tim van der Meij
cead77ef3a
Merge pull request #11186 from Snuffleupagus/issue-9655
Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
2019-10-06 19:50:43 +02:00
Jonas Jenwald
eabedab38e [MessageHandler] Add a non-PRODUCTION/TESTING check to ensure that wrapReason is called with a valid reason
There shouldn't be any situation where `reason` isn't either an `Error`, or a cloned "Error" sent via `postMessage`.
2019-10-06 14:15:13 +02:00
Jonas Jenwald
9201c8dad4 [MessageHandler] Convert the deleteStreamController helper function to a "private" method instead 2019-10-06 14:15:02 +02:00
Jonas Jenwald
f5be2d62a3 Improve the heuristics, in PartialEvaluator._buildSimpleFontToUnicode, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
*Please note:* I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code.
However, it occured to me that we're essentially trying to add a heuristic *on top* of another heuristic, and that it shouldn't matter how efficient the code is as long as it works.

In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.
2019-10-06 10:47:29 +02:00
Jonas Jenwald
572abdcb4a Convert the various image decoder ...Errors to classes extending BaseException (PR 11185 follow-up)
Somehow I missed these in PR 11185, but there's no good reason not to convert them as well.
2019-10-01 13:10:14 +02:00
Tim van der Meij
8c4f4b5eec
Merge pull request #11182 from Snuffleupagus/disableWorker-disable-Dict-postMessage
Forbid sending of `Dict`s and `Stream`s, with `postMessage`, when workers are disabled
2019-09-29 15:09:42 +02:00
Jonas Jenwald
5d93fda4f2 Convert the various ...Exceptions to proper classes, to reduce code duplication
By utilizing a base "class", things become significantly simpler. Unfortunately the new `BaseException` cannot be a proper ES6 class and just extend `Error`, since the SystemJS dependency doesn't seem to play well with that.
Note also that we (generally) need to keep the `name` property on the actual `...Exception` object, rather than on its prototype, since the property will otherwise be dropped during the structured cloning used with `postMessage`.
2019-09-29 10:16:20 +02:00
Jonas Jenwald
3f8fee371b Forbid sending of Dicts and Streams, with postMessage, when workers are disabled
By default, i.e. with workers enabled, it's *purposely* not possible to send `Dict`s and `Stream`s from the worker-thread. This is achieved by defining a `function` on every `Dict` instance, since that ensures that [the structured clone algoritm](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm) will throw an Error on `postMessage`.

However, with workers *disabled* we fall-back to the `LoopbackPort` implementation which just ignores any `function`s, thus incorrectly allowing sending of data which *should* be unclonable.
2019-09-26 16:16:13 +02:00
Tim van der Meij
cd909c531f
Merge pull request #11169 from Snuffleupagus/Dict-inline-Ref-checks
Reduce the number of function calls in the `Dict` class
2019-09-24 23:33:37 +02:00
Tim van der Meij
f762d59ad2
Merge pull request #11173 from Snuffleupagus/ReadableStream-polyfill
Replace the bundled `ReadableStream` polyfill with the `web-streams-polyfill` npm package (issue 11157)
2019-09-24 23:22:17 +02:00
Jonas Jenwald
2cac68467f Reduce the number of function calls in the Dict class
The following changes were made:
 - Remove unnecessary `typeof` checks in the `get`/`getAsync` methods.
 - Reduce unnecessary code duplication in the `get`/`getAsync` methods.
 - Inline the `Ref` checks in the `get`/`getAsync`/`getArray` methods, since it helps avoid many unnecessary functions calls. I.e. this way it's possible to directly call `XRef.{fetch, fetchAsync)` only when necessary, rather than always having to call `XRef.{fetchIfRef, fetchIfRefAsync)`.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```
This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2821 |        2790 | -32 | -1.12 |        faster
Firefox | Page Request |   250 |            2 |           2 |   0 |  6.68 |
Firefox | Rendering    |   250 |         2820 |        2788 | -32 | -1.13 |        faster
```
2019-09-24 08:31:39 +02:00
Jonas Jenwald
0ee373f9cc Replace the bundled ReadableStream polyfill with the web-streams-polyfill npm package (issue 11157)
Compared to the recently replaced `URL` polyfill, the new `ReadableStream` polyfill isn't being exported globally for two reasons:
 - We're currently checking for the existence of a global `ReadableStream` implementation when determining if the Fetch API will be used; please see `isFetchSupported` in the src/display/display_utils.js file.
 - Given that it's much newer functionality (compared to `URL`) and that not all browsers may implement all parts of the specification yet, not exposing the `ReadableStream` globally seems safer for now.
2019-09-23 22:16:59 +02:00
Jonas Jenwald
7f18c57c12 Fix the inconsistent return types for Dict.{get, getAsync}
Having these methods fallback to returning `null` in only *one* particular case seems outright wrong, since a "falsy" value will thus be handled incorrectly.
The only reason that this hasn't caused issues in practice is that there's only one call-site passing in three keys, and in that case we're trying to read a font file where falling back to `null` isn't a problem.
2019-09-23 11:41:19 +02:00
Tim van der Meij
1f5ebfbf0c
Replace our URL polyfill with the one from core-js
`core-js` polyfills have proven to be of good quality and using them
prevents us from having to maintain them ourselves.
2019-09-19 14:09:51 +02:00
Tim van der Meij
c71a291317
Upgrade core-js to version 3.2.1
This only required changing the import paths. The `es` folder contains
all polyfills we need now. If we want to import everything, we need to
explicitly require the `index` file.
2019-09-19 13:58:36 +02:00
Tim van der Meij
3da680cdfc
Merge pull request #11158 from janpe2/gradient-stops
Avoid floating point inaccuracy in gradient color stops
2019-09-19 13:15:11 +02:00
Tim van der Meij
58e5f36666
Merge pull request #11159 from Snuffleupagus/issue-11150
For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)
2019-09-19 13:14:27 +02:00
Jonas Jenwald
af22dc9b0c For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)
Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.
2019-09-18 10:15:09 +02:00
Jani Pehkonen
911df237f3 Avoid floating point inaccuracy in gradient color stops 2019-09-17 21:01:17 +03:00
Jonas Jenwald
4bd79ec4b3 Inline the resolveOrReject helper function at its call-sites in MessageHandler, and rename an error key to reason
Given that there's only a couple of call-sites, and that the helper function is really simple, it doesn't seem entirely necessary to keep it around. While fewer function calls is always a good thing, in this case the performance impact is small enough to be unmeasurable.

With *one* single exception the code in `MessageHandler` is using `reason` when passing around various Errors, hence this patch also renames an `error` key for consistency.
2019-09-17 14:22:24 +02:00
Jonas Jenwald
0617984b59 Remove unnecessary data.streamId accesses in MessageHandler._processStreamMessage, and use a constant object shape in MessageHandler.sendWithStream
The `streamId` short-hand in `MessageHandler._processStreamMessage` was only used partially througout the method, which seemed kind of strange, hence that's fixed in this patch.
Furthermore, always giving the `streamController` object a constant shape in `MessageHandler.sendWithStream` cannot hurt either.
2019-09-17 14:18:57 +02:00
Jonas Jenwald
281ed33e43 Abort, with a small delay, getOperatorList on the worker-thread when rendering is cancelled (PR 11069 follow-up)
With this patch we're finally able to abort worker-thread parsing of the `OperatorList`, rather than *only* aborting the main-thread rendering itself, when the `RenderTask.cancel` method is being called.
This will help improve perceived performance in the default viewer, especially when reading longer and more complex documents, since pages that've been scrolled out-of-view (and thus evicted from the cache) will no longer compete for parsing resources on the worker-thread.

*Please note:* With the implementation in this patch we're *not* aborting worker-thread parsing immediately on `RenderTask.cancel`, since that would lead to *worse* performance in many cases. For example: When zoom/rotation occurs in the viewer, while parsing/rendering is still ongoing, a `cancel` call will usually be (almost) immediately folled by a new `PDFPageProxy.render` call. In that case you obviously don't want to abort parsing on the worker-thread, since that would risk throwing away a partially parsed `OperatorList` and thus force unnecessary re-parsing which will regress perceived performance (especially for more complex documents).

When choosing a reasonable delay, before cancelling `getOperatorList` on the worker-thread when `RenderTask.cancel` is called, two different positions need to be considered:
 1. The delay needs to be short enough, since a timeout in the multiple seconds range would essentially make this entire functionality meaningless (by always allowing most/all pages enough time to finish parsing).

 2. The delay cannot be *too* short, since that would actually *reduce* performance in the zoom/rotation case outlined above. Furthermore, the time between `RenderTask.cancel` and `PDFPageProxy.render` calls will obviously be affected by both general computer performance and current CPU load.

It's certainly possible that the timeout may require some further tweaks, however the value settled on in this patch was easily *one order* of magnitude larger than the delta between cancel/render in my tests.
2019-09-14 11:30:32 +02:00
Jonas Jenwald
00efff532c Ensure that addLinkAttributes is always called with a valid url parameter
There's no good reason for calling this helper function without a `url` parameter, and this way we can prevent that from happening.
Note how the `PDFOutlineViewer` call-site was already doing the right thing here, and only the `LinkAnnotationElement` call-site needed a small adjustment to make it work.
2019-09-11 13:24:04 +02:00
Jonas Jenwald
12e1c91f73 Don't enqueue unused properties when sending 'GetOperatorList' data from the worker-thread (PR 11069 follow-up)
With the changes made in PR 11069, it's no longer necessary to include the `pageIndex`/`intent` parameters when sending 'GetOperatorList' data. In the previous implementation these properties were used to associate the `OperatorList` with the correct `RenderTask`, however now that `ReadableStream`s are used that's handled automatically and it's thus dead code at this point.
2019-09-09 17:41:26 +02:00
Tim van der Meij
37d5b80ba8
Merge pull request #11118 from Snuffleupagus/FetchBuiltInCMap-sendWithStream
Transfer, rather than copy, CMap data to the worker-thread
2019-09-06 22:56:14 +02:00
Jonas Jenwald
7dea3f9389 [api-minor] Remove the postMessageTransfers parameter, and thus the ability to manually disable transferring of data, from the API
By transfering, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having *one* copy of the same data.
Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory.

Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality.
Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side.

---
[1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance.

[2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility
2019-09-05 13:09:54 +02:00
Jonas Jenwald
f0534b9b51 Adjust the values sent, with the 'test' message, by the WorkerMessageHandler.setup method
Note how the sent values have inconsistent types, with a boolean in one case and an object in the other (normal) case.
Furthermore, explicitly sending a `supportTypedArray: true` property seems superfluous at least to me.
2019-09-05 11:27:27 +02:00
Jonas Jenwald
7212ff4eea Stop checking for the response property, on XMLHttpRequest, when setting up the WorkerMessageHandler
This check was added in PR 2445, however it's no longer necessary since all data[1] is now loaded on the main-thread (and then transferred to the worker-thread).
Furthermore, by default the Fetch API is now (usually) used rather than `XMLHttpRequest`.

All in all, while these checks *were* necessary at one point that's no longer the case and they can thus be removed.

---
[1] This includes both the actual PDF data, as well as the CMap data.
2019-09-05 11:27:22 +02:00
Jonas Jenwald
f11a4ba750 Transfer, rather than copy, CMap data to the worker-thread
It recently occurred to me that the CMap data should be an excellent candidate for transfering.
This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads.

Unfortunately it's not possible to actually transfer data when *returning* data through `sendWithPromise`, and another solution had to be used.
Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it *really* easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)

Please note that the patch *purposely* only changes the API to Worker communication, and not the API *itself* since changing the interface of `CMapReaderFactory` would be a breaking change.
Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.
2019-09-04 11:46:04 +02:00
Jonas Jenwald
74f5a59f43 Ensure that the cancel/error methods on Streams are always called with valid reason arguments 2019-09-02 23:31:07 +02:00
Jonas Jenwald
02bdacef42 Ensure that Errors are handled correctly when using postMessage with Streams in MessageHandler
Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).
2019-09-02 23:31:07 +02:00
Tim van der Meij
e59b11860d
Merge pull request #11108 from timvandermeij/es6-annotations
Use more ES6 syntax in the annotation code
2019-09-02 23:13:24 +02:00
Tim van der Meij
2866c8a39e
Use more ES6 syntax in src/core/annotation.js
`let` is converted to `const` where possible.
2019-09-02 22:37:27 +02:00
Tim van der Meij
c37a2c0408
Merge pull request #11112 from Snuffleupagus/TESTING-rm-version-warn
Remove the API/Worker version warning message in `TESTING` mode
2019-09-02 22:22:33 +02:00
Jonas Jenwald
229f6f34d1 Remove the API/Worker version warning message in TESTING mode
The warning messages turn out to be more annoying than helpful when looking at the `console` during tests, so let's just remove them.
2019-09-01 16:47:26 +02:00
Jonas Jenwald
cd82b81bc7 Inline the resolveCall helper function at its call-sites in MessageHandler
There's only three call-sites and one of them doesn't even need the complete functionality of `resolveCall`, hence it seems reasonable to just inline this code.
An additional benefit of this is that the `Function.prototype.apply()` instance can also be converted into "normal" function calls, which should be a tiny bit more efficient.

The patch also replaces a number of unnecessary arrow functions, in relevant parts of the `MessageHandler` code, with "normal" functions instead.
Finally, all `Promise.resolve().then(...)` calls are replaced with `new Promise(...)` instead since the latter is a tiny bit more efficient. This also explains the test failures on the Linux bot, with a prior version of the patch, since the `Promise.resolve().then(...)` format essentially creates two Promises thus causing additional delay.
2019-09-01 13:40:19 +02:00
Jonas Jenwald
055f03938b Remove support for the scope parameter in the MessageHandler.on method
At this point in time it's easy to convert the `MessageHandler.on` call-sites to use arrow functions, and thus let the JavaScript engine handle scopes for us, rather than having to manually keep references to the relevant scopes in `MessageHandler`.[1]
An additional benefit of this is that a couple of `Function.prototype.call()` instances can now be converted into "normal" function calls, which should be a tiny bit more efficient.

All in all, I don't see any compelling reason why it'd be necessary to keep supporting custom `scope`s in the `MessageHandler` implementation.

---
[1] In the event that a custom scope is ever needed, simply using `bind` on the handler function when calling `MessageHandler.on` ought to work as well.
2019-09-01 09:24:15 +02:00
Tim van der Meij
49018482dc
Use more ES6 syntax in src/display/annotation_layer.js
`let` is converted to `const` where possible, `var` usage is disabled
and template strings are used where possible.
2019-08-31 16:40:39 +02:00
Jonas Jenwald
f71ea2de0e Remove the makeReasonSerializable helper function, and use wrapReason instead, in src/shared/message_handler.js
Since `wrapReason` and `makeReasonSerializable` are essentially functionally equivalent it doesn't seem necessary to keep both of them around, especially when `makeReasonSerializable` only has a *single* call-site.
2019-08-30 19:36:10 +02:00
Jonas Jenwald
4e6a9b54c7 Change the internal stream property, as sent when Streams are used, from a String to a Number
Given that the `stream` property is an internal implementation detail, changing its type shouldn't be a problem. By using Numbers instead, we can avoid unnecessary String allocations when creating/processing Streams.
2019-08-30 13:27:18 +02:00
Jonas Jenwald
252a3e35fb Reduce the amount of unnecessary function calls and object allocations, in MessageHandler, when using Streams
With PR 11069 we're now using Streams for OperatorList parsing (in addition to just TextContent parsing), which brings the nice benefit of being able to easily abort parsing on the worker-thread thus saving resources.

However, since we're now creating many more `ReadableStream` there appears to be a tiny bit more overhead because of it (giving ~1% slower runtime of `browsertest` on the bots). In this case we're just going to have to accept such a small regression, since the benefits of using Streams clearly outweighs it.

What we *can* do here, is to try and make the Streams part of the `MessageHandler` implementation slightly more efficient by e.g. removing unnecessary function calls (which has been helpful in other parts of the code-base). To that end, this patch makes the following changes:

 - Actually support `transfers` in `MessageHandler.sendWithStream`, since the parameter was being ignored.

 - Inline the `sendStreamRequest`/`sendStreamResponse` helper functions at their respective call-sites. Obviously this causes some amount of code duplication, however I still think this change seems reasonable since for each call-site:
   - It avoids making one unnecessary function call.
   - It avoids allocating one temporary object.
   - It avoids sending, and thus structure clone, various undefined object properties.

 - Inline objects in the `MessageHandler.{send, sendWithPromise}` methods.

 - Finally, directly call `comObj.postMessage` in various methods when `transfers` are *not* present, rather than calling `MessageHandler.postMessage`, to further reduce the amount of function calls.
2019-08-30 12:32:20 +02:00
Jonas Jenwald
ae0d9e8c2a Replace some instances of implicit function.bind(this) usage, in src/display/api.js, with arrow functions instead 2019-08-30 11:35:05 +02:00
Jonas Jenwald
667e548e5f [TextLayer] Remove setAttribute usage in appendText (issue 8066)
One of the motivations for using `setAttribute` in the first place was to support more efficient DOM updates in the `expandTextDivs` method, since performance of the `enhanceTextSelection` mode can be somewhat bad when there's a lot of `textDivs` on the page.

With recent `TextLayer` changes/optimizations it's no longer necessary to store a complete `style`-string for every `textDiv`, and we can thus re-visit the `setAttribute` usage.
Note that with the current code, in `appendText`, there's only *one* string per `textDiv` which avoids a bunch of temporary strings. While the changes in this patch means that there's now *three* strings per `textDiv` instead, the total length of these strings are now quite a bit shorter (42 characters to be exact).
2019-08-28 16:52:09 +02:00
Jonas Jenwald
106b239c5d [TextLayer] Avoid unnecessary font updates in _layoutText (PR 11097 follow-up)
*This should obviously have been done in PR 11097, but for some reason I completely overlooked it; sorry about that.*

There's no good reason to update the font unless you're actually going to measure the width of the textContent. This can reduce unnecessary font switching a fair bit, even for documents which are somewhat simple/short (in e.g. the `tracemonkey.pdf` file this cuts the amount of font switches almost in half).
2019-08-28 16:08:06 +02:00
Jonas Jenwald
a1398048e5 [TextLayer] Simplify building of the *expanded* transform in expandTextDivs
Rather than essentially re-computing the `originalTransform` every time, we can simply use it directly instead.
2019-08-25 13:09:04 +02:00
Jonas Jenwald
b68f7bb404 [TextLayer] Only measure the width of the text, in _layoutText, for multi-char text divs
For performance reasons single-char text divs aren't being scaled, as outlined in a comment in `appendText`. Hence it doesn't seem necessary, or even a good idea, to unconditionally measuring the width of the text in `_layoutText`.
2019-08-25 12:32:49 +02:00
Jonas Jenwald
711040ecc5 Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in src/core/worker.js
These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).
2019-08-24 15:56:41 +02:00
Yury Delendik
66e0dd1b06 Use streams for OperatorList chunking (issue 10023)
*Please note:* The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`.

By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources.
With this patch worker-thread parsing will *only* be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this:

 - The API currently expects the *entire* OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen.
 - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously *not* want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance.
 - This patch is already *somewhat* risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well).

Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above.

Co-Authored-By: Yury Delendik <ydelendik@mozilla.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
2019-08-24 15:56:40 +02:00
Jonas Jenwald
29a2516e4c [TextLayer] Use an Array to build the total padding, rather than concatenating Strings, in expandTextDivs
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact.
Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`.

Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.
2019-08-24 01:13:59 +02:00
Tim van der Meij
edbebb8bf7
Merge pull request #11090 from Snuffleupagus/textLayer-expandTextDivs-transform
[TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs`
2019-08-23 23:12:42 +02:00
Jonas Jenwald
932fcacff8 [TextLayer] Only handle positive padding values in expandTextDivs
Given that browsers will reject padding values smaller than zero (which may be caused by limited numerical precision during calculations in the `expand` code), it makes no sense to include those when expanding the `textDiv`s.
2019-08-23 13:16:20 +02:00
Jonas Jenwald
37e8a8189b [TextLayer] Use an Array to build the total transform, rather than concatenating Strings, in expandTextDivs
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page.
2019-08-23 12:17:12 +02:00
Tim van der Meij
490deb1b65
Merge pull request #11086 from Snuffleupagus/textLayer-originalTransform
[TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled
2019-08-22 23:09:07 +02:00
Brendan Dahl
31f319301d
Merge pull request #11087 from brendandahl/disable-links
Add a way to disable external links.
2019-08-22 11:13:11 -07:00
Jonas Jenwald
a519ceffee [TextLayer] Use template strings when updating the font property in the _layoutText method 2019-08-22 14:47:44 +02:00
Jonas Jenwald
6afe3221b7 [TextLayer] Only cache the originalTransform when enhanceTextSelection is enabled
Given that this is completely unused in "regular" text-selection mode, there's no reason to unconditionally store one string for every `textDiv`.
2019-08-22 14:47:18 +02:00
Brendan Dahl
98e989116c Add a way to disable external links. 2019-08-21 11:20:41 -07:00
Jonas Jenwald
431a264126 [TextLayer] Reduce the amount of intermediary strings in expandTextDivs
By using template strings, we can avoid some unnecessary string allocations (which is also helped by shortening a variable name).
2019-08-19 12:09:18 +02:00
Jonas Jenwald
45dfad8640 [TextLayer] Only cache the current textDiv style when enhanceTextSelection is enabled
This will help save a little bit of memory, by not storing one unused string for each `textDiv` in regular text-selection mode.
2019-08-19 11:02:56 +02:00
Jonas Jenwald
1cd9a28c81 Replace the XRef.cache Array with a Map instead
Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will *always* be more-or-less sparse.
Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty.

Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead.

This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch-series against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2736 |        2736 |   1 |  0.02 |
Firefox | Page Request |   200 |            2 |           2 |   0 | -8.26 |        faster
Firefox | Rendering    |   200 |         2733 |        2734 |   1 |  0.03 |
```
2019-08-18 12:07:18 +02:00
Jonas Jenwald
34a53b9f5d Inline the isRef checks in the various XRef.fetch related methods
The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there's been a lot of other cases where similiar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)
2019-08-18 11:57:48 +02:00
Tim van der Meij
1565d1849d
Merge pull request #11073 from brendandahl/code-point
Move polyfill for codePointAt to String prototype.
2019-08-17 13:26:35 +02:00
Brendan Dahl
c8129b8787 Move polyfill for codePointAt to String prototype.
This method belongs on the prototype not the String object.
2019-08-16 14:32:43 -07:00
Jonas Jenwald
40d3916f31 Reduce the number of temporary variables in the Parser.getObj method
This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.
2019-08-16 13:51:41 +02:00
Jonas Jenwald
7728a6630c Inline the isString check in the Parser.getObj method
For very large and complex PDF files this will help performance *slightly*, since `Parser.getObj` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2847 |        2830 | -17 | -0.60 |        faster
Firefox | Page Request |   200 |            2 |           2 |   0 | -7.14 |
Firefox | Rendering    |   200 |         2844 |        2827 | -17 | -0.60 |        faster
```
2019-08-16 10:34:24 +02:00
Jonas Jenwald
7f456b3e2e Replace of all usages of var with let/const in the src/shared/util.js file
Also removes a couple of unnecessary (temporary) variable assigments in `arraysToBytes` and uses template strings in a few spots.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
f6c4a1f080 Convert Util to a class with static methods
Also replaces `var` with `const` in all the relevant code.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
7ee370a394 Remove the skipEmpty parameter from Util.intersect (PR 11059 follow-up)
Looking at this again, it struck me that added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code!
Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0,  0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter thus feels too tightly coupled to the `Page.view` getter.
2019-08-11 14:33:52 +02:00
Tim van der Meij
fbe8c6127c
Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
2019-08-09 22:39:01 +02:00
Jonas Jenwald
d637b25e36 Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".

The patch makes the following notable changes:
 - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
 - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
 - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
 - Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
 - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.

---

[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-09 10:18:13 +02:00
Jonas Jenwald
0f78fdb229 Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.
2019-08-08 10:37:41 +02:00
Jonas Jenwald
e9b7996f2f Actually compare the cropBox and mediaBox correctly in the Page.view getter
The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the *same* underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.
2019-08-07 17:15:57 +02:00
Jonas Jenwald
5ac9c7c384 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.
2019-08-06 14:33:05 +02:00
Tim van der Meij
be70ee236d
Merge pull request #11013 from timvandermeij/annotations-quadpoints
[api-minor] Implement quadpoints for annotations in the core layer
2019-08-04 16:06:10 +02:00
Jonas Jenwald
0276385e6e [api-minor] Fix completely broken getStats method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)

I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.

Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
2019-08-02 14:09:24 +02:00
Tim van der Meij
9c8fe3142a
Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException
Ensure that `ReadableStream`s are cancelled with actual Errors
2019-08-02 00:18:44 +02:00
Tim van der Meij
e0b38bed3c
Merge pull request #11029 from brendandahl/pdfjs-telemetry-update
[api-minor] Update telemetry to use 'categorical' histograms.
2019-08-02 00:11:02 +02:00
Brendan Dahl
31d71808e7 [api-minor] Update telemetry to use 'categorical' histograms.
Firefox telemetry supports using string labels now. Convert our integers
that we used for categories to just use strings.

The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
2019-08-01 09:51:02 -07:00
Jonas Jenwald
a3150166ec Ensure that ReadableStreams are cancelled with actual Errors
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
2019-08-01 16:40:46 +02:00
Tim van der Meij
d909b86b28
Merge pull request #11020 from Snuffleupagus/issue-11016
Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)
2019-07-31 23:33:34 +02:00
wangsongyan
c61205d980 decode filename when match an urlencode filename from contentDispositionFilename 2019-07-31 09:33:56 +08:00
Jonas Jenwald
9ad50521b1 Add a work-around, in glyphlist.js, for bad PDF generators which use a non-standard /f_f string in the Encoding dictionary when referring to the ff ligature (issue 11016)
This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare.

Furthermore, this patch purposely does *not* add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.
2019-07-30 17:06:58 +02:00
Jonas Jenwald
38ccb43436 Reduce the number of function calls in EvaluatorPreprocessor.read
For very large and complex PDF files this will help performance slightly, since `EvaluatorPreprocessor.read` is called a lot during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         3402 |        3358 | -43 | -1.28 |        faster
Firefox | Page Request |   200 |            1 |           1 |   0 | 26.71 |
Firefox | Rendering    |   200 |         3401 |        3357 | -44 | -1.28 |        faster
```
2019-07-29 08:43:36 +02:00
Tim van der Meij
9114004d5b
[api-minor] Implement quadpoints for annotations in the core layer 2019-07-28 20:36:21 +02:00
Jonas Jenwald
ff90aa4323 Inline the isCmd check in the Parser.shift method
For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called *a lot* during parsing.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over *four million* `Parser.shift` calls for just the one page), using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 100,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   100 |         3386 |        3322 | -65 | -1.92 |        faster
Firefox | Page Request |   100 |            1 |           1 |   0 | -8.08 |
Firefox | Rendering    |   100 |         3385 |        3321 | -65 | -1.92 |        faster
```
2019-07-22 12:07:36 +02:00
Jonas Jenwald
b5254f2745 Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites
The number of in particular `ChunkedStream.ensureByte` calls is often absolutely *huge* (on the order of million calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.

The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.
Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).

This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea.
Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`.

This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):
```
[
    {  "id": "tracemonkey-eq",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 250,
       "type": "eq"
    }
]
```

I get the following complete results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
```

Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
```

Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).
There's obviously the question if this is *slightly* regressing simpler pages, but given just how short the times are in most cases it's not inconceivable that the page results above are simply caused be e.g. limited `Date.now()` and/or limited numerical precision.
2019-07-18 17:30:22 +02:00
Tim van der Meij
6e96a158f4
Merge pull request #10820 from vlastimilmaca/annot-irt-rt-states
Annotations - Added parsing of IRT, RT, State and StateModel
2019-07-17 23:34:31 +02:00
vlastimilmaca
fe49f0f766 Annotations - Implement parsing of IRT, RT, State and StateModel 2019-07-16 23:33:07 +02:00
Jonas Jenwald
bea15b6ce5 Simplify the PDFDocument.fingerprint method slightly
The way that this method handles documents without an `ID` entry in the Trailer dictionary feels overly complicated to me. Hence this patch adds `getByteRange` methods to the various Stream implementations[1], and utilize that rather than manually calling `ensureRange` when computing a fallback `fingerprint`.

---
[1] Note that `PDFDocument` is only ever initialized with either a `Stream` or a `ChunkedStream`, hence why the `DecodeStream.getByteRange` method isn't implemented.
2019-07-15 13:26:08 +02:00
Tim van der Meij
13ebfec903
Merge pull request #10969 from Snuffleupagus/api-test-stopAtErrors
Add an API unit-test for the `stopAtErrors` option (PRs 8240 and 8922 follow-up)
2019-07-14 14:47:57 +02:00
Jonas Jenwald
b548bafef7 Simplify, and inline, the finalize function in the MessageHandler class
The `finalize` helper function has only a *single* call-site, and furthermore it's just a one-liner too. Furthermore it's only ever called with a `Promise` as its argument, meaning that it's unnecessarily convoluted as well (i.e. the `Promise.resolve()` part shouldn't be necessary).
Hence this code can be both simplified *and* inlined at its only call-site instead.
2019-07-13 17:54:32 +02:00
Jonas Jenwald
c7fb7116d6 Add an API unit-test for the stopAtErrors option (PRs 8240 and 8922 follow-up)
Also fixes an inconsistency in the 'PageError' handler, for `getOperatorList`, in the API.
2019-07-13 16:06:05 +02:00
Jonas Jenwald
17116917f7 Remove useless wrapReason calls in the MessageHandler class
Currently `wrapReason` is manually called at *every* `resolveOrReject` call-site, despite it being completely unnecessary unless there's an actual error being handled. This is obviously inefficient, and it's easy enough to avoid by having `resolveOrReject` handle this only when actually needed.
2019-07-13 13:08:29 +02:00
Tim van der Meij
ed3954fc7a
Merge pull request #10851 from brendandahl/shading-bbox
Apply bounding box before using shading patterns.
2019-07-12 22:52:07 +02:00
Tim van der Meij
87f36e3520
Merge pull request #10850 from brendandahl/scale-line-width
Scale stroking line width when using a tiling pattern.
2019-07-12 22:50:32 +02:00
Tim van der Meij
28326165ff
Merge pull request #10958 from Snuffleupagus/api-rm-receivingOperatorList
Remove the `intentState.receivingOperatorList` boolean since it's redundant
2019-07-11 23:55:00 +02:00
Jonas Jenwald
9a4d14bf36 Prevent "Uncaught promise" messages in the console when cancelling TextLayer tasks (PR 10601 follow-up)
Since `finally` won't stop error propagation, this causes unnecessary messages to be printed in the console whenever a `TextLayer` task is cancelled.
2019-07-11 11:48:33 +02:00
Jonas Jenwald
ef48a9a713 Update the PageError handler, in the API, to always mark the operatorList as done and finalize any pending renderTasks
Note that, in the old code, there was a code-path which could prevent this from happening thus affecting future cleanup.
Furthermore, ensure that we'll always attempt to cleanup when handling the 'PageError' message, similar to the code in e.g. the `PDFPageProxy._renderPageChunk` method.
2019-07-10 14:23:59 +02:00
Jonas Jenwald
c6fcdf474b Remove the intentState.receivingOperatorList boolean since it's redundant
The `receivingOperatorList` property is currently tracked *twice* in the rendering code, both directly and inversely through the `intentState.operatorList.lastChunk` boolean. This type of double bookkeeping is never a good idea, since it's just too easy for the properties to accidentally fall out of sync.

In this case there's even a `cleanup`-related bug caused by this, which means that `PDFPageProxy._tryCleanup` will never be able to discard any data if there's an error on the worker-thread (as handled through the 'PageError' message).

Hence the simplest solution seems, at least to me, to update `PDFPageProxy._tryCleanup` to replace the `intentState.receivingOperatorList` check with a `!intentState.operatorList.lastChunk` check and completely remove the former property.
2019-07-10 14:23:10 +02:00
Brendan Dahl
6fab0a0dac Apply bounding box before using shading patterns.
Fixes #8092
2019-07-08 14:05:48 -07:00
Brendan Dahl
446efab707 Scale stroking line width when using a tiling pattern. 2019-07-08 13:47:54 -07:00
Jonas Jenwald
bdc31f8b50 Make the find helper function, in src/core/document.js, more efficient by using peekBytes rather reading the stream one byte at a time
*Please note:* A a similar change was attempted in PR 5005, but it was subsequently backed out in PR 5069.

Unfortunately I don't think anyone ever tried to debug *exactly* why it didn't work, since it ought to have worked, and having re-tested this now I'm not able to reproduce the problem any more. However, given just how inefficient the current code is, with thousands of strictly unnecessary function calls for each `find` invocation, I'd really like to try fixing this again.
2019-07-06 11:44:17 +02:00
Jonas Jenwald
41745a5996 Reduce the number of isCmd calls slightly in the XRef class
This reduces the total number of function calls, when reading the XRef table respectively when fetching uncompressed XRef entries.
Note in particular the `XRef.readXRefTable` method, where there're *two* back-to-back `isCmd` checks rather than just one.
2019-06-29 16:28:45 +02:00
Tim van der Meij
7fc329d6e3
Merge pull request #10902 from ahuglajbclajep/tiling-pattern-support
Implement tiling patterns for the SVG back-end
2019-06-28 12:45:02 +02:00
Tim van der Meij
f1867de492
Merge pull request #10925 from Snuffleupagus/eslint_no-unsanitized
Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML`
2019-06-27 20:32:24 +02:00
ahuglajbclajep
77940dbd86 Implement tiling patterns for the SVG back-end 2019-06-25 16:25:25 +09:00
Jonas Jenwald
f710eb56e4 Change the signature of the Parser constructor to take a parameter object
A lot of the `new Parser()` call-sites look quite unwieldy/ugly as-is, with a bunch of somewhat randomly ordered arguments, which we can avoid by changing the constructor to accept an object instead. As an added bonus, this provides better documentation without having to add inline argument comments in the code.
2019-06-23 16:01:45 +02:00
Jonas Jenwald
5bb5e7741d Enable the eslint-plugin-no-unsanitized ESLint plugin to disallow unsafe usage of e.g. innerHTML
See https://github.com/mozilla/eslint-plugin-no-unsanitized

Since we've generally never allowed e.g. `innerHTML`, which is enforced during review, there's only one linting failure with this patch. (Which is white-listed, according to the existing comment and the fact that it's test-only code.)
2019-06-23 13:50:30 +02:00
Jonas Jenwald
021e5ffb88 Move PDFWorkerStream and related code to its own file
Since all other `IPDFStream` implementations live in their own files, it seems reasonable for these to do so as well.

Furthermore, converts all of the relevant code to ES6 classes and updates the interface definitions to mark a couple of methods `async`.
2019-06-15 13:05:25 +02:00
Tim van der Meij
06b253d609
Merge pull request #10890 from Snuffleupagus/outline-items-hidden
Add support for outline items, in the default viewer, which default to collapsed when the outline is built
2019-06-09 11:35:49 +02:00
Jonas Jenwald
26bc630e19 Add support for outline items, in the default viewer, which default to collapsed when the outline is built
The PDF specification supports this feature, which is commonly used in large/long documents (such as the spec itself), and it seems reasonably straightforward to implement; see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2095911
2019-06-07 12:26:23 +02:00
Jonas Jenwald
625af8d2ad [api-minor] Attempt to reduce memory usage during printing, by always running cleanup once rendering has finished
Given that `cleanupAfterRender` is already set for large images, when handling 'obj' messages, this patch *should* thus be safe in general (since otherwise there ought be existing bugs related to cleanup and printing).
2019-06-03 00:29:17 +02:00
Jonas Jenwald
876c962235 Ignore Annotations with too large border widths, to prevent the annotationLayer from rendering it over the surrounding document (bug 1552113)
The border `width` will instead fallback to the default value of `1`, rather than ignoring it altoghether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113
2019-06-01 15:51:22 +02:00
Tim van der Meij
209e42043a
Merge pull request #10873 from Snuffleupagus/worker-terminate-clearPrimitiveCaches
Ensure that the `Cmd`/`Name`/`Ref` caches are cleared when terminating the worker (PR 10863 follow-up)
2019-05-31 12:56:53 +02:00
Jonas Jenwald
a3742a9f83 Ensure that the Cmd/Name/Ref caches are cleared when terminating the worker (PR 10863 follow-up)
Usually when the worker is terminated it will also be completely destroyed/removed, which means that any global caches (such as the ones in `src/core/primitive.js`) should be automatically cleared in the process.

However, for certain ways of loading the `pdf.worker.js` file, e.g. passing in a re-usable worker to `getDocument`, using the `workerPort` functionality, or even disabling workers completely  (even though this is never a good idea), the worker file may be kept in memory and these caches will not be cleared as expected.
2019-05-30 20:57:28 +02:00
Jonas Jenwald
8857a81c8d Re-use, rather than re-creating, some Arrays when resetting them in src/display/api.js
Calling `someArray = []` will create a new Array, which seems completely unnecessary when it's sufficient to just call `someArray.length = 0` to achieve the same effect.

Even though I cannot imagine these particular cases having any noticeable performance impact, similar changes were made in `core/` code years ago since it's apparently more efficient memory wise.
2019-05-30 16:33:05 +02:00
Jani Pehkonen
343b1381a2 Don't clip if path is undefined in SVG back-end 2019-05-28 18:37:15 +03:00
Jonas Jenwald
5e045bcdba Ensure that the Cmd/Name/Ref caches are cleared when running other cleanup code
The purpose of these caches is to reduce peak memory usage, by only ever having *a single* instance of a particular object.
However, as-is these caches are never cleared and they will thus remain until the worker is destroyed. This could very well have a negative effect on total memory usage, particularly for large/long documents, hence it seems to make sense to clear out these caches together with various other ones.
2019-05-26 14:29:59 +02:00
Jonas Jenwald
2fe9f3ff8f Add caching to reduce the number of Ref objects
This is similar to the existing caching used to reduced the number of `Cmd` and `Name` objects.
With the `tracemonkey.pdf` file, this patch changes the number of `Ref` objects as follows (in the default viewer):

|          | Loading the first page | Loading *all* the pages |
|----------|------------------------|-------------------------|
| `master` | 332                    | 3265                    |
| `patch`  | 163                    | 996                     |
2019-05-26 12:23:37 +02:00
Tim van der Meij
bc1eb49a77
Implement creation date only for markup annotations
The specification states that `CreationDate` is only available for
markup annotations instead of for all annotation types.

Moreover, popup annotations are not markup annotations according to the
specification, so the creation date inheritance from the parent
annotation is also removed there (note that only the modification date
is used in e.g., the viewer).
2019-05-25 15:31:06 +02:00
Tim van der Meij
cf07918ccb
Implement contents for every annotation type
The specification states that `Contents` can be available for every
annotation types instead of only for markup annotations.
2019-05-18 15:52:17 +02:00
Tim van der Meij
1421b2f205
Merge pull request #10827 from Snuffleupagus/network-streams-class
Convert the (remaining) network streams to ES6 classes
2019-05-16 22:04:29 +02:00
Jonas Jenwald
f9769af365 Convert network.js to use ES6 classes 2019-05-16 10:08:51 +02:00
Jonas Jenwald
cc661a4d38 Update fetch_stream.js to use const in more places 2019-05-16 09:15:43 +02:00
Jonas Jenwald
737705264b Convert transport_stream.js to use ES6 classes 2019-05-16 09:15:39 +02:00
Jonas Jenwald
0784c98172 Remove unused ref property from the parameters object used when creating annotations in AnnotationFactory._create
The only use-cases for this property was removed in PRs 7570 and 7775, and it's been completely unused ever since the latter one.
2019-05-16 08:33:38 +02:00
Tim van der Meij
c8c937c257
Merge pull request #10794 from janpe2/cidtogidmap-zero
Fix glyph at index zero in CIDFontType2 that has a CIDToGIDMap stream
2019-05-15 00:04:39 +02:00
Jonas Jenwald
173fbef05b Enable the consistent-return ESLint rule
This rule is already enabled in mozilla-central, and helps ensure more consistent functions/methods, see https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#119-120

Please see https://eslint.org/docs/rules/consistent-return for additional information.
2019-05-11 14:27:21 +02:00
Jani Pehkonen
05c527f035 Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream 2019-05-07 18:44:37 +03:00
Tim van der Meij
be1d6626a7
Implement creation/modification date for annotations
This includes the information in the core and display layers. The
date parsing logic from the document properties is rewritten according
to the specification and now includes unit tests.

Moreover, missing unit tests for the color of a popup annotation have
been added.

Finally the styling of the popup is changed slightly to make the text a
bit smaller (it's currently quite large in comparison to other viewers)
and to make the drop shadow a bit more subtle. The former is done to be
able to easily include the modification date in the popup similar to how
other viewers do this.
2019-05-05 14:51:03 +02:00
Jonas Jenwald
007fab6ab5 Change PartialEvaluator.handleColorN to throw when no valid pattern is found
Currently `handleColorN` will fallback to add a completely unparsed/unvalidated operator when no valid pattern was found. This is unfortunate, since it could very easily lead to a couple of different errors:
 - `DataCloneError`s when attempting to send the data to the main-thread, e.g. when `args` is `Dict`/`Stream`.
 - Errors in `getShadingPatternFromIR` on the main-thread, unless `args` just happens to have the expected format.
 - Errors when actually attempting to render the pattern on the main-thread, since the `args` will most likely not have the expected format.

Hence it probably makes sense to error in `PartialEvaluator.handleColorN`, and having invalid patterns fail gracefully via the existing `ignoreErrors` code-paths instead.
2019-05-04 12:53:18 +02:00
Tim van der Meij
155304a0c1
Merge pull request #10756 from Snuffleupagus/issue-10542
Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)
2019-05-02 22:29:24 +02:00
Jonas Jenwald
96942d4f7f Ensure that the OperatorList constructor actually initializes a NullOptimizer when intended (PR 9089 follow-up)
It appears that this has been broken ever since PR 9089, which also introduced this code, since the `QueueOptimizer`/`NullOptimizer` choice was made based on the still undefined `this.intent` property.

Furthermore, fixing this also uncovered the fact that the `NullOptimizer.reset` method was missing.
2019-05-02 17:37:05 +02:00
Jonas Jenwald
5335285cda Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)
First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to).
Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit *very* small) to the parsing of path operators in PDF documents just for a handful of *corrupt* ones.
2019-04-30 23:35:33 +02:00
Tim van der Meij
762c58e0fc
Merge pull request #10738 from Snuffleupagus/ViewerPreferences-api
[api-minor] Add support for ViewerPreferences in the API (issue 10736)
2019-04-20 18:39:32 +02:00
Jonas Jenwald
34952b732e Add a getDocId method to the idFactory, in Page instances, to avoid passing around PDFManager instances unnecessarily (PR 7941 follow-up)
This way we can avoid manually building a "document id" in multiple places in `evaluator.js`, and it also let's us avoid passing in an otherwise unnecessary `PDFManager` instance when creating a `PartialEvaluator`.
2019-04-20 13:11:17 +02:00
Tim van der Meij
55d9b35d37
Merge pull request #10727 from Snuffleupagus/type3-image-resources
Support (rare) Type3 fonts which contains image resources (issue 10717)
2019-04-18 23:07:26 +02:00
Jonas Jenwald
5e9b606e7b [Firefox] Avoid displaying the indeterminate loadingBar when disableStream=true is set (PR 10714 follow-up)
While PR 10714 did address the `disableRange=true` case, it also managed to "break" the `disableStream=true` case instead since the indeterminate loadingBar is now displayed when it shouldn't; sorry about that!
The solution is simple enough though, don't attempt to fallback to `_fullRequestReader.onProgress` when handling "incomplete" loading information.
2019-04-16 15:35:42 +02:00
Jonas Jenwald
311bac3ebb [api-minor] Add support for ViewerPreferences in the API (issue 10736)
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences

Furthermore, note that this patch *only* adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736).
The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated?
There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the *entire* viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales.
Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc.

Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)
2019-04-14 14:20:52 +02:00
Tim van der Meij
ae2a4dc3dd
Implement free text annotations 2019-04-13 18:45:22 +02:00
Jonas Jenwald
be604bd195 Support (rare) Type3 fonts which contains image resources (issue 10717)
The Type3 font type is not commonly used in PDF documents, as can be seen from telemetry data such as: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-04-09&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F68&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F57&processType=*&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2019-03-18&table=0&trim=1&use_submission_date=0 (see also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types).

Type3 fonts containing image resources are *very* rare in practice, usually they only contain path rendering operators, but as the issue shows they unfortunately do exist.
Currently these Type3-related image resources are not handled in any special way, and given that fonts are document rather than page specific rendering breaks since the image resources are thus not available to the *entire* document.
Fortunately fixing this isn't too difficult, but it does require adding a couple of Type3-specific code-paths to the `PartialEvaluator`. In order to keep the implementation simple, particularily on the main-thread, these Type3 image resources are completely decoded on the worker-thread to avoid adding too many special cases. This should not cause any issues, only marginally less efficient code, but given how rare this kind of Type3 font is adding premature optimizations didn't seem at all warranted at this point.
2019-04-13 18:27:50 +02:00
Tim van der Meij
17de90b88a
Merge pull request #10694 from Snuffleupagus/main-thread-progressiveDataLength
Avoid dispatching range requests to fetch PDF data that's already loaded with streaming (PR 10675 follow-up)
2019-04-13 17:15:01 +02:00
Tim van der Meij
2d0c38d626
Merge pull request #10696 from Snuffleupagus/makeSubStream-ensureByte
Update `ChunkedStream.makeSubStream` to actually check if (some) data exists when the `length` parameter is undefined
2019-04-13 17:12:20 +02:00
Jonas Jenwald
a7273c8efe Avoid dispatching range requests to fetch PDF data that's already loaded with streaming (PR 10675 follow-up)
*Please note:* This patch purposely ignores `src/display/network.js`, since its support for progressive reading depends on the non-standard `moz-chunked-arraybuffer` responseType which is currently in the process of being removed.
2019-04-13 00:26:13 +02:00
Vlastimil Máca
d96267c30c Annotations - _preparePopup method replaced with MarkupAnnotation base class. This is just refactoring, so it shouldn't break anything. It should move annotation API closer to PDF spec and enable future expansion. 2019-04-12 11:24:21 +02:00
Tim van der Meij
4055d0a302
Implement caret annotations
The file `test/pdfs/annotation-caret-ink.pdf` is already available in
the repository as a reference test for this since I supplied it for
another patch that implemented ink annotations.
2019-04-09 23:39:56 +02:00
Tim van der Meij
ce62373db3
Merge pull request #10674 from timvandermeij/svg-backend-es6
Convert `src/display/svg.js` to ES6 syntax and implement `setRenderingIntent` and `setFlatness` for the SVG backend
2019-04-06 17:15:14 +02:00
Tim van der Meij
5a03b1c0d7
Optimize convertOpList in svg.js by computing the operator ID mapping only once
There is no need to recompute this for every operator list we encounter.
2019-04-06 16:57:31 +02:00
Tim van der Meij
2b18e5a355
Implement setRenderingIntent and setFlatness for the SVG backend
This mirrors the canvas implementation where we ignore these operators.
This avoids console spam regarding unimplemented operators we're not
interested in.

For the Tracemonkey paper, we're now down to one warning about tiling
patterns which is in fact a valid one.
2019-04-06 16:57:30 +02:00
Tim van der Meij
47d3620d5a
Convert src/display/svg.js to ES6 syntax
In particular, this should reduce intermediate string creation by using
template strings and reduce variable lookup times by removing unneeded
variables and caching `this.current` in more places.
2019-04-06 16:57:30 +02:00
Jonas Jenwald
f0a28b3c0d [Firefox] Ensure that loading progress is reported, and the loadingBar updated, when disableRange=true is set
With PR 10675 having fixed the completely broken `disableRange=true` setting in the Firefox version of PDF.js, I couldn't help but noticing that loading progress is never reported properly in that case.
Currently loading progress is only reported for the `rangeProgress` chrome-event, which obviously isn't dispatched with `disableRange=true` set. However, the `progressiveRead` chrome-event includes loading progress as well, but this information isn't being used in any way.
Furthermore, the `PDFDataRangeTransport.onDataProgress` method wasn't able to handle "complete" loading information, and neither was `PDFDataTransportStream._onProgress` since that method would only ever attempt to report it through a RangeReader (which won't exist when `disableRange=true` is set).
2019-04-06 12:53:33 +02:00
Tim van der Meij
b161050df4
Merge pull request #10709 from Snuffleupagus/pageLayout
[api-minor] Add basic support for PageLayout in the API and the viewer
2019-04-05 23:07:32 +02:00
Tim van der Meij
8c8738ea47
Merge pull request #10678 from Snuffleupagus/rm-moz-chunked-arraybuffer
Remove `moz-chunked-arraybuffer` support, and related code, from `src/display/network.js`
2019-04-05 22:52:28 +02:00
Jonas Jenwald
7a999d1d67 [api-minor] Add basic support for PageLayout in the API and the viewer
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2393749, and refer to the inline comments for additional details.
2019-04-05 11:32:01 +02:00
Tim van der Meij
57abddc9ca
Merge pull request #10713 from Snuffleupagus/rm-JSDoc-annotation
Remove `src/core/annotation.js` from the `gulp jsdoc` build target
2019-04-04 23:15:02 +02:00
Tim van der Meij
072c5864fb
Merge pull request #10675 from Snuffleupagus/PDFDataTransportStream-disableRange
[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream`
2019-04-04 23:07:45 +02:00
Jonas Jenwald
f666395c24 Remove src/core/annotation.js from the gulp jsdoc build target
Note how at https://mozilla.github.io/pdf.js/api/ it's being described as API docs, however `src/core/annotation.js` is not part of the public API.
Furthermore, given that the code residing in the `src/core/` folder is run in a worker-thread, it's not even accessible on the main-thread (since `postMessage` is being used to transfer the data).
Hence the different API methods simply returns a "proxy" to the underlying data, but not actually the same objects and data structures as in the worker-thread itself; thus it doesn't make a whole lot of sense to expose this in API docs as far as I'm concerned.

Finally, the patch fixes a small JSDoc related typo in `src/display/api.js` when referring to the `TextStyle` typedef.
2019-04-04 18:03:08 +02:00
Jonas Jenwald
b40e6723be Remove moz-chunked-arraybuffer support, and related code, from src/display/network.js
The `moz-chunked-arraybuffer` responseType is a non-standard property, which has been subsumed by the Fetch API, and it's in the process of being removed from Firefox; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1120171 and https://bugzilla.mozilla.org/show_bug.cgi?id=1411865

*Please note:* Rather than waiting for both `Fetch` *and* `ReadableStream` to be available in e.g. a Firefox ESR version (which is probably going to be 68 at the earliest), let's just decide that PDF.js release `2.1.266` will be the last one with `moz-chunked-arraybuffer` support and land this patch (since nothing should outright break without it anyway).
2019-04-01 20:48:51 +02:00
Jonas Jenwald
c6ddbd55e2 Add a progressiveDataLength fast-path to ChunkedStream.ensureByte
This is *similar* to the existing check using in `ChunkedStream.ensureRange`.
2019-03-29 20:00:28 +01:00
Jonas Jenwald
49e8a270c4 Update ChunkedStream.makeSubStream to actually check if (some) data exists when the length parameter is undefined
Note how `XRef.fetchUncompressed`, which is used *a lot* for most PDF documents, is calling the `makeSubStream` method without providing a `length` argument.
In practice this results in the `makeSubStream` method, on the `ChunkedStream` instance, calling the `ensureRange` method with `NaN` as the end position,  thus resulting in no data being requested despite it possibly being necessary.

This may be quite bad, since in this particular case it will lead to a new `ChunkedStream` being created *and* also a new `Parser`/`Lexer` instance. Given that it's quite possible that even the very first `Parser.getObj` call could throw `MissingDataException`, this could thus lead to wasted time/resources (since re-parsing is necessary once the data finally arrives).

You obviously need to be very careful to not have `ChunkedStream.makeSubStream` accidentally requesting the *entire* file, hence its `this.end` property is of no use here, but it should be possible to at least check that the `start` of the data is present before any potentially expensive parsing occurs.
2019-03-29 17:20:31 +01:00
Tim van der Meij
b4c3b94592
Merge pull request #6606 from Rob--W/pattern-scaling
Improve performance and correctness of Tiling Patterns
2019-03-29 00:01:38 +01:00
Tim van der Meij
f9c58115fc
Merge pull request #10683 from janpe2/type0-noncid-cmap
Use CMap in Type0 fonts when CFF is not a CID font
2019-03-28 00:07:08 +01:00
Rob Wu
5985d4069a TilingPattern: Add comment to explain the implementation 2019-03-27 17:50:46 +01:00
Rob Wu
d3dc8f16b5 TilingPattern: Reverse transform after painting
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.

The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
2019-03-27 17:50:35 +01:00
Rob Wu
a72a8e921f Avoid extreme sizing / scaling in tiling pattern
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
  surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp

Fixes #6496
Fixes #5698
Fixes #1434
Fixes #2825
2019-03-27 17:44:04 +01:00
Jonas Jenwald
9077abc263 Take the FirstChar/LastChar properties into account when computing the hash in PartialEvaluator.preEvaluateFont (issue 10665)
Without this some fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.
2019-03-27 16:27:10 +01:00
Jonas Jenwald
a2a824ed01 Don't accidentally use an empty hash value when comparing preEvaluatedFonts in PartialEvaluator.loadFont
Note that `PartialEvaluator.preEvaluateFont` will return an empty string when no hash was computed. This will complete short-circuit the `fontAlias` comparison in `PartialEvaluator.loadFont`, since fonts which are totally different will then match if their `hash`es are empty.
2019-03-27 00:54:39 +01:00
Jani Pehkonen
49c6233fbc Use CMap in Type0 fonts when CFF is not a CID font 2019-03-26 19:38:44 +02:00
Rob Wu
60d4685c10 Refactor TilingPattern
- Deduplicate size/scale calculation, by introducing `getSizeAndScale`.
- Eliminate unnecessary calculations / variables.
2019-03-26 17:35:23 +01:00
Jonas Jenwald
bb384dd5ed [Firefox regression] Fix disableRange=true bug in PDFDataTransportStream
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.

Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
 - In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
 - (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*

In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
2019-03-26 16:34:13 +01:00
wuhao.daraw
1472c10bab fix: electron enviroment detection 2019-03-26 20:52:49 +08:00
Tim van der Meij
33bfbef6ba
Merge pull request #10635 from timvandermeij/lexer-parser
Convert `src/core/parser.js` to ES6 syntax and write more unit tests for the lexer and the parser
2019-03-19 23:17:34 +01:00
Tim van der Meij
7d3cb19571
Convert the Linearization class in src/core/parser.js to ES6 syntax
Moreover, disable `var` usage for this file.
2019-03-17 13:27:45 +01:00
Tim van der Meij
ee3cfb7986
Merge pull request #10646 from terurou/svg-fill
Implement linear-gradient, radial-gradient and dummy-pattern in SVGGraphics.
2019-03-17 13:13:45 +01:00
terurou
9c70a3831c Fix to use radicalGradient. 2019-03-17 10:57:16 +09:00
Tim van der Meij
7c9f1cc518
Merge pull request #10644 from Snuffleupagus/revokeObjectURL
Ensure that `blob:` URLs will be revoked when pages are cleaned-up/destroyed (JPEG memory usage)
2019-03-16 19:29:23 +01:00
terurou
c970a4b6ae Fix copy-paste mistake. 2019-03-16 23:21:56 +09:00
Jonas Jenwald
56eeeea1dc Re-factor the getTransfers helper function into a "private" getter method on the OperatorList
This function is currently called with the `OperatorList` instance as its argument, hence I cannot think of any good reason for not just moving it into the `OperatorList` properly. (This will also help with other planned changes regarding the `ImageCache` functionality.)
2019-03-16 13:06:51 +01:00
Jonas Jenwald
7273795eb6 Actually transfer eligible ImageMask data, rather than always copying it
By transfering `ArrayBuffer`s you can avoid having two copies of the same data, i.e. one copy on each of the worker/main-thread, for data that's used only *once* on the worker-thread.

Note how the code in [`PDFImage.createMask`](80135378ca/src/core/image.js (L284-L285)) goes to great lengths to actually enable tranfering of the image data. However in [`PartialEvaluator.buildPaintImageXObject`](80135378ca/src/core/evaluator.js (L336)) the `cached` property is always set to `true`, which disqualifies the image data from being transfered; see [`getTransfers`](80135378ca/src/core/operator_list.js (L552-L554)).

For most ImageMask data this patch won't matter, since images found in the `/Resources -> /XObject` dictionary will always be indexed by name. However for *inline* images which contains ImageMask data, where only "small" images are cached (in both `parser.js` and `evaluator.js`), the current code will result in some unnecessary memory usage.
2019-03-16 13:06:32 +01:00
terurou
fc0f844539 Implement linear-gradient, radial-gradient and dummy-pattern in SVGGraphics. 2019-03-16 13:56:29 +09:00
Jonas Jenwald
88d5750030 Remove the src attribute from Image objects used with natively supported JPEG images, when pages are cleaned-up/destroyed
This will further help reduce the amount of image data that's currently being held alive, by explicitly removing the `src` attribute.

Please note that this is mostly relevant for browsers which do not support `URL.createObjectURL`, or where `disableCreateObjectURL` was manually set by the user, since `blob:` URLs will be revoked (see the previous patch).
However, using `about:memory` (in Firefox) it does seem that this may also be generally helpful, given that calling `URL.revokeObjectURL` won't invalidate the image data itself (as far as I can tell).
2019-03-15 15:25:48 +01:00
Jonas Jenwald
983b25f863 Ensure that blob: URLs will be revoked when pages are cleaned-up/destroyed
Natively supported JPEG images are sent as-is, using a `blob:` or possibly a `data` URL, to the main-thread for loading/decoding.
However there's currently no attempt at releasing these resources, which are held alive by `blob:` URLs, which seems unfortunately given that images can be arbitrarily large.

As mentioned in https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL the lifetime of these URLs are tied to the document, hence they are not being removed when a page is cleaned-up/destroyed (e.g. when being removed from the `PDFPageViewBuffer` in the viewer).
This is easy to test with the help of `about:memory` (in Firefox), which clearly shows the number of `blob:` URLs becomming arbitrarily large *without* this patch. With this patch however the `blob:` URLs are immediately release upon clean-up as expected, and the memory consumption should thus be considerably reduced for long documents with (simple) JPEG images.
2019-03-15 10:40:58 +01:00
Tim van der Meij
80135378ca
Merge pull request #10636 from Snuffleupagus/PDFDocumentProxy-destroy
Small clean-up of the `PDFDocumentProxy.destroy` method and related code
2019-03-13 23:46:41 +01:00
Jonas Jenwald
24fc4f83ca Small clean-up of the PDFDocumentProxy.destroy method and related code
Note how `PDFDocumentProxy.destroy` is a nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here.

The main changes in this patch are:
 - Stop providing a `PDFDocumentLoadingTask` instance *separately* when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance.
 - Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused.
 - Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying *once*, rather than twice, for each document.
2019-03-12 13:25:29 +01:00
Jonas Jenwald
88f9e633dd Try to improve text-selection for Type3 fonts that utilize a non-default /FontMatrix (bug 1513120)
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simple extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.

The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
2019-03-12 10:32:08 +01:00
Tim van der Meij
8d4d7dbf58
Convert the Lexer class in src/core/parser.js to ES6 syntax 2019-03-10 19:04:36 +01:00
Tim van der Meij
7d0ecee771
Convert the Parser class in src/core/parser.js to ES6 syntax 2019-03-10 19:04:35 +01:00
Tim van der Meij
d587abbceb
Merge pull request #10633 from Snuffleupagus/murmurhash-class
Convert `MurmurHash3_64` to an ES6 class
2019-03-09 21:07:12 +01:00
Jonas Jenwald
6b1ac44aea Convert MurmurHash3_64 to an ES6 class
Notable changes:
 - Remove the `return this;` from the `MurmurHash3_64.update` method, since it's completely unused and doesn't make a lot of sense.
 - Remove the loop(s) from the `MurmurHash3_64.hexdigest` method, since creating a temporary array and then looping over it is wasteful given how simple this can be written with modern JavaScript.
2019-03-09 17:03:06 +01:00
Jonas Jenwald
2665502055 Move NativeImageDecoder into a separate file, and convert it to a class
Given the size of the `src/core/evaluator.js` file, it cannot hurt to move some of its (image related) helper functionality into a separate file.
2019-03-09 15:59:04 +01:00
Tim van der Meij
e41c4aece4
Merge pull request #10621 from janpe2/svg-Tm-stroke
Don't scale SVG stroke width by text matrix
2019-03-08 23:16:10 +01:00
Tim van der Meij
8b149b818e
Merge pull request #10615 from Snuffleupagus/corrupt-inline-ASCII85Decode
Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)
2019-03-08 23:06:01 +01:00
Tim van der Meij
e1b01a601c
Merge pull request #10605 from timvandermeij/display-utils
Convert `let` to `const` if possible in, and improve unit test coverage for, `src/display/display_utils.js`
2019-03-06 23:46:53 +01:00
Tim van der Meij
87a70f3359
Convert let to const if possible in src/display/display_utils.js
Finally, `var` usage is removed.
2019-03-06 23:41:54 +01:00
Jani Pehkonen
d9e30b3452 Don't scale SVG stroke width by text matrix 2019-03-05 22:54:25 +02:00
Jonas Jenwald
3ce8fe7927 Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)
There's a number of things wrong with the PDF document, since its inline images are first all *a lot* larger than the 4 KB limit (as mandated by the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1852045).

Furthermore the actual ASCII85Decode data is interspersed with *a lot* of needless whitespace, in particular also "inside" of the EOD (end-of-data) marker which thus completely breaks the detection.
Note that according to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130, this patch should be safe since it explicitly mentions that *all* whitespace should be ignored.
2019-03-04 23:41:36 +01:00
Jonas Jenwald
7caf769a66 Move the deprecated helper function to the src/display/display_utils.js file
Given that the function is (purposely) independent of the verbosity level and that its message is worded to only apply on the main-thread, there's no reason to duplicate this across the built `pdf.js`/`pdf.worker.js` files.
2019-03-02 20:23:56 +01:00
Jonas Jenwald
4170c414fa Reduce usage of Date.now() in src/core/worker.js
Currently for every single parsed/rendered page there's no less than *four* `Date.now()` calls being made on the worker-side. This seems totally unnecessary, since the result of these calls are, by default, not used for anything *unless* the verbosity level is set to `INFO`.
2019-03-02 20:23:52 +01:00
Tim van der Meij
c43396c2b7
Merge pull request #10590 from janpe2/svg-missing-moveto
Fix missing moveTos in SVG paths
2019-03-02 14:43:53 +01:00
Tim van der Meij
4f13eb00d0
Merge pull request #10604 from brendandahl/fix-type1-charset
Put the string name of the glyph in the charset array.
2019-03-02 13:03:16 +01:00
Brendan Dahl
7d6ab081eb Put the string name of the glyph in the charset array.
Also, only warn once per font when missing a glyph name.
2019-03-01 18:03:51 -08:00
Jonas Jenwald
d7d1f23826 Zero the width/height of the temporary canvas used during TextLayer rendering
The default size of these canvases seem to be `300 x 150` (two orders of magnitude larger than the ones in PR 10597), which probably is sufficient enough to matter since there's one such canvas for each textLayer that's rendered in the viewer.

Also fixes the incorrect rejection reason, i.e. one using a string rather than an `Error`, in the `TextLayerRenderTask.cancel` method.
2019-03-01 04:05:37 +01:00
Brendan Dahl
34022d2fd1
Merge pull request #10591 from brendandahl/fix-charset
Add unique glyph names for CFF fonts.
2019-02-28 17:22:29 -08:00
Tim van der Meij
9559d57636
Merge pull request #10595 from Snuffleupagus/JpegDecode-zero-tmpCanvas
Zero the width/height of the temporary canvas used during `JpegDecode` (issue 10594)
2019-02-28 23:41:22 +01:00
Tim van der Meij
39fa26ea33
Merge pull request #10597 from Snuffleupagus/isFontSubpixelAAEnabled-canvas-cleanup
Ensure that the temporary canvas created in `CanvasGraphics.isFontSubpixelAAEnabled` will be cleared
2019-02-28 23:37:24 +01:00
Tim van der Meij
af5597b7e5
Merge pull request #10573 from Snuffleupagus/type3-avoid-truncation
Avoid truncating/breaking some Type3 glyphs in `compileType3Glyph` (bug 1245391, issue 10568)
2019-02-28 23:25:45 +01:00
Jonas Jenwald
b61b4d3229 Ensure that the temporary canvas created in CanvasGraphics.isFontSubpixelAAEnabled will be cleared
While this particular canvas may be small, there can still be an arbitrarily large number of them (one per page rendered), which can/will eventually add up memory wise. This can be easily avoided by using the `cachedCanvases` abstraction instead, which will ensure that the `isFontSubpixelAAEnabled` canvas is removed together with other temporary canvases in `CanvasGraphics.endDrawing`.
2019-02-28 14:18:38 +01:00
Jonas Jenwald
4687cc85ac Zero the width/height of the temporary canvas used during JpegDecode (issue 10594) 2019-02-28 12:23:34 +01:00
Brendan Dahl
8a596ef5d5 Add unique glyph names for CFF fonts.
Printing on MacOS was broken with the previous approach of just mapping
all the glyphs to notdef.
2019-02-27 15:00:29 -08:00
Jonas Jenwald
f664e074c9 Avoid using the Fetch API, in GENERIC builds, for unsupported protocols (issue 10587) 2019-02-27 13:04:20 +01:00
Jonas Jenwald
cbc07f985b Load built-in CMap files using the Fetch API when possible 2019-02-27 13:04:19 +01:00
Jani Pehkonen
52e8e9b059 Fix missing moveTos in SVG paths 2019-02-26 20:00:35 +02:00
Jonas Jenwald
3a09a2f7a5 Update the year in the license_header files 2019-02-24 00:35:42 +01:00
Jonas Jenwald
db5dc14158 Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file
The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated.
Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used *only* on the worker-thread.

Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus *cannot* possibly ever be used on the main-thread.
2019-02-24 00:35:39 +01:00
Jonas Jenwald
a1f7517996 Rename the src/display/dom_utils.js file to src/display/display_utils.js
This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.
2019-02-23 16:30:16 +01:00
Jonas Jenwald
fb774a65b0 Avoid truncating/breaking some Type3 glyphs in compileType3Glyph (bug 1245391, issue 10568)
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*

With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
2019-02-21 23:29:43 +01:00
Jonas Jenwald
60f6d49ff7 [api-minor] Expose the existence of a Collection dictionary via the getMetadata API method (issue 10555)
Given the complexity of this functionality, and the fact that it doesn't seem widely used, I highly doubt that it'd ever make sense to support Collections; see also https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.39646.2Heading.824.Collections
2019-02-15 15:40:31 +01:00
Jonas Jenwald
b6d090cc14 Fallback to the built-in font renderer when font loading fails
After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].

Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].

The solution implemented in this patch does *not* in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load.

*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:

[Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888)
Issue 10175
Issue 10232

---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.

[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.

[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
2019-02-11 10:27:08 +01:00
Jonas Jenwald
13230a1123 Remove the ability to pass in more than one font to BaseFontLoader.bind
- The only existing call-site, of this method, is never passing more than *one* font at a time anyway.
 - As far as I can remember, this functionality has never actually been used (caveat: I didn't check the git history).
 - This allows simplification of the method, especially by making use of the fact that it's now asynchronous.
 - It should be just as easy to call `BaseFontLoader.bind` from within a loop, rather than having the loop in the method itself.
2019-02-10 21:09:57 +01:00
Jonas Jenwald
af3fcca88d Convert BaseFontLoader.bind to be async, and only utilize BaseFontLoader._queueLoadingCallback when actually necessary
Currently all fonts are using the `_queueLoadingCallback` method to determine when they have been loaded[1]. However in most cases this is just adding unnecessary overhead, especially with `BaseFontLoader.bind` now being asynchronous, given how fonts are loaded:
 - For fonts loaded using the Font Loading API, it's already possible to easily tell when a font has been loaded simply by checking the `loaded` promise on the FontFace object itself.
 - For browsers, e.g. Firefox, which support synchronous font loading it's already assumed that fonts are immediately available.

Hence the `_queueLoadingCallback` method is moved into the `GenericFontLoader`, such that it's only utilized for fonts which are loaded using CSS.

---
[1] In the "fonts loaded using CSS" case, this is already a hack anyway as outlined in the comments.
2019-02-10 21:09:57 +01:00
Tsukasa OI
96ba6afd47 Fix copying on supplementary plane characters
pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero).  This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.

Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
2019-02-10 18:14:53 +09:00
Jonas Jenwald
3bcf9187ec Add a polyfill for classList.{add, remove} with more than one parameter
Unsurprisingly IE11 doesn't support this, so a polyfill is needed since otherwise the sidebar can no longer be opened.

Also, simplifies the existing `classList.toggle` polyfill.
2019-02-08 13:35:01 +01:00
Jonas Jenwald
614e502227 [api-minor] Remove the document.currentScript polyfill
This polyfill is currently used in only *one* file, i.e. `src/display/api.js`, and only when trying to build a *fallback* `workerSrc` path.

Given that the global `workerSrc` should *always* be set[1] when using the PDF.js library[2], and that the fallback `workerSrc` should only be regarded as a best-effort solution anyway, there isn't a particularily strong reason to keep the compatibility code in my opinion.

---
[1] Other supported options include setting the global `workerPort`, or passing in a `PDFWorker` instance as part of the `getDocument` call.

[2] Which is clearly mentioned in the JSDocs in `src/display/worker_options.js`.
2019-02-03 14:09:24 +01:00
Jonas Jenwald
22468817e1 Add a settled property, tracking the fulfilled/rejected stated of the Promise, to createPromiseCapability
This allows cleaning-up code which is currently manually tracking the state of the Promise of a `createPromiseCapability` instance.
2019-02-02 15:18:56 +01:00
Jonas Jenwald
6f94a05a29 Do the final text scaling correctly in flushTextContentItem (issue 8276)
It's necessary to take into account whether or not the text is vertical, to avoid either the textContent `width` or `height` becoming incorrect.
2019-01-29 15:24:04 +01:00
Jonas Jenwald
5081063b9e Attempt to clean-up/restore pending rendering operations when errors occurs while a RenderTask runs (PR 10202 follow-up)
This piggybacks of the existing `cancel` functionality, to ensure that any pending operations are closed *and* that any temporary canvases are actually being removed.

Also simplifies `finishPaintTask` in `PDFPageView.draw` slightly, by converting it to an async function.
2019-01-26 16:02:51 +01:00
Jonas Jenwald
29f36d7a1b Reduce unnecessary duplication of the isDefaultDecode methods on ColorSpace instances
The recent PR 10482 made me realize that I missed an opportunity for simplification when doing the class conversion of this code in PR 10007.
2019-01-25 08:53:08 +01:00
Tim van der Meij
e2701d5422
Merge pull request #10482 from janpe2/indexed-decode
Implement Decode entry in Indexed images
2019-01-24 23:46:55 +01:00
Jonas Jenwald
41fbc71ef9 Ensure that XRef.indexObjects can handle object numbers with zero-padding (issue 10491)
All objects in the PDF document follow this pattern:
```
0000000001 0 obj
<<
% Some content here...
>>
endobj
0000000002 0 obj
<<
% More content here...
endobj

```
2019-01-24 22:37:18 +01:00
Jonas Jenwald
249b199ff1 Stop bundling the ReadableStream polyfill in MOZCENTRAL builds (PR 10470 follow-up)
Based on the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1521413, this patch simply removes the `ReadableStream` polyfill completely from MOZCENTRAL builds.

With this patch, the size of the `gulp mozcentral` build target is thus further reduced (building on PR 10470):

|       | `build/mozcentral`
|-------|-------------------
|master |   3 339 666
|patch  |   3 209 572
2019-01-23 20:33:20 +01:00
Jani Pehkonen
26121177ab Implement Decode entry in Indexed images 2019-01-22 22:51:04 +02:00
Jonas Jenwald
01d624f6a0 Add an Array.from polyfill, using core-js, and remove some compatibility hacks from the src/display/content_disposition.js file 2019-01-20 08:49:20 +01:00
Tim van der Meij
66acc7397f
Merge pull request #10470 from Snuffleupagus/mozcentral-streams
Try to, completely, avoid loading the `ReadableStream` polyfill in MOZCENTRAL builds
2019-01-19 21:22:18 +01:00
Jonas Jenwald
480110625a Try to, completely, avoid loading the ReadableStream polyfill in MOZCENTRAL builds
With https://bugzilla.mozilla.org/show_bug.cgi?id=1505122 landing in Firefox 65, the native `ReadableStream` implementation is now enabled by default in Firefox.

Obviously it would be nice to simply stop bundling the polyfill in MOZCENTRAL builds altogether, however given that it's still possible to disable[1] `ReadableStream` this is probably not a good idea just yet.
Nonetheless, now that native support is available, it seems unnecessary (and wasteful) to keep bundling the polyfill twice[2] in MOZCENTRAL builds. Hence this patch, which contains a suggest approach for packing the polyfill in a *separate* file which is then *only* loaded if/when needed.

With this patch, the size of the `gulp mozcentral` build target is thus reduced accordingly:

|       | `build/mozcentral`
|-------|-------------------
|master |   3 461 089
|patch  |   3 340 268

Besides the PDF.js files taking up less space in Firefox this way, the additional benefit is that there's (by default) less code that needs to be loaded and parsed when the PDF Viewer is used which also cannot hurt.

---
[1] In `about:config`, by toggling the `javascript.options.streams` preference.

[2] Once in the `build/pdf.js` file, and once in the `build/pdf.worker.js` file.
2019-01-19 09:05:01 +01:00
Jonas Jenwald
24a688d6c6 Convert some usage of indexOf to startsWith/includes where applicable
In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.
2019-01-18 17:57:41 +01:00
Tim van der Meij
cdbc33ba06
Merge pull request #10457 from Snuffleupagus/metadata-tests
When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up)
2019-01-16 23:03:39 +01:00
Jonas Jenwald
68ad3e8e9d Tweak the DOMTokenList.toggle polyfill (issue 10460) 2019-01-16 20:15:44 +01:00
Jonas Jenwald
9f45f8dfda When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up)
This will allow the Metadata to be successfully extracted from the PDF file in issue 10395.
Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).
2019-01-16 12:44:27 +01:00
Jonas Jenwald
b531fc4106 Avoid truncating inline images, where the data and the "EI" marker is glued together (issue 10388) (#10436)
Thanks to the *excellent* debugging done by @janpe2, this was easy to fix!
2019-01-12 20:31:23 +01:00
Jonas Jenwald
d4a3858ed5 Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up) 2019-01-10 17:55:28 +01:00
Jonas Jenwald
358cd0c096 Add a few more String polyfills (startsWith, endsWith, padStart, padEnd) 2019-01-06 20:10:55 +01:00
Tim van der Meij
f162fed6b9
Convert src/core/charsets.js and src/core/standard_fonts.js to ES6 syntax
Moreover, include the "no var" ESLint comment to
`src/core/annotation.js` and `src/core/ps_parser.js` since they are
already converted.
2019-01-06 15:04:01 +01:00
Tim van der Meij
3b637e71d4
Convert src/core/arithmetic_decoder.js to ES6 syntax 2019-01-06 15:04:01 +01:00
Tim van der Meij
b81984f0cb
Merge pull request #10417 from brendandahl/metric-length
Fix reading number of HTMX metrics.
2019-01-05 13:35:16 +01:00
Jonas Jenwald
e8f4b47d59 Prevent errors, in SimpleXMLParser.onEndElement, when the stack has already been completely parsed (issue 10410)
The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data.
(The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)
2019-01-05 11:15:34 +01:00
Brendan Dahl
32eace043b Fix reading number of HTMX metrics.
The length of the HHEA table can be incorrect, so it is better to
read the number of metrics offset from beginning of table instead.
2019-01-04 15:13:13 -08:00
Tim van der Meij
b39ec7af96
Merge pull request #10408 from Snuffleupagus/issue-10407
Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)
2019-01-04 23:45:26 +01:00
Jonas Jenwald
66fccd860b Adjust how AnnotationBorderStyle.setWidth handles the input being a Name (issue 10385)
In order to be consistent with the behaviour in Adobe Reader, the width will now always be set to zero when the input is a `Name`.
2019-01-04 10:38:10 +01:00
Jonas Jenwald
6cd9ff48f3 Prevent errors, because of incorrect scope, in the XMLParserBase._resolveEntities method (issue 10407) 2019-01-04 10:13:32 +01:00
Tim van der Meij
2d00bb098b
Merge pull request #10404 from Snuffleupagus/issue-10401
Remove the `for ... of` loop from the `PDFDocument.fingerprint` getter (issue 10401)
2019-01-03 22:46:51 +01:00
Brendan Dahl
e2686db49b
Merge pull request #10277 from janpe2/cff-stems
Repair CFF fonts if stem hints are in wrong order
2019-01-03 10:30:43 -08:00
Jonas Jenwald
8c278530dd Remove the for ... of loop from the PDFDocument.fingerprint getter (issue 10401)
It appears that the `Symbol` polyfill doesn't work well in conjunction with `TypedArray`s, and that part of PR 10393 is thus reverted.
2019-01-03 11:17:45 +01:00
Tim van der Meij
1b84b2ed60
Merge pull request #10398 from Snuffleupagus/issue-10395
Prevent errors in various methods in `SimpleDOMNode` when the `childNodes` property is not defined (issue 10395)
2019-01-01 16:22:11 +01:00
Jonas Jenwald
d371d23382 Prevent errors in various methods in SimpleDOMNode when the childNodes property is not defined (issue 10395)
Given that the issue, as filed, is incomplete since no PDF file was provided for debugging, this patch is really the best that we can do here. *Please note:* This patch will *not* enable the Metadata to be successfully parsed, but it should at least prevent the errors.
2018-12-31 13:07:15 +01:00
Tim van der Meij
d8f201ea2a
Merge pull request #10397 from Snuffleupagus/issue-10385
Ensure that `AnnotationBorderStyle.setWidth` is able to handle the input being a `Name`, to correctly deal with corrupt PDF documents (issue 10385)
2018-12-31 12:58:28 +01:00
Jonas Jenwald
76a9580aeb Ensure that AnnotationBorderStyle.setWidth is able to handle the input being a Name, to correctly deal with corrupt PDF documents (issue 10385) 2018-12-31 12:21:28 +01:00
Jonas Jenwald
15b3806937 Actually validate the input in AnnotationBorderStyle.setStyle 2018-12-31 12:15:15 +01:00
Tim van der Meij
5b57e69da2
Optimize CanvasGraphics.setFont to avoid intermediate string creation
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
14126 strings being created in this method. With this commit applied,
this is reduced to 2018 strings.
2018-12-30 14:58:32 +01:00
Tim van der Meij
95f9075565
Optimize TextLayerRenderTask._layoutText to avoid intermediate string creation
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
12936 strings being created in this method. With this commit applied,
this is reduced to 3610 strings.
2018-12-30 14:39:08 +01:00
Tim van der Meij
d5e5d18430
Convert the PDFDocument class in src/core/document.js to ES6 syntax 2018-12-30 13:54:43 +01:00
Tim van der Meij
612fc9fcc2
Convert the Page class in src/core/document.js to ES6 syntax 2018-12-30 13:54:43 +01:00
Tim van der Meij
aad27ff9a0
Optimize the Ref class in src/core/primitives.js
The `toString` method always creates two string objects (for the 'R'
character and for the `num` concatenation) and in the worst case
creates three string objects (one more for the `gen` concatenation).
For the Tracemonkey paper alone, this resulted in 12000 string
objects when scrolling from the top to the bottom of the document.
Since this is a hot function, it's worth minimizing the number of string
objects, especially for large documents, to reduce peak memory usage.

This commit refactors the `toString` method to always create only one
string object.
2018-12-29 17:48:41 +01:00
Jonas Jenwald
60bcce184e Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)
For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].

Since there's generally a real effort being in made PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.

Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't a very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].

Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:

```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
    console.timeEnd('second');
  });
});
```

The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.

---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.

[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.

[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.
2018-12-29 12:47:25 +01:00
Tim van der Meij
360c3d3813
Remove the unused url argument for the ChunkedStreamManager class 2018-12-24 13:14:42 +01:00
Tim van der Meij
47344197f4
Convert src/core/chunked_stream.js to ES6 syntax 2018-12-24 13:14:42 +01:00
Tim van der Meij
103f4616ac
Merge pull request #10334 from Snuffleupagus/OpenAction-dest
[api-minor] Add support for OpenAction destinations (issue 10332)
2018-12-23 20:49:50 +01:00
Jonas Jenwald
f0719ed565 [api-minor] Change the getViewport method, on PDFPageProxy, to take a parameter object rather than a bunch of (randomly) ordered parameters
If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature *first* to avoid needlessly unwieldy call-sites.

To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning *and* with a working fallback[1] for the old method signature.

---
[1] This is limited to `GENERIC` builds, which should be sufficient.
2018-12-21 11:55:20 +01:00
Jonas Jenwald
b05f053287 [api-minor] Add support for OpenAction destinations (issue 10332)
Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1].
Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2].

---
[1] This information is, currently in PDF.js, being included through the `getJavaScript` API method.

[2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).
2018-12-19 11:45:16 +01:00
Jonas Jenwald
ba2edeae18 [api-minor] Add support, in getMetadata, for custom information dictionary entries (issue 5970, issue 10344) (#10346)
The custom entries, provided that they exist *and* that their types are safe to include, are exposed through a new `Custom` infoDict entry to clearly separate them from the standard ones.

Fixes 5970.
Fixes 10344.
2018-12-18 23:26:02 +01:00
Jonas Jenwald
437fb8a8a7 Ignore the fieldValue for Signature annotations, since they're currently unsupported (issue 10374)
Given that Signature (Widget) annotations are currently not supported, since they cannot be validated, simply ignoring the `fieldValue` seems OK for now considering that attempting to blindly include unparsed/unvalidated data isn't very useful.

Fixes 10347.
2018-12-12 18:01:43 +01:00
Tim van der Meij
45c0197465
Merge pull request #10330 from janpe2/svg-line-width-zero
Handle line width of zero in SVG
2018-12-07 23:34:27 +01:00
Jani Pehkonen
ddabeb0645 Handle line width of zero in SVG 2018-12-04 16:05:32 +02:00
Jonas Jenwald
d0fec7c6fb Fix NameOrNumberTree.get to actually perform a binary search to find the requested key
The intent of the code, based on existing comments, is to perform a binary search. However, because of what appears to be a typo in the code responsible for computing the current search index, this code is always checking *every* entry (albeit only at the "final" node) starting from the last one.
2018-11-23 23:52:33 +01:00
Jonas Jenwald
fdad0a0b0b Fallback to an exhaustive search, in corrupt PDF files, for NameTrees/NumberTrees that are not correctly ordered (issue 10272)
According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2384179, the keys of NameTree/NumberTree should be ordered.
For corrupt PDF files, which violate this assumption, we thus need to fallback to an exhaustive search in order to e.g. find all destinations.

*Please note:* Given that this only implements a fallback for the "final" node of the Tree, there's obviously a risk that the patch isn't sufficient for dealing with all kinds of out-of-order corruption. However, this kind of problem should be rare in practice, and without a real-world test-case it's difficult to implement a completely general solution (and there's obviously a question if you'd even want to).
2018-11-20 17:50:47 +01:00
Jani Pehkonen
9e990f6f3e Repair CFF fonts if stem hints are in wrong order 2018-11-20 18:50:37 +02:00
Jonas Jenwald
ac6b94c9dd Replace the remaining occurences, in src/display/api.js, of var with let/const 2018-11-18 19:08:27 +01:00
Jonas Jenwald
061f7bd2f3 Convert PDFWorker, in src/display/api.js, to an ES6 class
Also changes all occurrences of `var` to `let`/`const` in this code.
2018-11-18 19:08:27 +01:00
Jonas Jenwald
02e77a39ec Convert InternalRenderTask, in src/display/api.js, to an ES6 class
This changes all occurrences of `var` to `let`/`const` in this code, and updates the signature of the constructor to use object destructuring for better readability (and self documentation).
Also, `useRequestAnimationFrame` is changed to a parameter and the `typeof window` check is now done *once* rather than at every `_scheduleNext` call.
2018-11-18 19:08:27 +01:00
Jonas Jenwald
5a0d64a6de Convert PDFPageProxy, in src/display/api.js, to an ES6 class
This changes all occurrences of `var` to `let`/`const` in this code, and updates the signatures of a couple of methods to use object destructuring.
Finally, when creating `InternalRenderTask` instances *only* the necessary parameter are now provided, since passing through the `RenderParameters` as-is seems completely unnecessary.
2018-11-18 19:08:25 +01:00
Jonas Jenwald
2c003a82d5 Convert RenderTask, in src/display/api.js, to an ES6 class
Also deprecates the `then` method, in favour of the `promise` getter.
2018-11-18 19:08:00 +01:00
Jonas Jenwald
ef8e5fd77c Convert PDFDocumentLoadingTask, in src/display/api.js, to an ES6 class
Also deprecates the `then` method, in favour of the `promise` getter.
2018-11-18 19:07:57 +01:00
PalmerAL
5f15dc2023 Use span instead of div in the text layer
This improves copy/pasting text content since it reduces the amount of unnecessary newlines.
2018-11-18 15:54:08 +01:00
Tim van der Meij
2194aef03e
Merge pull request #10238 from Snuffleupagus/interfaces
Move the `interface` definitions out of `src/core/worker.js` and into their own file
2018-11-08 23:28:46 +01:00
Jonas Jenwald
4829f567c1 Move the interface definitions out of src/core/worker.js and into their own file
These interfaces are already used in different files, in both the `src/core/` and `src/display/` folders, and having them reside in their own file seems a lot clearer and is also similar to the existing viewer interfaces.

As part of moving the `interface` definitions, they're also converted to ES6 classes.
2018-11-08 13:21:37 +01:00
Jonas Jenwald
60da2d882b [api-minor] Refactor/simplify the PDFObject class
First of all, note how there's currently *two* methods for checking if a certain object exists, which seems completely unwarranted.
Furthermore, the rarely used `getData` method was removed and its only callsite changed to use a combination of `PDFObjects.{has, get}` instead.
Finally, the methods were rearranged slightly, to bring the most important ones (for an API user) to the top of the class.
2018-11-08 10:13:39 +01:00
Jonas Jenwald
d32321d84f Convert PDFObjects, in src/display/api.js, to an ES6 class
Also changes all occurrences of `var` to `const`, and marks internal properties/methods as "private".
2018-11-08 10:11:40 +01:00
Tim van der Meij
3e342554d1
Merge pull request #10228 from morille/patch-2
Don't detect nw.js as node.js
2018-11-07 23:51:01 +01:00
Romain Petit
13b0ca6b2a Don't detect nw.js as node.js
nw.js is chrome plus nodejs.
It will succeed everywhere chrome succeeds, but fail in many cases where nodejs succeeds (see issue 9071).
So it's safer to consider it as a browser context rather than a nodejs context.

Make travis happy again

CS

Readability + Explanation

The relevant portion of the NW.js documentation:
http://docs.nwjs.io/en/latest/For%20Users/Advanced/JavaScript%20Contexts%20in%20NW.js/#access-nodejs-and-nwjs-api-in-browser-context

Added full link to relevant doc.
2018-11-07 11:14:22 +01:00
Jonas Jenwald
a963d139dc Convert src/core/ps_parser.js to use ES6 classes
Besides being a fairly small and self-contained file, this code also shows a possible way of defining static constants on classes.
2018-11-03 17:43:06 +01:00
Tim van der Meij
ec76aa531e
Merge pull request #10202 from Snuffleupagus/issue-10200
Attempt to clean-up/restore pending rendering operations on `RenderTask.cancel` (issue 10200)
2018-11-02 23:11:47 +01:00
Jonas Jenwald
f23dba1c10 Change canvasInRendering to a WeakSet instead of a WeakMap
Note how nowhere in the code `canvasInRendering.get()` is ever called, and that this structure is really only used to store references to `<canvas>` DOM elements.
The reason for this being a `WeakMap` is probably because at the time we weren't using `core-js` polyfills yet, and since there already existed a manually implemented `WeakMap` polyfill it was probably simpler to use that.
2018-10-31 18:15:23 +01:00
Jonas Jenwald
f77b463339 Attempt to clean-up/restore pending rendering operations on RenderTask.cancel (issue 10200)
Please note that, given the lack of a runnable example, I'm not totally sure if this first of all is enough to *completely* address the issue as filed and second of all if we actually want this new behaviour.
2018-10-31 16:22:17 +01:00
Tim van der Meij
ed4ac1bc67
Merge pull request #10162 from janpe2/svg-normalize-bbox
Normalize BBox of form XObjects in SVG back-end
2018-10-28 13:18:48 +01:00
Jani Pehkonen
9cd5f94f03 Normalize the BBox of form XObjects on the /core side 2018-10-22 14:17:05 +03:00
Jonas Jenwald
5bb7f4b615 Convert PDFDataRangeTransport to an ES6 class 2018-10-20 17:15:27 +02:00
Tim van der Meij
d21892933d
Merge pull request #10161 from Snuffleupagus/DataLoaded-onProgress
Ensure that `onProgress` is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160)
2018-10-20 15:22:05 +02:00
Jonas Jenwald
54f9883c51 Export CMapCompressionType and PermissionFlag on the pdfjsLib object (issue 10148, PR 10033 follow-up)
`CMapCompressionType` makes a lot of sense to export, for anyone attempting to implement a custom `CMapReaderFactory`; fixes 10148.

`PermissionFlag` likewise needs to be exported, since otherwise the result of the `getPermissions` API method becomes difficult to interpret; follow-up to 10033.
2018-10-20 11:38:00 +02:00
Jonas Jenwald
327f2eb588 Ensure that onProgress is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160)
*Please note:* I'm totally fine with this patch being rejected, and the issue closed as WONTFIX; however these changes should address the issue if that's desired.

From a conceptual point of view, reporting loading progress doesn't really make a lot of sense for PDF files opened by passing raw binary data directly to `getDocument` (since obviously *all* data was loaded).
This is compared to PDF files loaded via e.g. `XMLHttpRequest` or the Fetch API, where the entire PDF file isn't available from the start and knowing the loading progress makes total sense.

However I can certainly see why the current API could be considered inconsistent, which isn't great, since a registered `onProgress` callback will never be called for certain `getDocument` calls.
The simplest solution to this inconsistency thus seem to be to ensure that `onProgress` is always called when handling the `DataLoaded` message, since that will *always* be dispatched[1] from the worker-thread.

---
[1] Note that this isn't guaranteed to happen, since setting `disableAutoFetch = true` often prevents the *entire* file from ever loading. However, this isn't relevant for the issue at hand, and is a well-known consequence of using `disableAutoFetch = true`; note how the default viewer even has a specialized code-path for hiding the loadingBar.
2018-10-16 13:51:12 +02:00
Jonas Jenwald
4cde844ffe Add a DOMTokenList.toggle polyfill for the second, optional, "force" parameter
This is based on the polyfill available at https://developer.mozilla.org/en-US/docs/Web/API/Element/classList#Polyfill
2018-10-12 15:41:09 +02:00
Tim van der Meij
9e9426c354
Merge pull request #10143 from Snuffleupagus/getMainThreadWorkerMessageHandler-catch-errors
Ensure that `getMainThreadWorkerMessageHandler` won't accidentally break `getDocument` (PR 10139 follow-up)
2018-10-11 00:05:01 +02:00
Jonas Jenwald
0e2c6047e4 Ensure that getMainThreadWorkerMessageHandler won't accidentally break getDocument (PR 10139 follow-up)
*This should have been part of PR 10139.*

In the event that a user has attempted to manually load the worker file on the main-thread, but somehow failed to do that correctly, there's a possibility that `getMainThreadWorkerMessageHandler` could throw. Considering how/where that helper function is being called, an error could still prevent `PDFDocumentLoadingTask` from completing (regardless if it's being resolved/rejected).
2018-10-09 15:44:31 +02:00
Jonas Jenwald
21c8dd4842 Combine the pdfjsFilePath and fallback workerSrc handling in src/display/api.js
With the way that the `getWorkerSrc()` helper function is implemented now, there's no longer a particularly strong reason for keeping the global `pdfjsFilePath` variable around.
With this patch the fallback `workerSrc` will thus, assuming is wasn't already set, be set to the "pdfjsFilePath" which simplifies the `getWorkerSrc()` function and reduces the amount of global state.

Finally, the global `workerSrc` variable was renamed to prevent shadowing.
2018-10-09 13:47:48 +02:00
Tim van der Meij
f45e46d7ad
Merge pull request #10133 from kevinleedrum/fix-content-length
Set returnValues.suggestedLength to Content-Length if integer
2018-10-09 00:05:57 +02:00
Kevin Lee Drum
4cf10ac79d set returnValues.suggestedLength to Content-Length if integer 2018-10-07 13:26:29 -04:00
Jonas Jenwald
755c6edc5e Ensure that the PDFDocumentLoadingTask is rejected when "setting up fake worker" failed (issue 10135)
This should, hopefully, cover all the possible ways[1] in which "fake workers" are loaded. Given the different code-paths, adding unit-tests might not be that simple.
Note that in order to make this work, the various `fakeWorkerFilesLoader` functions were converted to return `Promises`.

---
[1] Unfortunately there's lots of them, for various build targets and configurations.
2018-10-06 13:18:51 +02:00
Simon Leblanc
b5806735d8 Add support of Ink annotation 2018-10-03 00:28:49 +02:00
Tim van der Meij
138324502c
Merge pull request #10119 from Snuffleupagus/rm-onFileAttachmentAnnotation
Attempt to simplify the `fileattachmentannotation` event dispatching
2018-10-02 23:25:22 +02:00
Jonas Jenwald
d60ce998f1 Attempt to simplify the fileattachmentannotation event dispatching
This attempts to reduced the level of indirection, and the amount of code, when dispatching `fileattachmentannotation` events, by removing the `PDFLinkService.onFileAttachmentAnnotation` method and just accessing `PDFLinkService.eventBus` directly in the `FileAttachmentAnnotationElement` constructor.
Given that other properties, such as `externalLinkTarget`/`externalLinkRel`, are already being accessed directly this pattern seems fine here as well.
2018-10-01 15:09:08 +02:00
Jonas Jenwald
d6f4d2ff33 Add a Symbol polyfill, using core-js, to allow using for...of loops
https://github.com/zloirock/core-js#ecmascript-symbol
2018-09-29 16:05:00 +02:00
Jonas Jenwald
435ec6a0d5 Use the Font Loading API in MOZCENTRAL builds, and GENERIC builds for Firefox version 63 and above (issue 9945) 2018-09-29 16:05:00 +02:00
Jonas Jenwald
05b021bcce Refactor the FontLoader into proper, build-specific, ES6 classes
Also changes `var` to `let`/`const` in code already touched in the patch, and makes use of template strings in a few spots.
2018-09-29 16:05:00 +02:00
Jonas Jenwald
45d6651976 Refactor unused Date.now() calls in FontLoader.queueLoadingCallback
The `started` timestamp is completely usused, and the `end` timestamp is currently[1] being used essentially like a boolean value.
Hence this code can be simplified to use an actual boolean value instead, which avoids potentially hundreds (or even thousands) of unnecessary `Date.now()` calls.

---
[1] Looking briefly at the history of this code, I cannot tell if the timestamps themselves were ever used for anything (except for tracking "boolean" state).
2018-09-29 15:57:04 +02:00
Jonas Jenwald
ad3e937816 Replace the Font.loading property with, the already existing, Font.missingFile property
The `Font.loading` property is only ever used *once* in the code, whereas `Font.missingFile` is more widely used. Furthermore the name `loading` feels, at least to me, slight less clear than `missingFile`. Finally, note that these two properties are the inverse of each other.
2018-09-29 15:57:04 +02:00
Jonas Jenwald
caf90ff6ee Convert FontFaceObject to an ES6 class
Also changes `var` to `let`/`const` in code already touched in the patch, and makes use of template strings in a few spots.
2018-09-29 15:57:04 +02:00
Jonas Jenwald
842e9206c0 Replace String.prototype.substr() occurrences with String.prototype.substring()
As outlined in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr, which refers to the ECMA-262 specification, using the `substr` function is advised against.

Hence this PR, which replaces all remaining `substr` occurrences with `substring` instead. Please refer to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr#Syntax respectively https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substring#Syntax for the differences between the two functions.

Note that in most cases in the code-base there's only one argument passed to `substr`, and those require no other changes except replacing "substr" with "substring". For the other cases, the `substr(start, length)` calls are changed to `substring(start, start + length)` instead.
2018-09-28 11:41:07 +02:00
Brendan Dahl
ae7dcae27e Fix abbreviation. 2018-09-13 13:10:38 -07:00
Brendan Dahl
6adeabbb66 Add Glyph & Cog's XPDF copyright/license information. 2018-09-12 13:59:56 -07:00