Commit Graph

12163 Commits

Author SHA1 Message Date
Wojciech Maj
cd03fe014d
Use const in ui_utils.js 2019-12-23 08:00:15 +01:00
Jonas Jenwald
ad0b0d60a5 Ignore Metadata entries with incorrectly encoded characters, when setting the viewer title (bug 1605526)
Apparently Ghostscript can, in some cases, generate/include `Metadata` with incorrectly encoded characters.[1] This results in the viewer title looking wrong, which we thus attempt to avoid by falling back to the `Info` entry instead.

*Please note:* Obviously it would be better if this was fixed in the `Metadata` parser instead, rather than using a viewer work-around, but I'm just not sure how or even *if* that could actually be done given that the `Metadata` stream contains no trace of the *original* character.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1605526

---
[1] The problematic characters are from the Specials Unicode block, see https://en.wikipedia.org/wiki/Specials_(Unicode_block)
2019-12-22 17:20:14 +01:00
Jonas Jenwald
eeaea85294 Re-factor how the viewer title is determined, based on the Info/Metadata data
With these changes we'll always set the `pdfTitle` to the `Info` dictionary entry *first* (assuming it exists and isn't empty), before attempting to override it with the `Metadata` stream entry (assuming it exists, is non-empty *and* valid).

There should be no functional changes with this patch, but it will simplify the following patch somewhat.
2019-12-21 18:59:17 +01:00
Tim van der Meij
6316b2a195
Merge pull request #11422 from Snuffleupagus/issue-10768
Use the `strict` mode `assert` in the pdf2png Node.js example (issue 10768)
2019-12-21 13:37:22 +01:00
Tim van der Meij
04df367ab6
Merge pull request #11418 from Snuffleupagus/simplify-fakeWorkerLoader
[api-minor] Simplify the *fallback* fake worker loader code in `src/display/api.js`
2019-12-21 13:24:51 +01:00
Jonas Jenwald
3783eccfa4 Use the strict mode assert in the pdf2png Node.js example (issue 10768)
See https://nodejs.org/api/assert.html#assert_strict_mode
2019-12-21 13:24:13 +01:00
Jonas Jenwald
d370037618 [api-minor] Tweak the Node.js fake worker loader to prevent Critical dependency: ... warnings from Webpack
Since bundlers, such as Webpack, cannot be told to leave `require` statements alone we are thus forced to jump through hoops in order to prevent these warnings in third-party deployments of the PDF.js library; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).

*Please note:* This is based on the assumption that code running in Node.js won't ever be affected by e.g. Content Security Policies that prevent use of `eval`. If that ever occurs, we should revert to a normal `require` statement and simply document the Webpack warnings instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
8519f87efb Re-factor the setupFakeWorkerGlobal function (in src/display/api.js), and the loadFakeWorker function (in web/app.js)
This patch reduces some duplication, by moving *all* fake worker loader code into the `setupFakeWorkerGlobal` function. Furthermore, the functions are simplified further by using `async`/`await` where appropriate.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
a5485e1ef7 [api-minor] Support loading the fake worker from GlobalWorkerOptions.workerSrc in Node.js
There's no particularily good reason, as far as I can tell, to not support a custom worker path in Node.js environments (even if workers aren't supported). This patch thus make the Node.js fake worker loader code-path consistent with the fallback code-path used with *browser* fake worker loader.

Finally, this patch also deprecates[1] the `fallbackWorkerSrc` functionality, except in Node.js, since the user should *always* provide correct worker options since the fallback is nothing more than a best-effort solution.

---
[1] Although it probably shouldn't be removed until the next major version.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
591e754831 Move the fake worker loader code into the PDFWorkerClosure
Given that this code isn't needed "globally" in the file, it seems reasonable to move it to where it's actually used instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
aab0f91740 [api-minor] Simplify the *fallback* fake worker loader code in src/display/api.js
For performance reasons, and to avoid hanging the browser UI, the PDF.js library should *always* be used with web workers enabled.
At this point in time all of the supported browsers should have proper worker support, and Node.js is thus the only environment where workers aren't supported. Hence it no longer seems relevant/necessary to provide, by default, fake worker loaders for various JS builders/bundlers/frameworks in the PDF.js code itself.[1]

In order to simplify things, the fake worker loader code is thus simplified to now *only* support Node.js usage respectively "normal" browser usage out-of-the-box.[2]

*Please note:* The officially intended way of using the PDF.js library is with workers enabled, which can be done by setting `GlobalWorkerOptions.workerSrc`, `GlobalWorkerOptions.workerPort`, or manually providing a `PDFWorker` instance when calling `getDocument`.

---
[1] Note that it's still possible to *manually* disable workers, simply my manually loading the built `pdf.worker.js` file into the (current) global scope, however this's mostly intended for testing/debugging purposes.

[2] Unfortunately some bundlers such as Webpack, when used with third-party deployments of the PDF.js library, will start to print `Critical dependency: ...` warnings when run against the built `pdf.js` file from this patch. The reason is that despite the `require` calls being protected by *runtime* `isNodeJS` checks, it's not possible to simply tell Webpack to just ignore the `require`; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).
2019-12-20 17:36:08 +01:00
Tim van der Meij
693240cf06
Merge pull request #11415 from Snuffleupagus/update-packages
Update `npm` packages and l10n files
2019-12-19 23:51:55 +01:00
Jonas Jenwald
9386544555 Update l10n files 2019-12-19 11:30:57 +01:00
Jonas Jenwald
be43001b29 Update npm packages 2019-12-19 11:28:13 +01:00
Tim van der Meij
cf3f342373
Merge pull request #11413 from Snuffleupagus/gulp-npm-test
Re-factor the `npm test` command, used by Travis, to avoid running the 'default_preferences' tasks concurrently (issue 10732)
2019-12-18 22:39:48 +01:00
Jonas Jenwald
f406263fc2 Re-factor the npm test command, used by Travis, to avoid running the 'default_preferences' tasks concurrently (issue 10732)
*Please note:* This patch does *not* prevent the 'default_preferences' task from running more than once during `npm test`, but it does ensure that the tasks won't run *concurrently* by running the relevant tests in *series*.

While it would obviously still make sense to re-factor the gulpfile to account for changes in `gulp` version 4, by at least tweaking the `npm test` command the intermittent failures on Travis should at least go away.
2019-12-18 21:43:09 +01:00
Tim van der Meij
7ceb394c43
Merge pull request #11380 from Snuffleupagus/PDFHistory-reset
Add a `reset` method to the `PDFHistory` implementation
2019-12-15 16:45:53 +01:00
Tim van der Meij
e09fe7226d
Merge pull request #11406 from Snuffleupagus/find-scanBytes
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
2019-12-15 16:26:42 +01:00
Jonas Jenwald
dbb82f05fc Re-factor the find helper function, in src/core/document.js, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.

The main benefits here are:
 - No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
 - In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-14 13:43:26 +01:00
Tim van der Meij
8b35bba347
Merge pull request #11407 from mozilla/dependabot/npm_and_yarn/npm-6.13.4
Bump npm from 6.13.0 to 6.13.4
2019-12-13 23:37:11 +01:00
dependabot[bot]
80d7b4d8ab
Bump npm from 6.13.0 to 6.13.4
Bumps [npm](https://github.com/npm/cli) from 6.13.0 to 6.13.4.
- [Release notes](https://github.com/npm/cli/releases)
- [Changelog](https://github.com/npm/cli/blob/latest/CHANGELOG.md)
- [Commits](https://github.com/npm/cli/compare/v6.13.0...v6.13.4)

Signed-off-by: dependabot[bot] <support@github.com>
2019-12-13 17:30:33 +00:00
Jonas Jenwald
1fab637247 Prevent if (!state || false) { in the output, in PDFHistory._popState, for e.g. MOZCENTRAL builds
By re-ordering the condition, which includes a `PDFJSDev` check, we can avoid meaningless output in the built files.
2019-12-13 10:38:39 +01:00
Jonas Jenwald
c5f2f870cb Move the parseCurrentHash helper function into PDFHistory
Looking at the `parseCurrentHash` function again it's now difficult for me to understand *what* I was thinking, since having a helper function that needs to be manually passed a `linkService` reference just looks weird.
2019-12-13 10:38:39 +01:00
Jonas Jenwald
d621899d50 Add a reset method to the PDFHistory implementation
This patch addresses a couple of smaller issues with the `PDFHistory` class:
 - Most, if not all, other viewer components can be reset in one way or another, and there's no good reason for the `PDFHistory` implementation to be different here.

 - Currently it's (technically) possible to keep adding entries to the browser history, via the `PDFHistory` instance, even after the document has been closed. That obviously makes no sense, and is caused by the lack of a `reset` method.

 - The internal `this._isPagesLoaded` property was never actually reset, which would lead to it being temporarily wrong when a new document was opened in the default viewer.
2019-12-13 10:38:39 +01:00
Tim van der Meij
903bf177cb
Merge pull request #11404 from Snuffleupagus/global-ReadableStream
[api-minor] Move the `ReadableStream` polyfill to the global scope
2019-12-12 00:05:58 +01:00
Jonas Jenwald
e24050fa13 [api-minor] Move the ReadableStream polyfill to the global scope
Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility

By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead.

The only change here which *could* possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded).
However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it *could* just as well have been a globally registered polyfill from an application embedding the PDF.js library.
2019-12-11 19:02:37 +01:00
Tim van der Meij
af4ba75f68
Merge pull request #11398 from Snuffleupagus/issue-5887
Attempt to improve the `PDFDocument` error message for empty files (issue 5887)
2019-12-09 22:08:08 +01:00
Jonas Jenwald
b00835f589 Attempt to improve the PDFDocument error message for empty files (issue 5887)
Given that the error in question is surfaced on the API-side, this patch makes the following changes:
 - Updates the wording such that it'll hopefully be slightly easier for users to understand.
 - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling.
 - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).
2019-12-09 15:45:50 +01:00
Tim van der Meij
a6db045789
Merge pull request #11387 from Snuffleupagus/issue-11385
Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
2019-12-08 20:27:46 +01:00
Tim van der Meij
16778118f6
Merge pull request #11391 from Snuffleupagus/globalThis
Replace `globalScope` with the standard `globalThis` property instead
2019-12-08 20:23:19 +01:00
Jonas Jenwald
71d61e4c6f Re-factor getMainThreadWorkerMessageHandler to support arbitrary global scopes, rather than only window 2019-12-08 20:19:04 +01:00
Jonas Jenwald
a8fc306b6e Replace globalScope with the standard globalThis property instead
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility

Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues.

Finally, for older browsers a polyfill was added in `ssrc/shared/compatibility.js`.
2019-12-08 20:19:02 +01:00
Tim van der Meij
07212bf5f2
Merge pull request #11390 from Snuffleupagus/checkFirstPage-await-cleanup
Ensure that `PDFDocument.checkFirstPage` waits for cleanup to complete (PR 10392 follow-up)
2019-12-08 20:13:00 +01:00
Tim van der Meij
7b503c8923
Merge pull request #11388 from Snuffleupagus/rm-PDFPresentationMode-viewer-option
Remove the `viewer` option from the `PDFPresentationMode` constructor
2019-12-08 19:55:43 +01:00
Tim van der Meij
3d549f12fa
Merge pull request #11382 from Snuffleupagus/pr-10217-follow-up
Fix an incorrect condition in `BaseViewer.isPageVisible` (PR 10217 follow-up)
2019-12-08 19:52:48 +01:00
Jonas Jenwald
a02122e984 Ensure that PDFDocument.checkFirstPage waits for cleanup to complete (PR 10392 follow-up)
Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here.

(The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)
2019-12-07 12:31:41 +01:00
Jonas Jenwald
1c466b4648 Remove the viewer option from the PDFPresentationMode constructor
The `viewer` option was *only* used for checking that a document is loaded in `PDFPresentationMode.request`, however that's just as easy to do by simply utilizing `BaseViewer.pagesCount` instead and this way we can also avoid the DOM lookup.
2019-12-06 00:20:56 +01:00
Jonas Jenwald
5c0336872e Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.
2019-12-05 15:53:18 +01:00
Jonas Jenwald
06b1f619c6 Fix an incorrect condition in BaseViewer.isPageVisible (PR 10217 follow-up)
This was a blatant oversight in PR 10217, since there's obviously no `this.pageNumber` property anywhere in the `BaseViewer`. Luckily this shouldn't have caused any bugs, since the only call-site is also validating the `pageNumber` (but correctly that time).
2019-12-04 13:38:07 +01:00
Tim van der Meij
514b500a6c
Merge pull request #11374 from Snuffleupagus/set-first-pdfPage
Set the first `pdfPage` immediately in `{BaseViewer, PDFThumbnailViewer}.setDocument`
2019-12-01 13:34:36 +01:00
Tim van der Meij
ded56f2fe4
Merge pull request #11373 from Snuffleupagus/fetch-cacheMap-rm-has
Slightly simplify the XRef cache lookup in `XRef.fetch`
2019-12-01 13:23:53 +01:00
Jonas Jenwald
6732df6aae Set the first pdfPage immediately in {BaseViewer, PDFThumbnailViewer}.setDocument
*This patch is simple enough that I almost feel like I'm overlooking some trivial reason why this would be a bad idea.*

Note how in `{BaseViewer, PDFThumbnailViewer}.setDocument` we're always getting the *first* `pdfPage` in order to initialize all pages/thumbnails.
However, once that's done the first `pdfPage` is simply ignored and rendering of the first page thus requires calling `PDFDocumentProxy.getPage` yet again. (And in the `BaseViewer` case, it's even done once more after `onePageRenderedCapability` is resolved.)

All in all, I cannot see why we cannot just immediately set the first `pdfPage` and thus avoid an early round-trip to the API in the `_ensurePdfPageLoaded` method before rendering can begin.
2019-12-01 12:39:55 +01:00
Jonas Jenwald
c3b1c8f857 Slightly simplify the XRef cache lookup in XRef.fetch
Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups.

Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called *a lot* in larger documents and this patch thus cannot hurt.
2019-11-30 22:41:53 +01:00
Tim van der Meij
62ec8109b5
Merge pull request #11370 from Snuffleupagus/fetchCompressed-isStream
Stop caching Streams in `XRef.fetchCompressed`
2019-11-30 14:56:47 +01:00
Jonas Jenwald
168c6aecae Stop caching Streams in XRef.fetchCompressed
I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1]

Note that in the `XRef.fetchUncompressed` method we're *not* caching Streams, and that for very good reasons too.

 - Streams, especially the `DecodeStream` ones, can become *very* large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so.

 - Attempting to read from the *same* Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position.

 - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Streams isn't going to work in the general case.

All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method.

---
[1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.
2019-11-30 10:21:08 +01:00
Jonas Jenwald
06412a557b Slighthly re-factor XRef.fetchCompressed
- Change all occurences of `var` to `let`/`const`.

 - Initialize the (temporary) Arrays with the correct sizes upfront.

 - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.
2019-11-30 09:49:51 +01:00
Tim van der Meij
b0aee6b1f0
Merge pull request #11363 from Snuffleupagus/fetchUncompressed-isInteger-checks
Remove the `Number.isInteger` checks from `XRef.fetchUncompressed` (PR 8857 follow-up)
2019-11-29 22:27:38 +01:00
Jonas Jenwald
725566cfea Remove the Number.isInteger checks from XRef.fetchUncompressed (PR 8857 follow-up)
Having ran the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857.
Hence everything points to this being completely unnecessary now, and by removing this code there's thus fewer function calls being made in `XRef.fetchUncompressed`.
2019-11-28 23:25:39 +01:00
Tim van der Meij
dcf998a1c1
Merge pull request #11356 from Snuffleupagus/rm-wrong-PDFDocument-comment
Remove outdated, and misleading, JSDoc comment from the `PDFDocument` class
2019-11-25 22:24:54 +01:00
Jonas Jenwald
cc76132c24 Remove outdated, and misleading, JSDoc comment from the PDFDocument class
The contents of this comment hasn't been correct for *years*, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.
2019-11-25 11:36:29 +01:00