pdf.js

Author	SHA1	Message	Date
Wojciech Maj	cd03fe014d	Use const in ui_utils.js	2019-12-23 08:00:15 +01:00
Jonas Jenwald	ad0b0d60a5	Ignore Metadata entries with incorrectly encoded characters, when setting the viewer title (bug 1605526) Apparently Ghostscript can, in some cases, generate/include `Metadata` with incorrectly encoded characters.[1] This results in the viewer title looking wrong, which we thus attempt to avoid by falling back to the `Info` entry instead. Please note: Obviously it would be better if this was fixed in the `Metadata` parser instead, rather than using a viewer work-around, but I'm just not sure how or even if that could actually be done given that the `Metadata` stream contains no trace of the original character. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1605526 --- [1] The problematic characters are from the Specials Unicode block, see https://en.wikipedia.org/wiki/Specials_(Unicode_block)	2019-12-22 17:20:14 +01:00
Jonas Jenwald	eeaea85294	Re-factor how the viewer title is determined, based on the `Info`/`Metadata` data With these changes we'll always set the `pdfTitle` to the `Info` dictionary entry first (assuming it exists and isn't empty), before attempting to override it with the `Metadata` stream entry (assuming it exists, is non-empty and valid). There should be no functional changes with this patch, but it will simplify the following patch somewhat.	2019-12-21 18:59:17 +01:00
Tim van der Meij	6316b2a195	Merge pull request #11422 from Snuffleupagus/issue-10768 Use the `strict` mode `assert` in the pdf2png Node.js example (issue 10768)	2019-12-21 13:37:22 +01:00
Tim van der Meij	04df367ab6	Merge pull request #11418 from Snuffleupagus/simplify-fakeWorkerLoader [api-minor] Simplify the fallback fake worker loader code in `src/display/api.js`	2019-12-21 13:24:51 +01:00
Jonas Jenwald	3783eccfa4	Use the `strict` mode `assert` in the pdf2png Node.js example (issue 10768) See https://nodejs.org/api/assert.html#assert_strict_mode	2019-12-21 13:24:13 +01:00
Jonas Jenwald	d370037618	[api-minor] Tweak the Node.js fake worker loader to prevent `Critical dependency: ...` warnings from Webpack Since bundlers, such as Webpack, cannot be told to leave `require` statements alone we are thus forced to jump through hoops in order to prevent these warnings in third-party deployments of the PDF.js library; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack). Please note: This is based on the assumption that code running in Node.js won't ever be affected by e.g. Content Security Policies that prevent use of `eval`. If that ever occurs, we should revert to a normal `require` statement and simply document the Webpack warnings instead.	2019-12-20 17:36:10 +01:00
Jonas Jenwald	8519f87efb	Re-factor the `setupFakeWorkerGlobal` function (in `src/display/api.js`), and the `loadFakeWorker` function (in `web/app.js`) This patch reduces some duplication, by moving all fake worker loader code into the `setupFakeWorkerGlobal` function. Furthermore, the functions are simplified further by using `async`/`await` where appropriate.	2019-12-20 17:36:10 +01:00
Jonas Jenwald	a5485e1ef7	[api-minor] Support loading the fake worker from `GlobalWorkerOptions.workerSrc` in Node.js There's no particularly good reason, as far as I can tell, to not support a custom worker path in Node.js environments (even if workers aren't supported). This patch thus makes the Node.js fake worker loader code-path consistent with the fallback code-path used with the browser fake worker loader. Finally, this patch also deprecates[1] the `fallbackWorkerSrc` functionality, except in Node.js, since the user should always provide correct worker options since the fallback is nothing more than a best-effort solution. --- [1] Although it probably shouldn't be removed until the next major version.	2019-12-20 17:36:10 +01:00
Jonas Jenwald	591e754831	Move the fake worker loader code into the `PDFWorkerClosure` Given that this code isn't needed "globally" in the file, it seems reasonable to move it to where it's actually used instead.	2019-12-20 17:36:10 +01:00
Jonas Jenwald	aab0f91740	[api-minor] Simplify the fallback fake worker loader code in `src/display/api.js` For performance reasons, and to avoid hanging the browser UI, the PDF.js library should always be used with web workers enabled. At this point in time all of the supported browsers should have proper worker support, and Node.js is thus the only environment where workers aren't supported. Hence it no longer seems relevant/necessary to provide, by default, fake worker loaders for various JS builders/bundlers/frameworks in the PDF.js code itself.[1] In order to simplify things, the fake worker loader code is thus simplified to now only support Node.js usage respectively "normal" browser usage out-of-the-box.[2] Please note: The officially intended way of using the PDF.js library is with workers enabled, which can be done by setting `GlobalWorkerOptions.workerSrc`, `GlobalWorkerOptions.workerPort`, or manually providing a `PDFWorker` instance when calling `getDocument`. --- [1] Note that it's still possible to manually disable workers, simply by manually loading the built `pdf.worker.js` file into the (current) global scope, however this is mostly intended for testing/debugging purposes. [2] Unfortunately some bundlers such as Webpack, when used with third-party deployments of the PDF.js library, will start to print `Critical dependency: ...` warnings when run against the built `pdf.js` file from this patch. The reason is that despite the `require` calls being protected by runtime `isNodeJS` checks, it's not possible to simply tell Webpack to just ignore the `require`; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).	2019-12-20 17:36:08 +01:00
Tim van der Meij	693240cf06	Merge pull request #11415 from Snuffleupagus/update-packages Update `npm` packages and l10n files	2019-12-19 23:51:55 +01:00
Jonas Jenwald	9386544555	Update l10n files	2019-12-19 11:30:57 +01:00
Jonas Jenwald	be43001b29	Update `npm` packages	2019-12-19 11:28:13 +01:00
Tim van der Meij	cf3f342373	Merge pull request #11413 from Snuffleupagus/gulp-npm-test Re-factor the `npm test` command, used by Travis, to avoid running the 'default_preferences' tasks concurrently (issue 10732)	2019-12-18 22:39:48 +01:00
Jonas Jenwald	f406263fc2	Re-factor the `npm test` command, used by Travis, to avoid running the 'default_preferences' tasks concurrently (issue 10732) Please note: This patch does not prevent the 'default_preferences' task from running more than once during `npm test`, but it does ensure that the tasks won't run concurrently by running the relevant tests in series. While it would obviously still make sense to re-factor the gulpfile to account for changes in `gulp` version 4, by at least tweaking the `npm test` command the intermittent failures on Travis should at least go away.	2019-12-18 21:43:09 +01:00
Tim van der Meij	7ceb394c43	Merge pull request #11380 from Snuffleupagus/PDFHistory-reset Add a `reset` method to the `PDFHistory` implementation	2019-12-15 16:45:53 +01:00
Tim van der Meij	e09fe7226d	Merge pull request #11406 from Snuffleupagus/find-scanBytes Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string	2019-12-15 16:26:42 +01:00
Jonas Jenwald	dbb82f05fc	Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization. This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function. The main benefits here are: - No longer necessary to allocate temporary `1 kB` strings during initial parsing, thus saving some memory. - In practice, for well-formed PDF documents, the number of iterations required to find the commands is usually very low. (For the `tracemonkey.pdf` file, there's a total of only 30 loop iterations.)	2019-12-14 13:43:26 +01:00
Tim van der Meij	8b35bba347	Merge pull request #11407 from mozilla/dependabot/npm_and_yarn/npm-6.13.4 Bump npm from 6.13.0 to 6.13.4	2019-12-13 23:37:11 +01:00
dependabot[bot]	80d7b4d8ab	Bump npm from 6.13.0 to 6.13.4 Bumps [npm](https://github.com/npm/cli) from 6.13.0 to 6.13.4. - [Release notes](https://github.com/npm/cli/releases) - [Changelog](https://github.com/npm/cli/blob/latest/CHANGELOG.md) - [Commits](https://github.com/npm/cli/compare/v6.13.0...v6.13.4) Signed-off-by: dependabot[bot] <support@github.com>	2019-12-13 17:30:33 +00:00
Jonas Jenwald	1fab637247	Prevent `if (!state || false) {` in the output, in `PDFHistory._popState`, for e.g. MOZCENTRAL builds By re-ordering the condition, which includes a `PDFJSDev` check, we can avoid meaningless output in the built files.	2019-12-13 10:38:39 +01:00
Jonas Jenwald	c5f2f870cb	Move the `parseCurrentHash` helper function into `PDFHistory` Looking at the `parseCurrentHash` function again it's now difficult for me to understand what I was thinking, since having a helper function that needs to be manually passed a `linkService` reference just looks weird.	2019-12-13 10:38:39 +01:00
Jonas Jenwald	d621899d50	Add a `reset` method to the `PDFHistory` implementation This patch addresses a couple of smaller issues with the `PDFHistory` class: - Most, if not all, other viewer components can be reset in one way or another, and there's no good reason for the `PDFHistory` implementation to be different here. - Currently it's (technically) possible to keep adding entries to the browser history, via the `PDFHistory` instance, even after the document has been closed. That obviously makes no sense, and is caused by the lack of a `reset` method. - The internal `this._isPagesLoaded` property was never actually reset, which would lead to it being temporarily wrong when a new document was opened in the default viewer.	2019-12-13 10:38:39 +01:00
Tim van der Meij	903bf177cb	Merge pull request #11404 from Snuffleupagus/global-ReadableStream [api-minor] Move the `ReadableStream` polyfill to the global scope	2019-12-12 00:05:58 +01:00
Jonas Jenwald	e24050fa13	[api-minor] Move the `ReadableStream` polyfill to the global scope Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead. The only change here which could possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded). However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it could just as well have been a globally registered polyfill from an application embedding the PDF.js library.	2019-12-11 19:02:37 +01:00
Tim van der Meij	af4ba75f68	Merge pull request #11398 from Snuffleupagus/issue-5887 Attempt to improve the `PDFDocument` error message for empty files (issue 5887)	2019-12-09 22:08:08 +01:00
Jonas Jenwald	b00835f589	Attempt to improve the `PDFDocument` error message for empty files (issue 5887) Given that the error in question is surfaced on the API-side, this patch makes the following changes: - Updates the wording such that it'll hopefully be slightly easier for users to understand. - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling. - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).	2019-12-09 15:45:50 +01:00
Tim van der Meij	a6db045789	Merge pull request #11387 from Snuffleupagus/issue-11385 Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)	2019-12-08 20:27:46 +01:00
Tim van der Meij	16778118f6	Merge pull request #11391 from Snuffleupagus/globalThis Replace `globalScope` with the standard `globalThis` property instead	2019-12-08 20:23:19 +01:00
Jonas Jenwald	71d61e4c6f	Re-factor `getMainThreadWorkerMessageHandler` to support arbitrary global scopes, rather than only `window`	2019-12-08 20:19:04 +01:00
Jonas Jenwald	a8fc306b6e	Replace `globalScope` with the standard `globalThis` property instead Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues. Finally, for older browsers a polyfill was added in `src/shared/compatibility.js`.	2019-12-08 20:19:02 +01:00
Tim van der Meij	07212bf5f2	Merge pull request #11390 from Snuffleupagus/checkFirstPage-await-cleanup Ensure that `PDFDocument.checkFirstPage` waits for cleanup to complete (PR 10392 follow-up)	2019-12-08 20:13:00 +01:00
Tim van der Meij	7b503c8923	Merge pull request #11388 from Snuffleupagus/rm-PDFPresentationMode-viewer-option Remove the `viewer` option from the `PDFPresentationMode` constructor	2019-12-08 19:55:43 +01:00
Tim van der Meij	3d549f12fa	Merge pull request #11382 from Snuffleupagus/pr-10217-follow-up Fix an incorrect condition in `BaseViewer.isPageVisible` (PR 10217 follow-up)	2019-12-08 19:52:48 +01:00
Jonas Jenwald	a02122e984	Ensure that `PDFDocument.checkFirstPage` waits for cleanup to complete (PR 10392 follow-up) Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here. (The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)	2019-12-07 12:31:41 +01:00
Jonas Jenwald	1c466b4648	Remove the `viewer` option from the `PDFPresentationMode` constructor The `viewer` option was only used for checking that a document is loaded in `PDFPresentationMode.request`, however that's just as easy to do by simply utilizing `BaseViewer.pagesCount` instead and this way we can also avoid the DOM lookup.	2019-12-06 00:20:56 +01:00
Jonas Jenwald	5c0336872e	Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385) In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of the EOD (end-of-data) marker is missing; hence the PDF document is corrupt.	2019-12-05 15:53:18 +01:00
Jonas Jenwald	06b1f619c6	Fix an incorrect condition in `BaseViewer.isPageVisible` (PR 10217 follow-up) This was a blatant oversight in PR 10217, since there's obviously no `this.pageNumber` property anywhere in the `BaseViewer`. Luckily this shouldn't have caused any bugs, since the only call-site is also validating the `pageNumber` (but correctly that time).	2019-12-04 13:38:07 +01:00
Tim van der Meij	514b500a6c	Merge pull request #11374 from Snuffleupagus/set-first-pdfPage Set the first `pdfPage` immediately in `{BaseViewer, PDFThumbnailViewer}.setDocument`	2019-12-01 13:34:36 +01:00
Tim van der Meij	ded56f2fe4	Merge pull request #11373 from Snuffleupagus/fetch-cacheMap-rm-has Slightly simplify the XRef cache lookup in `XRef.fetch`	2019-12-01 13:23:53 +01:00
Jonas Jenwald	6732df6aae	Set the first `pdfPage` immediately in `{BaseViewer, PDFThumbnailViewer}.setDocument` This patch is simple enough that I almost feel like I'm overlooking some trivial reason why this would be a bad idea. Note how in `{BaseViewer, PDFThumbnailViewer}.setDocument` we're always getting the first `pdfPage` in order to initialize all pages/thumbnails. However, once that's done the first `pdfPage` is simply ignored and rendering of the first page thus requires calling `PDFDocumentProxy.getPage` yet again. (And in the `BaseViewer` case, it's even done once more after `onePageRenderedCapability` is resolved.) All in all, I cannot see why we cannot just immediately set the first `pdfPage` and thus avoid an early round-trip to the API in the `_ensurePdfPageLoaded` method before rendering can begin.	2019-12-01 12:39:55 +01:00
Jonas Jenwald	c3b1c8f857	Slightly simplify the XRef cache lookup in `XRef.fetch` Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups. Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called a lot in larger documents and this patch thus cannot hurt.	2019-11-30 22:41:53 +01:00
Tim van der Meij	62ec8109b5	Merge pull request #11370 from Snuffleupagus/fetchCompressed-isStream Stop caching Streams in `XRef.fetchCompressed`	2019-11-30 14:56:47 +01:00
Jonas Jenwald	168c6aecae	Stop caching Streams in `XRef.fetchCompressed` I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1] Note that in the `XRef.fetchUncompressed` method we're not caching Streams, and that for very good reasons too. - Streams, especially the `DecodeStream` ones, can become very large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so. - Attempting to read from the same Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position. - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Stream isn't going to work in the general case. All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method. --- [1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.	2019-11-30 10:21:08 +01:00
Jonas Jenwald	06412a557b	Slightly re-factor `XRef.fetchCompressed` - Change all occurrences of `var` to `let`/`const`. - Initialize the (temporary) Arrays with the correct sizes upfront. - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.	2019-11-30 09:49:51 +01:00
Tim van der Meij	b0aee6b1f0	Merge pull request #11363 from Snuffleupagus/fetchUncompressed-isInteger-checks Remove the `Number.isInteger` checks from `XRef.fetchUncompressed` (PR 8857 follow-up)	2019-11-29 22:27:38 +01:00
Jonas Jenwald	725566cfea	Remove the `Number.isInteger` checks from `XRef.fetchUncompressed` (PR 8857 follow-up) Having run the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857. Hence everything points to this being completely unnecessary now, and by removing this code there's thus fewer function calls being made in `XRef.fetchUncompressed`.	2019-11-28 23:25:39 +01:00
Tim van der Meij	dcf998a1c1	Merge pull request #11356 from Snuffleupagus/rm-wrong-PDFDocument-comment Remove outdated, and misleading, JSDoc comment from the `PDFDocument` class	2019-11-25 22:24:54 +01:00
Jonas Jenwald	cc76132c24	Remove outdated, and misleading, JSDoc comment from the `PDFDocument` class The contents of this comment haven't been correct for years, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.	2019-11-25 11:36:29 +01:00
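
Commit dbb82f05fc above describes searching the raw bytes for a command, forwards or backwards, instead of first building a temporary string. The idea can be sketched roughly as below. This is only an illustration, not the actual `find` helper from `src/core/document.js`: the name `findBytes`, its parameters, and the 1024-byte default limit are assumptions made for this example.

```javascript
// Hypothetical sketch (not the PDF.js implementation): look for an ASCII
// signature inside a Uint8Array without allocating an intermediate string.
// `limit` caps how many bytes are scanned; `backwards` scans from the end.
function findBytes(bytes, signature, limit = 1024, backwards = false) {
  const sigLen = signature.length;
  const scanLen = Math.min(limit, bytes.length);
  if (sigLen === 0 || scanLen < sigLen) {
    return -1;
  }
  // Convert the signature to byte values once, up front.
  const sig = new Uint8Array(sigLen);
  for (let i = 0; i < sigLen; i++) {
    sig[i] = signature.charCodeAt(i) & 0xff;
  }
  // Compare the signature against the bytes starting at `pos`.
  const matchesAt = pos => {
    for (let j = 0; j < sigLen; j++) {
      if (bytes[pos + j] !== sig[j]) {
        return false;
      }
    }
    return true;
  };

  if (backwards) {
    // Only the last `scanLen` bytes are considered.
    const lowest = bytes.length - scanLen;
    for (let pos = bytes.length - sigLen; pos >= lowest; pos--) {
      if (matchesAt(pos)) {
        return pos;
      }
    }
  } else {
    // Only the first `scanLen` bytes are considered.
    const highest = scanLen - sigLen;
    for (let pos = 0; pos <= highest; pos++) {
      if (matchesAt(pos)) {
        return pos;
      }
    }
  }
  return -1; // Signature not found within the scanned range.
}

// Usage: locate the header from the start and the last `startxref` from the end.
const sample = Uint8Array.from(
  "%PDF-1.7\nstartxref\n9\nstartxref\n42\n%%EOF\n",
  c => c.charCodeAt(0)
);
console.log(findBytes(sample, "%PDF-")); // 0
console.log(findBytes(sample, "startxref", 1024, true)); // 21
```

The forward and backward loops are kept separate here, which mirrors the "duplication" the commit message mentions as the cost of supporting backwards search.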


12163 Commits