pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	a5485e1ef7	[api-minor] Support loading the fake worker from `GlobalWorkerOptions.workerSrc` in Node.js There's no particularily good reason, as far as I can tell, to not support a custom worker path in Node.js environments (even if workers aren't supported). This patch thus make the Node.js fake worker loader code-path consistent with the fallback code-path used with browser fake worker loader. Finally, this patch also deprecates[1] the `fallbackWorkerSrc` functionality, except in Node.js, since the user should always provide correct worker options since the fallback is nothing more than a best-effort solution. --- [1] Although it probably shouldn't be removed until the next major version.	2019-12-20 17:36:10 +01:00
Jonas Jenwald	591e754831	Move the fake worker loader code into the `PDFWorkerClosure` Given that this code isn't needed "globally" in the file, it seems reasonable to move it to where it's actually used instead.	2019-12-20 17:36:10 +01:00
Jonas Jenwald	aab0f91740	[api-minor] Simplify the fallback fake worker loader code in `src/display/api.js` For performance reasons, and to avoid hanging the browser UI, the PDF.js library should always be used with web workers enabled. At this point in time all of the supported browsers should have proper worker support, and Node.js is thus the only environment where workers aren't supported. Hence it no longer seems relevant/necessary to provide, by default, fake worker loaders for various JS builders/bundlers/frameworks in the PDF.js code itself.[1] In order to simplify things, the fake worker loader code is thus simplified to now only support Node.js usage respectively "normal" browser usage out-of-the-box.[2] Please note: The officially intended way of using the PDF.js library is with workers enabled, which can be done by setting `GlobalWorkerOptions.workerSrc`, `GlobalWorkerOptions.workerPort`, or manually providing a `PDFWorker` instance when calling `getDocument`. --- [1] Note that it's still possible to manually disable workers, simply my manually loading the built `pdf.worker.js` file into the (current) global scope, however this's mostly intended for testing/debugging purposes. [2] Unfortunately some bundlers such as Webpack, when used with third-party deployments of the PDF.js library, will start to print `Critical dependency: ...` warnings when run against the built `pdf.js` file from this patch. The reason is that despite the `require` calls being protected by runtime `isNodeJS` checks, it's not possible to simply tell Webpack to just ignore the `require`; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).	2019-12-20 17:36:08 +01:00
Jonas Jenwald	dbb82f05fc	Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization. This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function. The main benefits here are: - No longer necessary to allocate temporary `1 kB` strings during initial parsing, thus saving some memory. - In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a total of only 30 loop iterations.)	2019-12-14 13:43:26 +01:00
Jonas Jenwald	e24050fa13	[api-minor] Move the `ReadableStream` polyfill to the global scope Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead. The only change here which could possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded). However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it could just as well have been a globally registered polyfill from an application embedding the PDF.js library.	2019-12-11 19:02:37 +01:00
Jonas Jenwald	b00835f589	Attempt to improve the `PDFDocument` error message for empty files (issue 5887) Given that the error in question is surfaced on the API-side, this patch makes the following changes: - Updates the wording such that it'll hopefully be slightly easier for users to understand. - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling. - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).	2019-12-09 15:45:50 +01:00
Tim van der Meij	a6db045789	Merge pull request #11387 from Snuffleupagus/issue-11385 Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)	2019-12-08 20:27:46 +01:00
Tim van der Meij	16778118f6	Merge pull request #11391 from Snuffleupagus/globalThis Replace `globalScope` with the standard `globalThis` property instead	2019-12-08 20:23:19 +01:00
Jonas Jenwald	71d61e4c6f	Re-factor `getMainThreadWorkerMessageHandler` to support arbitrary global scopes, rather than only `window`	2019-12-08 20:19:04 +01:00
Jonas Jenwald	a8fc306b6e	Replace `globalScope` with the standard `globalThis` property instead Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues. Finally, for older browsers a polyfill was added in `ssrc/shared/compatibility.js`.	2019-12-08 20:19:02 +01:00
Jonas Jenwald	a02122e984	Ensure that `PDFDocument.checkFirstPage` waits for cleanup to complete (PR 10392 follow-up) Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here. (The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)	2019-12-07 12:31:41 +01:00
Jonas Jenwald	5c0336872e	Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385) In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.	2019-12-05 15:53:18 +01:00
Jonas Jenwald	c3b1c8f857	Slightly simplify the XRef cache lookup in `XRef.fetch` Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups. Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called a lot in larger documents and this patch thus cannot hurt.	2019-11-30 22:41:53 +01:00
Jonas Jenwald	168c6aecae	Stop caching Streams in `XRef.fetchCompressed` I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1] Note that in the `XRef.fetchUncompressed` method we're not caching Streams, and that for very good reasons too. - Streams, especially the `DecodeStream` ones, can become very large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so. - Attempting to read from the same Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position. - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Streams isn't going to work in the general case. All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method. --- [1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.	2019-11-30 10:21:08 +01:00
Jonas Jenwald	06412a557b	Slighthly re-factor `XRef.fetchCompressed` - Change all occurences of `var` to `let`/`const`. - Initialize the (temporary) Arrays with the correct sizes upfront. - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.	2019-11-30 09:49:51 +01:00
Jonas Jenwald	725566cfea	Remove the `Number.isInteger` checks from `XRef.fetchUncompressed` (PR 8857 follow-up) Having ran the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857. Hence everything points to this being completely unnecessary now, and by removing this code there's thus fewer function calls being made in `XRef.fetchUncompressed`.	2019-11-28 23:25:39 +01:00
Jonas Jenwald	cc76132c24	Remove outdated, and misleading, JSDoc comment from the `PDFDocument` class The contents of this comment hasn't been correct for years, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.	2019-11-25 11:36:29 +01:00
Jonas Jenwald	a965662184	Enable the `getter-return`, `no-dupe-else-if`, and `no-setter-return` ESLint rules All of these rules can help catch errors during development. Please note that only `getter-return` required a few changes, which was limited to disabling the rule in a couple of spots; please find additional details about these rules at: - https://eslint.org/docs/rules/getter-return - https://eslint.org/docs/rules/no-dupe-else-if - https://eslint.org/docs/rules/no-setter-return	2019-11-23 11:40:30 +01:00
Tim van der Meij	be02e67972	Merge pull request #11335 from Snuffleupagus/issue-11330 Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330)	2019-11-16 13:56:01 +01:00
Jonas Jenwald	9199b02a42	Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330) For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`). Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.	2019-11-16 09:29:10 +01:00
Jonas Jenwald	688d15526e	Use `getBytes`, rather than looping over `getByte`, in `FlateStream.prototype.readBlock` Please note: A a similar change was attempted in PR 5005, but it was subsequently backed out (in PR 5069) since other parts of the patch caused issues. With these changes, it's possible to replace repeated function calls within a loop with just a single function call and subsequent assignment instead.	2019-11-15 15:45:31 +01:00
Jonas Jenwald	878432784c	[PDFHistory] Move the IE11 `pushState`/`replaceState` work-around to `src/shared/compatibility.js` (PR 10461 follow-up) I've always disliked the solution in PR 10461, since it required changes to the `PDFHistory` code itself to deal with a bug in IE11. Now that IE11 support is limited, it seems reasonable to remove these `pushState`/`replaceState` hacks from the main code-base and simply use polyfills instead.	2019-11-11 17:48:04 +01:00
Jonas Jenwald	74e00ed93c	Change `isNodeJS` from a function to a constant Given that this shouldn't change after the `pdf.js`/`pdf.worker.js` files have been loaded, it doesn't seems necessary to keep this as a function.	2019-11-10 16:44:29 +01:00
Jonas Jenwald	2817121bc1	Convert `globalScope` and `isNodeJS` to proper modules Slightly unrelated to the rest of the patch, but this also removes an out-of-place `globals` definition from the `web/viewer.js` file.	2019-11-10 16:44:29 +01:00
Tim van der Meij	6763e16804	Merge pull request #11313 from Snuffleupagus/issue-11122 Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)	2019-11-10 13:31:51 +01:00
Jonas Jenwald	0233fc07b6	Revert "Convert `Catalog.getPageDict` to an `async` method"	2019-11-09 22:36:23 +01:00
Jonas Jenwald	536a52e981	Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122) For Popup annotation trigger elements consisting of an arbitrary polyline, you need to ensure that the 'stroke-width' is always non-zero since otherwise it's impossible to actually open/close the popup. Unfortunately I don't believe that any of the test-suites can be used to test this, hence why no tests are included in the patch.	2019-11-09 13:35:59 +01:00
Jonas Jenwald	79d7c002de	Inline a couple of `isRef`/`isDict` checks in the `getPageDict` method As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here). With these changes we also avoid potentially two back-to-back `isDict` checks when evaluating possible Page nodes, and can also no longer accidentally pick a dictionary with an incorrect /Type.	2019-11-08 17:53:00 +01:00
Jonas Jenwald	0d89006bf1	Convert `Catalog.getPageDict` to an `async` method This makes it possible to remove the internal `next` helper function, and also gets rid of the need to manually resolve/reject a `PromiseCapability`.	2019-11-08 17:45:28 +01:00
Jonas Jenwald	98f570c103	Prevent browser exceptions from incorrectly triggering the `assert` in `PDFPageProxy._abortOperatorList` (PR 11069 follow-up) For certain canvas-related errors (and probably others), the browser rendering exceptions may be propagated "as-is" to the PDF.js code. In this case, the exceptions are of the somewhat cryptic `NS_ERROR_FAILURE` type. Unfortunately these aren't actual `Error`s, which thus ends up unintentionally triggering the `assert` in `PDFPageProxy._abortOperatorList`; sorry about that!	2019-11-07 11:37:48 +01:00
Jonas Jenwald	80342e2fdc	Support UTF-16 little-endian strings in the `stringToPDFString` helper function (bug 1593902) The bug report seem to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we actually do as evident by both the code and a unit-test. The issue at play here is rather that we previously only supported big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a little-endian UTF-16 BOM instead. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902	2019-11-05 12:43:17 +01:00
Jonas Jenwald	04497bcb3c	Re-factor the `ObjectLoader._walk` method to be properly asynchronous Rather than having to store a `PromiseCapability` on the `ObjectLoader` instances, we can simply convert `_walk` to be `async` and thus have the same functionality with native JavaScript instead.	2019-11-03 15:04:20 +01:00
Jonas Jenwald	fec1f02b2a	Slightly re-factor setting of the link `target` in `addLinkAttributes` I happened to look at this code and the way that the link target is set seems unecessarily convoluted, since we're using `Object.values` and `Array.prototype.includes` for every link being parsed. Given that the number of link targets are so few, the easist solution honestly seem to be to just use a `switch` statement to do the link target mapping.	2019-11-02 14:01:31 +01:00
Tim van der Meij	bbd2386bd9	Merge pull request #11296 from Snuffleupagus/parseColorSpace-stopAtErrors Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)	2019-11-01 22:47:50 +01:00
Jonas Jenwald	829d6ba2dc	Ensure that the `peekByte` methods, on the various Streams, handles end of data correctly (PR 5286 follow-up) When the end of data has already been reached for the various Streams, the `getByte` methods will return `-1` to signal that to the caller. Note however that the current position obviously won't be incremented in this case, meaning that the `peekByte` methods will in this case incorrectly decrement the position. Thankfully the corresponding `peekBytes` shouldn't be affected by this bug, since they decrement the current position with the actually returned number of bytes. I'm not aware of any bugs caused by this blatant oversight, but that doesn't mean this shouldn't be fixed :-)	2019-11-01 18:22:33 +01:00
Jonas Jenwald	835d8c2be5	Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287) This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.	2019-11-01 09:01:24 +01:00
Tim van der Meij	30ef05c161	Merge pull request #11290 from Snuffleupagus/MessageHandler-rm-in [MessageHandler] Re-factor and convert the code to a proper `class`	2019-10-31 23:57:52 +01:00
Jonas Jenwald	eedd449cb4	Remove some unused `require` statements, used when loading fake workers, in non-`PRODUCTION` mode The code in question is only relevant in non-`PRODUCTION` mode, i.e. the development version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added. I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.	2019-10-31 12:08:07 +01:00
Jonas Jenwald	0293222b96	[MessageHandler] Convert the code to a proper `class`	2019-10-30 23:22:59 +01:00
Jonas Jenwald	5d5733c0a7	[MessageHandler] Convert all instances of `var` to `const` in the code	2019-10-30 23:22:59 +01:00
Jonas Jenwald	f61fb3e0f9	[MessageHandler] Re-factor the `_onComObjOnMessage` function to use early returns When `ReadableStream` support was added to the `MessageHandler`, the `_onComObjOnMessage` function became more complex than previously. All of the nested `if`/`else if`/`else` branches are now, at least in my opinion, making some of this code a bit difficult to follow. Hence this patch, which attempts to help readability by making use of early `return`s and `Error`s. The patch also changes a couple of `var`/`let` occurences to `const`.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	62f28e11a3	[MessageHandler] Remove unnecessary usage of `in` from the code Note that using `in` leads to unnecessary stringification of the properties, which seems completely unnecessary here. To avoid future problems from these changes the `MessageHandler.on` method will now assert, in non-`PRODUCTION`/`TESTING` builds, that it's always called with a function as expected. This patch also renames `callbacksCapabilities` to `callbackCapabilities`, note the removed "s", since using a double plural format looks a bit strange.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	3e46e800a0	[MessageHandler] Replace the internal `isReply` property, as sent when Promise callbacks are used, with enumeration values Given that the `isReply` property is an internal implementation detail, changing its type shouldn't be a problem. Note that by directly indicating if either data or an Error is sent, it's no longer necessary to use `in` when handling the callback.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	2d35a49dd8	Inline a couple of `isRef`/`isDict` checks in the `ObjectLoader` code As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).	2019-10-29 23:20:10 +01:00
Jonas Jenwald	1133dbac33	Make the `ObjectLoader` use more efficient methods when determining if data needs to be loaded Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded. When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which specific chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question. Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over all chunks. All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessary "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a single simple check, which seems like a perfect fit for the `ObjectLoader` use cases. In particular, once the entire PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading a local files) which this patch should help with. Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)	2019-10-29 23:20:09 +01:00
Jonas Jenwald	0496ea61f5	Ensure that `PartialEvaluator.hasBlendModes` handles Blend Modes in Arrays (PR 11281 follow-up) I completely overlooked this in PR 11281, but you obviously need to make similar changes in `PartialEvaluator.hasBlendModes` since it will otherwise ignore valid Blend Modes.	2019-10-28 11:37:05 +01:00
Jonas Jenwald	5c266f0e8c	Support Blend Modes which are specified in an Array of Names (issue 11279) According to the specification, the first supported Blend Mode should be choosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607	2019-10-26 14:24:31 +02:00
Tim van der Meij	4a5a4328f4	Merge pull request #11273 from Snuffleupagus/getViewport-offsets [api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone`	2019-10-24 00:08:40 +02:00
Jonas Jenwald	681bc9d70e	[api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone` There's no good reason, as far as I can tell, to not also support `offsetX`/`offsetY` in addition to e.g. `dontFlip`.	2019-10-23 20:48:14 +02:00
Jonas Jenwald	6f7f8257bc	Slightly re-factor the String handling in `StatTimer` This uses template strings in a couple of spots, and a buffer in the `toString` method.	2019-10-23 14:45:18 +02:00

1 2 3 4 5 ...

3754 Commits