pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	a8fc306b6e	Replace `globalScope` with the standard `globalThis` property instead Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues. Finally, for older browsers a polyfill was added in `src/shared/compatibility.js`.	2019-12-08 20:19:02 +01:00
Jonas Jenwald	a02122e984	Ensure that `PDFDocument.checkFirstPage` waits for cleanup to complete (PR 10392 follow-up) Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here. (The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)	2019-12-07 12:31:41 +01:00
Jonas Jenwald	5c0336872e	Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385) In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.	2019-12-05 15:53:18 +01:00
Jonas Jenwald	c3b1c8f857	Slightly simplify the XRef cache lookup in `XRef.fetch` Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups. Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called a lot in larger documents and this patch thus cannot hurt.	2019-11-30 22:41:53 +01:00
Jonas Jenwald	168c6aecae	Stop caching Streams in `XRef.fetchCompressed` I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1] Note that in the `XRef.fetchUncompressed` method we're not caching Streams, and that for very good reasons too. - Streams, especially the `DecodeStream` ones, can become very large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so. - Attempting to read from the same Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position. - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Stream isn't going to work in the general case. All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method. --- [1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.	2019-11-30 10:21:08 +01:00
Jonas Jenwald	06412a557b	Slightly re-factor `XRef.fetchCompressed` - Change all occurrences of `var` to `let`/`const`. - Initialize the (temporary) Arrays with the correct sizes upfront. - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.	2019-11-30 09:49:51 +01:00
Jonas Jenwald	725566cfea	Remove the `Number.isInteger` checks from `XRef.fetchUncompressed` (PR 8857 follow-up) Having run the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857. Hence everything points to this being completely unnecessary now, and by removing this code there are thus fewer function calls being made in `XRef.fetchUncompressed`.	2019-11-28 23:25:39 +01:00
Jonas Jenwald	cc76132c24	Remove outdated, and misleading, JSDoc comment from the `PDFDocument` class The contents of this comment haven't been correct for years, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.	2019-11-25 11:36:29 +01:00
Jonas Jenwald	a965662184	Enable the `getter-return`, `no-dupe-else-if`, and `no-setter-return` ESLint rules All of these rules can help catch errors during development. Please note that only `getter-return` required a few changes, which were limited to disabling the rule in a couple of spots; please find additional details about these rules at: - https://eslint.org/docs/rules/getter-return - https://eslint.org/docs/rules/no-dupe-else-if - https://eslint.org/docs/rules/no-setter-return	2019-11-23 11:40:30 +01:00
Tim van der Meij	be02e67972	Merge pull request #11335 from Snuffleupagus/issue-11330 Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330)	2019-11-16 13:56:01 +01:00
Jonas Jenwald	9199b02a42	Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330) For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`). Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.	2019-11-16 09:29:10 +01:00
Jonas Jenwald	688d15526e	Use `getBytes`, rather than looping over `getByte`, in `FlateStream.prototype.readBlock` Please note: A similar change was attempted in PR 5005, but it was subsequently backed out (in PR 5069) since other parts of the patch caused issues. With these changes, it's possible to replace repeated function calls within a loop with just a single function call and subsequent assignment instead.	2019-11-15 15:45:31 +01:00
Jonas Jenwald	878432784c	[PDFHistory] Move the IE11 `pushState`/`replaceState` work-around to `src/shared/compatibility.js` (PR 10461 follow-up) I've always disliked the solution in PR 10461, since it required changes to the `PDFHistory` code itself to deal with a bug in IE11. Now that IE11 support is limited, it seems reasonable to remove these `pushState`/`replaceState` hacks from the main code-base and simply use polyfills instead.	2019-11-11 17:48:04 +01:00
Jonas Jenwald	74e00ed93c	Change `isNodeJS` from a function to a constant Given that this shouldn't change after the `pdf.js`/`pdf.worker.js` files have been loaded, it doesn't seem necessary to keep this as a function.	2019-11-10 16:44:29 +01:00
Jonas Jenwald	2817121bc1	Convert `globalScope` and `isNodeJS` to proper modules Slightly unrelated to the rest of the patch, but this also removes an out-of-place `globals` definition from the `web/viewer.js` file.	2019-11-10 16:44:29 +01:00
Tim van der Meij	6763e16804	Merge pull request #11313 from Snuffleupagus/issue-11122 Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)	2019-11-10 13:31:51 +01:00
Jonas Jenwald	0233fc07b6	Revert "Convert `Catalog.getPageDict` to an `async` method"	2019-11-09 22:36:23 +01:00
Jonas Jenwald	536a52e981	Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122) For Popup annotation trigger elements consisting of an arbitrary polyline, you need to ensure that the 'stroke-width' is always non-zero since otherwise it's impossible to actually open/close the popup. Unfortunately I don't believe that any of the test-suites can be used to test this, hence why no tests are included in the patch.	2019-11-09 13:35:59 +01:00
Jonas Jenwald	79d7c002de	Inline a couple of `isRef`/`isDict` checks in the `getPageDict` method As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here). With these changes we also avoid potentially two back-to-back `isDict` checks when evaluating possible Page nodes, and can also no longer accidentally pick a dictionary with an incorrect /Type.	2019-11-08 17:53:00 +01:00
Jonas Jenwald	0d89006bf1	Convert `Catalog.getPageDict` to an `async` method This makes it possible to remove the internal `next` helper function, and also gets rid of the need to manually resolve/reject a `PromiseCapability`.	2019-11-08 17:45:28 +01:00
Jonas Jenwald	98f570c103	Prevent browser exceptions from incorrectly triggering the `assert` in `PDFPageProxy._abortOperatorList` (PR 11069 follow-up) For certain canvas-related errors (and probably others), the browser rendering exceptions may be propagated "as-is" to the PDF.js code. In this case, the exceptions are of the somewhat cryptic `NS_ERROR_FAILURE` type. Unfortunately these aren't actual `Error`s, which thus ends up unintentionally triggering the `assert` in `PDFPageProxy._abortOperatorList`; sorry about that!	2019-11-07 11:37:48 +01:00
Jonas Jenwald	80342e2fdc	Support UTF-16 little-endian strings in the `stringToPDFString` helper function (bug 1593902) The bug report seems to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we actually do as evidenced by both the code and a unit-test. The issue at play here is rather that we previously only supported big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a little-endian UTF-16 BOM instead. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902	2019-11-05 12:43:17 +01:00
Jonas Jenwald	04497bcb3c	Re-factor the `ObjectLoader._walk` method to be properly asynchronous Rather than having to store a `PromiseCapability` on the `ObjectLoader` instances, we can simply convert `_walk` to be `async` and thus have the same functionality with native JavaScript instead.	2019-11-03 15:04:20 +01:00
Jonas Jenwald	fec1f02b2a	Slightly re-factor setting of the link `target` in `addLinkAttributes` I happened to look at this code and the way that the link target is set seems unnecessarily convoluted, since we're using `Object.values` and `Array.prototype.includes` for every link being parsed. Given that there are so few link targets, the easiest solution honestly seems to be to just use a `switch` statement to do the link target mapping.	2019-11-02 14:01:31 +01:00
Tim van der Meij	bbd2386bd9	Merge pull request #11296 from Snuffleupagus/parseColorSpace-stopAtErrors Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)	2019-11-01 22:47:50 +01:00
Jonas Jenwald	829d6ba2dc	Ensure that the `peekByte` methods, on the various Streams, handle end of data correctly (PR 5286 follow-up) When the end of data has already been reached for the various Streams, the `getByte` methods will return `-1` to signal that to the caller. Note however that the current position obviously won't be incremented in this case, meaning that the `peekByte` methods will in this case incorrectly decrement the position. Thankfully the corresponding `peekBytes` shouldn't be affected by this bug, since they decrement the current position with the actually returned number of bytes. I'm not aware of any bugs caused by this blatant oversight, but that doesn't mean this shouldn't be fixed :-)	2019-11-01 18:22:33 +01:00
Jonas Jenwald	835d8c2be5	Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287) This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.	2019-11-01 09:01:24 +01:00
Tim van der Meij	30ef05c161	Merge pull request #11290 from Snuffleupagus/MessageHandler-rm-in [MessageHandler] Re-factor and convert the code to a proper `class`	2019-10-31 23:57:52 +01:00
Jonas Jenwald	eedd449cb4	Remove some unused `require` statements, used when loading fake workers, in non-`PRODUCTION` mode The code in question is only relevant in non-`PRODUCTION` mode, i.e. the development version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added. I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.	2019-10-31 12:08:07 +01:00
Jonas Jenwald	0293222b96	[MessageHandler] Convert the code to a proper `class`	2019-10-30 23:22:59 +01:00
Jonas Jenwald	5d5733c0a7	[MessageHandler] Convert all instances of `var` to `const` in the code	2019-10-30 23:22:59 +01:00
Jonas Jenwald	f61fb3e0f9	[MessageHandler] Re-factor the `_onComObjOnMessage` function to use early returns When `ReadableStream` support was added to the `MessageHandler`, the `_onComObjOnMessage` function became more complex than previously. All of the nested `if`/`else if`/`else` branches are now, at least in my opinion, making some of this code a bit difficult to follow. Hence this patch, which attempts to help readability by making use of early `return`s and `Error`s. The patch also changes a couple of `var`/`let` occurrences to `const`.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	62f28e11a3	[MessageHandler] Remove unnecessary usage of `in` from the code Note that using `in` leads to unnecessary stringification of the properties, which seems completely unnecessary here. To avoid future problems from these changes the `MessageHandler.on` method will now assert, in non-`PRODUCTION`/`TESTING` builds, that it's always called with a function as expected. This patch also renames `callbacksCapabilities` to `callbackCapabilities`, note the removed "s", since using a double plural format looks a bit strange.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	3e46e800a0	[MessageHandler] Replace the internal `isReply` property, as sent when Promise callbacks are used, with enumeration values Given that the `isReply` property is an internal implementation detail, changing its type shouldn't be a problem. Note that by directly indicating if either data or an Error is sent, it's no longer necessary to use `in` when handling the callback.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	2d35a49dd8	Inline a couple of `isRef`/`isDict` checks in the `ObjectLoader` code As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).	2019-10-29 23:20:10 +01:00
Jonas Jenwald	1133dbac33	Make the `ObjectLoader` use more efficient methods when determining if data needs to be loaded Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded. When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which specific chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question. Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over all chunks. All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessarily "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a single simple check, which seems like a perfect fit for the `ObjectLoader` use cases. In particular, once the entire PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading local files) which this patch should help with. Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)	2019-10-29 23:20:09 +01:00
Jonas Jenwald	0496ea61f5	Ensure that `PartialEvaluator.hasBlendModes` handles Blend Modes in Arrays (PR 11281 follow-up) I completely overlooked this in PR 11281, but you obviously need to make similar changes in `PartialEvaluator.hasBlendModes` since it will otherwise ignore valid Blend Modes.	2019-10-28 11:37:05 +01:00
Jonas Jenwald	5c266f0e8c	Support Blend Modes which are specified in an Array of Names (issue 11279) According to the specification, the first supported Blend Mode should be chosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607	2019-10-26 14:24:31 +02:00
Tim van der Meij	4a5a4328f4	Merge pull request #11273 from Snuffleupagus/getViewport-offsets [api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone`	2019-10-24 00:08:40 +02:00
Jonas Jenwald	681bc9d70e	[api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone` There's no good reason, as far as I can tell, to not also support `offsetX`/`offsetY` in addition to e.g. `dontFlip`.	2019-10-23 20:48:14 +02:00
Jonas Jenwald	6f7f8257bc	Slightly re-factor the String handling in `StatTimer` This uses template strings in a couple of spots, and a buffer in the `toString` method.	2019-10-23 14:45:18 +02:00
Jonas Jenwald	8e5d3836d6	Remove the `enable` argument from the `StatTimer` constructor This argument is a left-over from older API code, where we unconditionally initialized `StatTimer` instances for every page. For quite some time that's only been done when `pdfBug` is set, hence it seems unnecessary to keep this functionality.	2019-10-23 14:45:18 +02:00
Jonas Jenwald	9fc40f8b84	Remove `DummyStatTimer` since it's unused now Since this isn't part of the API surface, removing it now that it's unused shouldn't cause any problems.	2019-10-23 14:45:16 +02:00
Jonas Jenwald	860da8b840	Stop using the `DummyStatTimer` in the API, and check if `this._stats` exists when trying to report statistics Even though the current situation only results in six unnecessary function calls per page, it nonetheless seems completely unnecessary to call dummy functions when `pdfBug` is not set (i.e. the default behaviour).	2019-10-23 13:23:41 +02:00
Jonas Jenwald	df0e1edab5	Re-factor sending of various Exceptions from the worker to the API As can be seen in the API, there's a number of document loading Exception handlers which are both really simple and highly similar. Hence these are changed such that all the relevant Exceptions are sent via one message instead. Furthermore, the patch also avoids unnecessarily re-creating `UnknownErrorException`s at the worker side and removes an unnecessary `bind` call.	2019-10-19 12:54:54 +02:00
Tim van der Meij	11f3851a97	Merge pull request #11243 from Snuffleupagus/issue-11242 Add a fallback for non-embedded composite Verdana fonts (issue 11242)	2019-10-18 23:56:46 +02:00
Tim van der Meij	c54bb222ca	Merge pull request #11231 from Snuffleupagus/indexObjects-entries-gen Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)	2019-10-17 23:56:26 +02:00
Jonas Jenwald	2fcb5afc7b	Add a fallback for non-embedded composite Verdana fonts (issue 11242) Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.	2019-10-17 17:00:55 +02:00
Pedro Luiz Cabral Salomon Prado	4d0c759b7f	Change variable assignment (#11247) Remove unused variable assignment in `src/core/fonts.js`	2019-10-16 00:39:25 +02:00
Jonas Jenwald	ffc847eaa5	Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303) This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1] Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break. The only reason that I'm even submitting this patch, is because of the number of open issues that it would address. Generally speaking though, the best course of action would probably be if `XRef.indexObjects` was re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them. --- [1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.	2019-10-14 22:10:04 +02:00
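
The `peekByte` end-of-data problem described in commit 829d6ba2dc can be illustrated with a small sketch. `SketchStream` below is a hypothetical stand-in written for this illustration, not the actual PDF.js `Stream` class; it only assumes the behaviour stated in the commit message (`getByte` returns `-1` at end of data without advancing the position).

```javascript
// Hypothetical minimal stream for illustration; not the PDF.js implementation.
class SketchStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Returns the next byte and advances the position,
  // or returns -1 at end of data and leaves the position unchanged.
  getByte() {
    if (this.pos >= this.bytes.length) {
      return -1;
    }
    return this.bytes[this.pos++];
  }

  // Buggy variant: always steps back, even when `getByte` did not advance.
  peekByteBuggy() {
    const byte = this.getByte();
    this.pos--;
    return byte;
  }

  // Fixed variant: only steps back when a byte was actually consumed.
  peekByte() {
    const byte = this.getByte();
    if (byte !== -1) {
      this.pos--;
    }
    return byte;
  }
}

const stream = new SketchStream(new Uint8Array([0x25]));
stream.getByte(); // consumes the only byte
stream.peekByte(); // end of data: the position is left untouched
```

With the buggy variant, peeking at end of data would move the position backwards, so a later `getByte` would re-read a byte that had already been consumed.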

3845 Commits
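
The byte-order handling described in commit 80342e2fdc (bug 1593902) can be sketched as follows. This is a simplified illustration, not the actual `stringToPDFString` code: the function name `decodeUtf16WithBom` is hypothetical, and the sketch assumes the input is a string in which every character holds one byte, starting with a UTF-16 BOM (`FE FF` for big-endian, `FF FE` for little-endian).

```javascript
// Hypothetical helper for illustration; not the PDF.js `stringToPDFString` function.
// `str` is assumed to be a "byte string": each character code is one byte (0-255).
function decodeUtf16WithBom(str) {
  const first = str.charCodeAt(0);
  const second = str.charCodeAt(1);

  let littleEndian;
  if (first === 0xfe && second === 0xff) {
    littleEndian = false; // big-endian BOM
  } else if (first === 0xff && second === 0xfe) {
    littleEndian = true; // little-endian BOM
  } else {
    return null; // no UTF-16 BOM; the caller would fall back to another encoding
  }

  let result = "";
  // Skip the two BOM bytes, then combine each byte pair into one UTF-16 code unit.
  for (let i = 2; i + 1 < str.length; i += 2) {
    const a = str.charCodeAt(i);
    const b = str.charCodeAt(i + 1);
    const codeUnit = littleEndian ? (b << 8) | a : (a << 8) | b;
    result += String.fromCharCode(codeUnit);
  }
  return result;
}

console.log(decodeUtf16WithBom("\xFF\xFEH\x00i\x00")); // little-endian input
console.log(decodeUtf16WithBom("\xFE\xFF\x00H\x00i")); // big-endian input
```

Both calls decode the same two-character text; only the byte order of the input differs.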