Commit Graph

15191 Commits

Author SHA1 Message Date
Tim van der Meij
fefb9ed5b4
Merge pull request #14360 from timvandermeij/updates
Update packages and translations
2021-12-11 19:51:38 +01:00
Tim van der Meij
c5847141b4
Update translations to the most recent versions
2021-12-11 19:44:52 +01:00
Tim van der Meij
2757000bb2
Fix some dependency vulnerabilities reported by npm audit
This is done automatically using the `npm audit fix` command.
2021-12-11 19:44:52 +01:00
Tim van der Meij
d3d8141372
Update packages to the most recent versions
2021-12-11 19:44:48 +01:00
Jonas Jenwald
b1d3e7f121 Support disabling of form editing when pdfjs.enablePermissions is set (issue 14356)
For encrypted PDF documents without the required permissions set, this patch adds support for disabling of form editing. However, please note that it also requires that the `pdfjs.enablePermissions` preference is set to `true`[1] (since PDF document permissions could be seen as user hostile).

Based on https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G6.1942134, this condition hopefully makes sense.

---
[1] Either manually with `about:config`, or using e.g. a [Group Policy](https://github.com/mozilla/policy-templates).
2021-12-11 18:26:13 +01:00
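
To give a concrete sense of the permissions flow described above, here is a minimal sketch using the public `getPermissions()` API; which exact `PermissionFlag` values the viewer checks, and the `formsAllowed` name, are assumptions for illustration only.

```js
import { getDocument, PermissionFlag } from "pdfjs-dist";

async function loadWithPermissions(url) {
  const pdfDocument = await getDocument(url).promise;

  // `getPermissions()` resolves with an Array of PermissionFlag values, or
  // `null` when the document specifies no permissions at all.
  const permissions = await pdfDocument.getPermissions();

  // Only restrict form editing when permissions exist and the relevant flags
  // are missing; an unencrypted document stays fully editable.
  const formsAllowed =
    permissions === null ||
    permissions.includes(PermissionFlag.MODIFY_ANNOTATIONS) ||
    permissions.includes(PermissionFlag.FILL_INTERACTIVE_FORMS);

  return { pdfDocument, formsAllowed };
}
```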
Jonas Jenwald
b03281de18 Move the permissions handling into the BaseViewer (PR 11789 follow-up)
Besides making the permissions-functionality directly available in the viewer-components, these changes are also necessary for the next patch.
2021-12-11 17:13:41 +01:00
Jonas Jenwald
d856ed9395
Merge pull request #14361 from timvandermeij/nodejs
Upgrade Node.js to version 16 in the CI workflow
2021-12-11 15:58:00 +01:00
Tim van der Meij
4269148d3d
Upgrade Node.js to version 16 in the CI workflow
Version 14 that we used before is now in maintenance mode, so we should
upgrade to the most recent LTS version.

Moreover, use the most recent `setup-node` workflow version and syntax;
see https://github.com/actions/setup-node#usage.
2021-12-11 15:50:23 +01:00
Tim van der Meij
3a8318aa1c
Merge pull request #14359 from Snuffleupagus/PAUSE_EAGER_PAGE_INIT
Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up)
2021-12-11 13:28:35 +01:00
Tim van der Meij
a6dd39b645
Merge pull request #14358 from Snuffleupagus/checkLastPage-improvements
Improve `PDFDocument.checkLastPage`/`Catalog.getAllPageDicts` for documents with corrupt XRef tables (PR 14311, 14335 follow-up)
2021-12-11 13:07:54 +01:00
Tim van der Meij
70809a80ce
Merge pull request #14355 from Snuffleupagus/api-page-caches-Map
Change `WorkerTransport.{pageCache, pagePromises}` from an Array to a Map
2021-12-11 13:00:11 +01:00
Tim van der Meij
2b8a5dce70
Merge pull request #14354 from Snuffleupagus/improve-pageKidsCountCache-further
Further improve caching in `Catalog.getPageDict`, for `disableAutoFetch` mode (PR 8207 follow-up)
2021-12-11 12:54:39 +01:00
Jonas Jenwald
90472e5130 Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up)
This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.

For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents, having all pages loaded/initialized early provides overall better performance/UX in the viewer; however, there are cases where it can instead *hurt* performance.
For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which swamps the worker-thread with handling (potentially) thousands of `getPage`-calls, leaving very little time for other parsing (such as e.g. of operatorLists).

To address this situation, this patch proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a chance to handle other requests.[2]

Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3]

---
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep.

[2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that'd slow things down for all documents, which didn't seem appropriate.

[3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls.
Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having `Catalog.getPageDict` be `async` would simplify things a great deal.
2021-12-10 20:44:06 +01:00
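
A minimal, self-contained sketch of the "pausing" idea from the commit above; the `PAUSE_THRESHOLD` value and the `eagerlyInitPages` helper are hypothetical, not the actual viewer code.

```js
// Hypothetical illustration: eagerly request pages, but wait for the already
// issued requests after every PAUSE_THRESHOLD calls, so the worker-thread
// gets a chance to handle other work (e.g. operatorList parsing).
const PAUSE_THRESHOLD = 250; // illustrative value only

async function eagerlyInitPages(pdfDocument, onPage) {
  let pending = [];

  for (let pageNumber = 1; pageNumber <= pdfDocument.numPages; pageNumber++) {
    pending.push(pdfDocument.getPage(pageNumber).then(onPage));

    if (pending.length === PAUSE_THRESHOLD) {
      await Promise.all(pending); // temporarily "pause" the eager calls
      pending = [];
    }
  }
  await Promise.all(pending);
}
```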
Jonas Jenwald
70ac6b1694 Update Catalog.getAllPageDicts to always propagate the actual Errors (PR 14335 follow-up)
Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
Note that we purposely handle `XRefEntryException` specially, to make it possible to fallback to indexing all XRef objects.
2021-12-10 15:22:36 +01:00
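
A rough sketch of the propagate-or-fallback pattern described above; `XRefEntryException`, `XRef.fetchAsync` and `XRef.indexObjects` are internal pdf.js names taken from these commit messages, and this is not the actual `Catalog.getAllPageDicts` code.

```js
// Sketch only: let XRefEntryException trigger the indexing fallback, while
// every other Error now reaches the call-site unchanged instead of being
// "swallowed" and replaced by a generic one.
async function fetchPageDictWithFallback(xref, pageRef) {
  try {
    return await xref.fetchAsync(pageRef);
  } catch (ex) {
    if (ex instanceof XRefEntryException) {
      xref.indexObjects(); // fallback: index all XRef objects, then retry
      return xref.fetchAsync(pageRef);
    }
    throw ex; // propagate the actual Error as intended
  }
}
```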
Jonas Jenwald
47f9eef584 Improve PDFDocument.checkLastPage for documents with corrupt XRef tables (PR 14311, 14335 follow-up)
Rather than trying, and failing, to fetch the entire /Pages-tree for documents with corrupt XRef tables, let's fallback to indexing all objects *before* trying to invoke the `Catalog.getAllPageDicts` method.
2021-12-10 11:45:09 +01:00
Jonas Jenwald
f39536a30b Change WorkerTransport.pagePromises from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.

Finally, also changes the `pagePromises` to a *private* property since it's not supposed to be accessed from the "outside".
2021-12-09 15:30:10 +01:00
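
A minimal sketch of what the Array-to-Map change looks like, with the cache as a private class field; the class and method names below are simplified stand-ins, not the real `WorkerTransport` implementation.

```js
class TransportSketch {
  // Private field: page promises keyed by page index. Out-of-order access
  // never creates the "holes" that a sparse Array would.
  #pagePromises = new Map();

  getPage(pageIndex) {
    const cached = this.#pagePromises.get(pageIndex);
    if (cached) {
      return cached;
    }
    const promise = this._fetchPageFromWorker(pageIndex);
    this.#pagePromises.set(pageIndex, promise);
    return promise;
  }

  async destroy() {
    // Iterating a Map only visits entries that actually exist, so there's no
    // need to check for `undefined` entries as with a sparse Array.
    const waitOn = Array.from(this.#pagePromises.values());
    this.#pagePromises.clear();
    await Promise.all(waitOn);
  }

  _fetchPageFromWorker(pageIndex) {
    return Promise.resolve({ pageIndex }); // placeholder for the worker call
  }
}
```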
Jonas Jenwald
c5525dcb69 Change WorkerTransport.pageCache from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
For one thing, this simplifies iteration since we no longer have to worry about/check if `pageCache`-entries are undefined (which will happen for *sparse* `Array`s).

Of particular note is that we're no longer attempting to "null" the `pageCache`-entry from within the `PDFPageProxy._destroy`-method. Given that *synchronous* JavaScript will always run to completion[1] and that we're looping through all pages in `WorkerTransport.destroy` and immediately clear the cache afterwards, that code did/does not really make a lot of sense (as far as I can tell).

Finally, also changes the `pageCache` to a *private* property since it's not supposed to be accessed from the "outside".

---
[1] Unless there are errors, of course.
2021-12-09 15:29:47 +01:00
Jonas Jenwald
8a05db230e Further improve caching in Catalog.getPageDict, for disableAutoFetch mode (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of *another* oversight on my part, we're only caching /Page references once we've found the correct page. As long as all pages are loaded *in order* this doesn't really matter (happens by default in the viewer), but when `disableAutoFetch` is used the pages may be fetched in a more random order (this patch reduces the asynchronicity of `Catalog.getPageDict` slightly in that case).
2021-12-09 12:54:49 +01:00
Tim van der Meij
97dc048e56
Merge pull request #14350 from Snuffleupagus/ccitt-infinite-loop
Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)
2021-12-08 20:01:21 +01:00
Tim van der Meij
b178985615
Merge pull request #14347 from Snuffleupagus/improve-pageKidsCountCache
Improve caching in `Catalog.getPageDict` (PR 8207 follow-up)
2021-12-08 19:58:46 +01:00
Jonas Jenwald
e8562173b8 Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)
Fixes one of the documents in issue 14305.
2021-12-07 13:57:25 +01:00
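
The commit message doesn't show the concrete fix, but the usual defensive pattern for decode loops on corrupt input is a progress guard; the sketch below is a generic, hypothetical illustration (the `decoder` interface is made up), not the actual `CCITTFaxDecoder` change.

```js
// Generic illustration: abort a decode loop as soon as an iteration fails to
// consume any input, instead of spinning forever on corrupt data.
function decodeAll(decoder) {
  const rows = [];
  let previousPosition = -1;

  while (!decoder.eof) {
    if (decoder.position === previousPosition) {
      throw new Error("CCITTFaxDecode: no progress made, aborting.");
    }
    previousPosition = decoder.position;
    rows.push(decoder.readNextRow());
  }
  return rows;
}
```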
Jonas Jenwald
c42b19f26a
Merge pull request #14348 from Snuffleupagus/issue-8022-reftest
Add a (linked) test-case for issue 8022
2021-12-06 16:17:28 +01:00
Jonas Jenwald
909f012fb8 Add a (linked) test-case for issue 8022
Given that [bug 1336591](https://bugzilla.mozilla.org/show_bug.cgi?id=1336591) was just closed as fixed, thus fixing issue 8022 in Firefox, let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.
2021-12-06 15:27:40 +01:00
Jonas Jenwald
5f295ba280 Improve caching in Catalog.getPageDict (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of annoying off-by-one errors[1] the caching became less efficient than it could/should be.[2] Note here that the /Pages-tree is zero-indexed, and that e.g. `pageIndex = 5` thus corresponds to the *sixth* page of the document.

---
[1] In particular the `currentPageIndex + count < pageIndex` part.

[2] For example, even when loading a relatively small/simple document such as `tracemonkey.pdf` in the viewer, the number of `xref.fetchAsync(currentNode)` calls is reduced from `56` to `44` with this patch.
2021-12-06 11:49:31 +01:00
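
Since the /Pages-tree is zero-indexed, a subtree starting at `currentPageIndex` with `count` leaf pages covers the indices `currentPageIndex` through `currentPageIndex + count - 1`. The sketch below only illustrates that index arithmetic; it is not the actual `Catalog.getPageDict` caching fix.

```js
// Simplified illustration of skipping /Pages subtrees while searching for a
// zero-indexed page: pageIndex = 5 is the *sixth* page, and a subtree lies
// entirely before it exactly when currentPageIndex + count <= pageIndex.
function findSubtree(kids, pageIndex) {
  let currentPageIndex = 0;

  for (const kid of kids) {
    const count = kid.count; // leaf /Page nodes contained in this subtree

    if (currentPageIndex + count <= pageIndex) {
      currentPageIndex += count; // the target page lies in a later subtree
      continue;
    }
    return { kid, firstPageIndex: currentPageIndex };
  }
  return null; // pageIndex exceeds the total page count
}

// Example: with subtree counts [3, 4, 2], pageIndex = 5 falls in the second
// subtree (pages 3-6), so findSubtree returns firstPageIndex = 3.
console.log(findSubtree([{ count: 3 }, { count: 4 }, { count: 2 }], 5));
```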
Jonas Jenwald
034b870c4a
Merge pull request #14344 from timvandermeij/test-driver
Modernize the test driver
2021-12-05 23:52:46 +01:00
Tim van der Meij
911a9d34b1
Fix code duplication in the rasterization logic in test/driver.js
Now that the rasterization logic is encapsulated in a class, we can
easily move the container creation into a separate static method.
2021-12-05 19:29:39 +01:00
Tim van der Meij
03506f25c0
Move the rasterization logic into one single class
This refactoring ensures that we can get rid of the closures and
encapsulate the logic in a nicer way with e.g. getters for the style
promises.
2021-12-05 19:28:51 +01:00
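
A small, hypothetical sketch of the class-based shape described in the two commits above, with a static container-creation helper and a getter backing a lazily created style promise; none of these names are the real test/driver.js code.

```js
// Hypothetical shape of the refactored rasterizer, not the real driver code.
const SVG_NS = "http://www.w3.org/2000/svg";

class Rasterizer {
  static createContainer(viewport) {
    // Shared container creation, previously duplicated per rasterizer.
    const svg = document.createElementNS(SVG_NS, "svg");
    svg.setAttribute("width", `${viewport.width}px`);
    svg.setAttribute("height", `${viewport.height}px`);
    return svg;
  }

  static get stylePromise() {
    // Getter backed by a lazily created, cached promise for the stylesheet.
    if (!this._stylePromise) {
      this._stylePromise = fetch("./annotation_layer.css").then(response =>
        response.text()
      );
    }
    return this._stylePromise;
  }
}
```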
Tim van der Meij
33dc0628a0
Enable the no-var linting rule in test/driver.js
This is done automatically with the `gulp lint --fix` command with the
only exception of the `annotationLayerContext` variable.
2021-12-05 15:41:36 +01:00
Tim van der Meij
5fd4276dcf
Use async/await in the rasterization classes in test/driver.js
This is achieved by letting the `writeSVG` function return a promise so
we don't need callback passing anymore.
2021-12-05 14:11:09 +01:00
Tim van der Meij
13786ef806
Use arrow functions instead of self variables in test/driver.js
2021-12-05 14:11:08 +01:00
Tim van der Meij
1d1f713bfc
Inline loadStyles calls in the rasterization classes in test/driver.js
The wrapper functions in this case only really added indirection, so
this commit simplifies the code a bit.
2021-12-05 13:49:04 +01:00
Tim van der Meij
a58700b0dc
Convert the Driver class to ES6 syntax in test/driver.js
2021-12-05 13:43:02 +01:00
Tim van der Meij
3264d72e60
Merge pull request #14345 from Snuffleupagus/viewer-pagesPromise-reject
Ensure that the viewer handles `BaseViewer` initialization failures
2021-12-05 13:40:24 +01:00
Jonas Jenwald
e027178356 Tweak the "pagesloaded" event handler in PDFOutlineViewer
These changes improve the consistency ever so slightly in the `PDFOutlineViewer._dispatchEvent` method, by making sure that we can tell the following two cases apart:
 - The "pagesloaded" event has *not yet* been fired.
 - The "pagesloaded" event has been fired, but no pages were available.
2021-12-05 11:04:17 +01:00
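
One way to keep the two cases apart is to leave the stored page count `null` until the event has fired; a hedged sketch (simplified, not the actual `PDFOutlineViewer` code):

```js
// Illustrative only: `null` means "pagesloaded" has not fired yet, whereas
// `0` means it fired but no pages were available.
class OutlineViewerSketch {
  constructor(eventBus) {
    this._pagesCount = null;

    eventBus.on("pagesloaded", evt => {
      this._pagesCount = evt.pagesCount; // may legitimately be zero
    });
  }

  _dispatchEvent(outlineCount) {
    if (this._pagesCount === null) {
      return; // still waiting for the document's pages to load
    }
    console.log(`outline items: ${outlineCount}, pages: ${this._pagesCount}`);
  }
}
```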
Jonas Jenwald
9de30c4ff0 Ensure that the viewer handles BaseViewer initialization failures
*This patch can be tested e.g. with the `poppler-85140-0.pdf` document from the test-suite.*

For some sufficiently corrupt documents the `getDocument` call will succeed, but fetching even the very first page fails. Currently we only print error messages (in the console) from the `{BaseViewer, PDFThumbnailViewer}.setDocument` methods, but don't actually provide these errors to allow the viewer to handle them properly.
In practice this means that the GENERIC viewer won't display the `errorWrapper`, and in the MOZCENTRAL viewer the *browser* loading indicator is never hidden (since we never unblock the "load" event).
2021-12-05 10:55:47 +01:00
Tim van der Meij
dc455c836e
Merge pull request #14339 from Snuffleupagus/issue-8019-reftest
Add a (linked) test-case for issue 8019
2021-12-04 13:26:47 +01:00
Tim van der Meij
335c4c8a43
Merge pull request #14338 from Snuffleupagus/XRef-more-Pages-validation
[api-minor] Clear all caches in `XRef.indexObjects`, and improve /Root dictionary validation in `XRef.parse` (issue 14303)
2021-12-04 13:23:40 +01:00
Tim van der Meij
3117985c55
Merge pull request #14340 from Snuffleupagus/Metadata-fetch-error
Handle errors when fetching the raw /Metadata (issue 14305)
2021-12-04 13:19:37 +01:00
Tim van der Meij
bceed26e67
Merge pull request #14341 from Snuffleupagus/shadow-prop-assert
Ensure that the `shadow` helper function is passed a valid property (PR 14152 follow-up)
2021-12-04 13:17:14 +01:00
Jonas Jenwald
d9fac34596 Ensure that the shadow helper function is passed a valid property (PR 14152 follow-up)
Trying to shadow a non-existent property is always an implementation mistake, since it leads to the `shadow`-call not having any effect.

In PR 14152 I overlooked the fact that it's fairly easy to enforce this during development/testing, since that can help catch e.g. simple spelling bugs.
2021-12-04 10:07:21 +01:00
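
For context, pdf.js's `shadow` helper replaces a lazily computed property with a plain value so later accesses skip the computation; the sketch below shows that idea plus the kind of existence check the commit describes (the real guard is only enabled in development/testing builds, and the error message here is made up).

```js
// Sketch of a shadow-style helper with the property-existence check.
function shadow(obj, prop, value) {
  // Shadowing a non-existent property is always a mistake (e.g. a simple
  // spelling bug), since the call would have no effect on later accesses.
  if (!(prop in obj)) {
    throw new Error(`shadow: Property "${prop}" does not exist.`);
  }
  Object.defineProperty(obj, prop, {
    value,
    enumerable: true,
    configurable: true,
    writable: false,
  });
  return value;
}

class Example {
  get answer() {
    return shadow(this, "answer", 6 * 7); // computed once, cached afterwards
  }
}
```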
Jonas Jenwald
40291d1943 Handle errors when fetching the raw /Metadata (issue 14305)
Currently the `Catalog.metadata` getter only handles errors during parsing, however in a *corrupt* PDF document fetching of the raw /Metadata can obviously fail as well.
Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail, which it *never* should, and this will cause the viewer to not initialize all state as expected.

Fixes one of the documents in issue 14305.
2021-12-04 09:41:42 +01:00
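
A rough sketch of guarding both the raw fetch and the parsing, as described above; `parseXmpMetadata` is a hypothetical stand-in for the actual parser, and `catDict`/`xref` mirror internal pdf.js names.

```js
// Illustrative only: never let a corrupt /Metadata entry reject
// PDFDocumentProxy.getMetadata in the API.
function readCatalogMetadata(catDict, xref) {
  let metadata = null;
  try {
    const stream = xref.fetch(catDict.getRaw("Metadata"));

    if (stream && typeof stream.getString === "function") {
      metadata = parseXmpMetadata(stream.getString()); // parsing may throw too
    }
  } catch (ex) {
    // In a corrupt document the raw fetch itself may fail as well.
    console.warn(`Skipping invalid /Metadata: "${ex}".`);
  }
  return metadata;
}
```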
Jonas Jenwald
ca82e1832f Add a (linked) test-case for issue 8019
Given that [bug 1336572](https://bugzilla.mozilla.org/show_bug.cgi?id=1336572) was just closed as fixed, thus fixing issue 8019 in Firefox[1], let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.

---
[1] It also seems to be working in Google Chrome, although I'm having a slightly difficult time deciphering *exactly* what configurations were affected when looking through issue 8019.
2021-12-04 08:56:04 +01:00
Jonas Jenwald
ad3a271fc4 [api-minor] Clear all caches in XRef.indexObjects, and improve /Root dictionary validation in XRef.parse (issue 14303)
*This patch improves handling of a couple of PDF documents from issue 14303.*

 - Update `XRef.indexObjects` to actually clear *all* XRef-caches. Invalid XRef tables *usually* cause issues early enough during parsing that we've not populated the XRef-cache, however to prevent any issues we obviously need to clear that one as well.

 - Improve the /Root dictionary validation in `XRef.parse` (PR 9827 follow-up). In addition to checking that a /Pages entry exists, we'll now also check that it can be successfully fetched *and* that it's of the correct type. There's really no point trying to use a /Root dictionary that e.g. `Catalog.toplevelPagesDict` will reject, and this way we'll be able to fallback to indexing the objects in corrupt documents.

 - Throw an `InvalidPDFException`, rather than a general `FormatError`, in `XRef.parse` when no usable /Root dictionary could be found. That really seems more appropriate overall, since all attempts at parsing/recovery have failed. (This part of the patch is API-observable, hence the tag.)

With these changes, two existing test-cases are improved and the unit-tests are updated/re-factored to highlight that. In particular `GHOSTSCRIPT-698804-1-fuzzed.pdf` will now both load and "render" correctly, whereas `poppler-395-0-fuzzed.pdf` will now fail immediately upon loading (rather than *appearing* to work).
2021-12-03 11:57:38 +01:00
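
A condensed, hypothetical sketch of the validation flow outlined above; `Dict` and `InvalidPDFException` are pdf.js names, while the function itself is a simplification of what `XRef.parse` actually does.

```js
// Condensed sketch: either return a usable /Root dictionary, ask the caller
// to fallback to indexing the objects, or give up entirely.
function validateRoot(trailerDict, recoveryMode) {
  try {
    const root = trailerDict.get("Root");
    const pages = root.get("Pages");

    if (pages instanceof Dict) {
      return root; // a usable /Root dictionary was found
    }
  } catch (ex) {
    // Fetching /Root or /Pages failed; treat it like a missing entry below.
  }

  if (!recoveryMode) {
    // Signal the caller to clear *all* XRef-caches, index the objects in the
    // corrupt document (XRef.indexObjects) and then retry parsing.
    return null;
  }
  // All attempts at parsing/recovery have failed.
  throw new InvalidPDFException("No valid /Root dictionary found.");
}
```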
Tim van der Meij
e9e4b913c0
Merge pull request #14324 from Snuffleupagus/force-PAGE-scrolling
Enforce PAGE-scrolling for *very* large/long documents (bug 1588435, PR 11263 follow-up)
2021-12-02 20:01:13 +01:00
Tim van der Meij
4c145fc9c4
Merge pull request #14335 from Snuffleupagus/Catalog-getAllPageDicts
[Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up)
2021-12-02 19:54:30 +01:00
Tim van der Meij
aee4b7c73f
Merge pull request #14328 from Snuffleupagus/node-examples-page-cleanup
Update (primarily) the Node.js examples to release page resources
2021-12-02 19:44:17 +01:00
Jonas Jenwald
1fac6371d3 [Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up)
*Please note:* This is similar to the method that existed prior to PR 3848, but the new method will *only* be used as a fallback when parsing of corrupt PDF documents.

The implementation in PR 14311 unfortunately turned out to be *way* too simplistic, as evident by the recently added test-files in issue 14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fallback to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.

Fixes *at least* two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.

---
[1] The whole point of PR 14311 was obviously to *get rid of* infinite loops during document initialization, not to introduce any more of those.
2021-12-02 14:31:04 +01:00
Jonas Jenwald
f61b74e38e
Merge pull request #14325 from Snuffleupagus/getPageDict-rm-skipCount
Remove the unused `skipCount` parameter from `Catalog.getPageDict` (PR 14311 follow-up)
2021-12-02 13:16:26 +01:00
Jonas Jenwald
8ea740c800 Slightly extend the "creates pdf doc from PDF file with bad XRef table" unit-test (PR 14304 follow-up)
Given that we're able to "render" this document, let's extend the unit-test to actually check that we're able to obtain the operatorList; although given the overall issues in the document it'll be empty.
2021-12-02 11:51:40 +01:00
Jonas Jenwald
e045cd4520 Remove the unused skipCount parameter from Catalog.getPageDict (PR 14311 follow-up)
This was added in PR 14311, but given that I completely forgot to update the `PDFDocument.getPage` signature accordingly, it's completely unused.
Given that things work just fine as-is, let's simply remove that optional parameter for now; sorry about the churn here!
2021-12-02 11:51:38 +01:00