Given that `requestAnimationFrame` is being used, see the `src/display/api.js` file, an inactive window/tab means that rendering will not run and we'll thus not fetch all pages. The latter is a requirement for the "load" event to be unblocked in the MOZCENTRAL-version of the default viewer.
This patch is a *partial* solution, since it only addresses the following situations:
- A *background* tab (containing the viewer) is reloaded, e.g. via the tab-bar context menu.
- The viewer is loaded in an active tab, but the user switches away from it (or switches to another program window) *before* rendering has started.
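A hedged sketch of the idea (all names are illustrative, not the actual patch): when the tab is hidden, rendering driven by `requestAnimationFrame` never runs, so we fetch the pages directly instead of relying on the rendering-loop to trigger it.

```js
// Illustrative sketch only; `pdfDocument` is a loaded `PDFDocumentProxy`.
function fetchAllPagesWhenHidden(pdfDocument) {
  if (document.visibilityState !== "hidden") {
    return; // Rendering will fetch the pages as usual.
  }
  for (let pageNum = 1; pageNum <= pdfDocument.numPages; pageNum++) {
    // Fire-and-forget: it's the *fetching*, not the rendering, that the
    // "load" event depends upon.
    pdfDocument.getPage(pageNum).catch(() => {});
  }
}

// Covers both situations above: run once at startup, and again whenever
// the visibility-state changes before rendering has started.
document.addEventListener("visibilitychange", () => {
  fetchAllPagesWhenHidden(pdfDocument);
});
```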
Most likely this code predates our use of Node.js, but in Node.js asking
for user confirmation is a solved problem, so we can remove the custom
logic we have for this, which overall makes things much simpler.
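For reference, a minimal sketch of the "solved problem", using only Node.js built-ins (the prompt text is made up):

```js
const readline = require("readline");

function askConfirmation(question) {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  return new Promise(resolve => {
    rl.question(`${question} [y/n] `, answer => {
      rl.close();
      resolve(/^y(es)?$/i.test(answer.trim()));
    });
  });
}

// Usage: const ok = await askConfirmation("Overwrite the existing files?");
```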
This provides a work-around for badly generated PDF documents that contain *negative* /FitH parameters (in the referenced issue the value `-32768` is used).
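A hypothetical sketch of the work-around (the helper and its exact behaviour are illustrative, not the actual patch): treat a negative /FitH offset as missing, letting the viewer fall back to a default position.

```js
function sanitizeFitHDestination(dest) {
  // An explicit destination such as: [pageRef, { name: "FitH" }, -32768].
  const [page, kind, top] = dest;
  if (kind?.name === "FitH" && typeof top === "number" && top < 0) {
    return [page, kind, null]; // `null` lets the viewer pick a default.
  }
  return dest;
}
```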
This patch, first of all, removes circular dependencies in the TypeScript definitions. Secondly, it also moves `RenderingStates` into `web/ui_utils.js` to break another type-dependency and directly use the `XfaLayerBuilder` during XFA-printing.
Finally, note that this patch *slightly* reduces the size of the default viewer (e.g. in the `MOZCENTRAL` build) by not having to bundle code which is completely unused.
This patch circumvents the issues seen when trying to update TypeScript to version `4.5`, by "simply" fixing the broken/missing JSDocs and `typedef`s such that `gulp typestest` now passes.
As always, given that I don't really know anything about TypeScript, I cannot tell if this is a "correct" and/or proper way of doing things; we'll need TypeScript users to help out with testing!
*Please note:* I'm sorry about the size of this patch, but given how intertwined all of this unfortunately is it just didn't seem easy to split this into smaller parts.
However, one good thing about this TypeScript update is that it helped uncover a number of pre-existing bugs in our JSDocs comments.
The size of the `web/ui_utils.js` file has increased over time, as more code has been added to (or moved into) that file. To reduce its size slightly, this patch moves the event-related functionality into a separate file.
In PR 14114 this was only added to the default viewer, which means that in the viewer components the user would need to *manually* implement /Lang handling. This was (obviously) a bad choice, since the viewer components already support e.g. structTrees by default; sorry about overlooking this!
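A hedged sketch of the components' default behaviour (assuming the `info.Language` field that `getMetadata` exposes):

```js
// Apply the document's /Lang, when present, to the viewer container.
const { info } = await pdfDocument.getMetadata();
if (info.Language) {
  viewerContainer.setAttribute("lang", info.Language);
}
```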
To avoid having to make *two* `getMetadata` API-calls[1] very early during initialization, in the default viewer, the API will now cache its result. This will also come in handy elsewhere in the default viewer, e.g. by reducing parsing when opening the "document properties" dialog.
---
[1] This not only includes a round-trip to the worker-thread, but also having to re-parse the /Metadata-entry when it exists.
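A minimal sketch of the caching idea (class/property names are illustrative): cache the first `getMetadata` promise itself, so that repeated, and even concurrent, calls share one worker round-trip.

```js
class DocumentProxyLike {
  #metadataPromise = null;

  getMetadata() {
    // Caching the promise, rather than the resolved value, means that
    // concurrent callers also share a single request.
    if (!this.#metadataPromise) {
      this.#metadataPromise = this._transport.getMetadata();
    }
    return this.#metadataPromise;
  }
}
```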
After the changes in PR 14338, specifically in the `XRef.parse`-method, the /Pages-entry will now always have been fetched/validated when the `Catalog`-instance is created.
Hence we can directly access the /Pages-entry in `Catalog.getPageDict` and thus avoid *one* asynchronous data-lookup per page in the document. (In practice this is unlikely to show up in e.g. benchmarks, but it really cannot hurt.)
Finally, make sure that the `getPageDict`/`getAllPageDicts`-methods track the /Pages-tree reference correctly to prevent circular references in corrupt documents.
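A hedged sketch of the reference tracking, using the existing `RefSet` primitive (the surrounding tree-walk is omitted):

```js
const visitedRefs = new RefSet();
visitedRefs.put(pagesRef); // The root of the /Pages-tree.

function checkKidRef(kidRef) {
  if (visitedRefs.has(kidRef)) {
    throw new FormatError("Pages tree contains circular reference.");
  }
  visitedRefs.put(kidRef);
}
```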
We use string arguments in all other places, so these two places are a
bit inconsistent in that sense. Moreover, it's just one argument now,
which makes it a bit easier to read and see what it does because we
don't have to pass the always-empty options argument anymore. Finally,
doing it like this ensures it works in all Puppeteer versions given
https://github.com/puppeteer/puppeteer/issues/7836.
It's causing way too much noise right now, so it's disabled until the
upstream bug is fixed, at which point it can be enabled again. This is
tracked in issue #14293. Note that I deliberately added a new block so
we can easily remove it later on, and because the other block is about
another bug.
By making this API-call *unconditionally*, we introduce a (slight) delay in the initialization of *all* documents.
That seems quite unfortunate, since `pdfjs.enablePermissions` is off by default, and it thus seems better to only do the API-call when actually needed; sorry about this!
For encrypted PDF documents without the required permissions set, this patch adds support for disabling form editing. However, please note that it also requires that the `pdfjs.enablePermissions` preference is set to `true`[1] (since PDF document permissions could be seen as user hostile).
Based on https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G6.1942134, this condition hopefully makes sense.
---
[1] Either manually with `about:config`, or using e.g. a [Group Policy](https://github.com/mozilla/policy-templates).
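A sketch of the now-conditional call in the default viewer (simplified; whether this matches the final code exactly is not guaranteed):

```js
// Only pay for the extra round-trip when permissions are actually enforced.
const permissionsPromise = AppOptions.get("enablePermissions")
  ? pdfDocument.getPermissions()
  : Promise.resolve(null);
```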
Version 14 that we used before is now in maintenance mode, so we should
upgrade to the most recent LTS version.
Moreover, use the most recent `setup-node` workflow version and syntax;
see https://github.com/actions/setup-node#usage.
This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.
For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents having all pages loaded/initialized early provides overall better performance/UX in the viewer, however there are cases where it can instead *hurt* performance.
For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands of `getPage`-calls and leaving very little time for other parsing (such as e.g. of operatorLists).
To address this situation, this patch thus proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a chance to handle other requests (see the sketch after the notes below).[2]
Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3]
---
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep.
[2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that would have slowed things down for all documents, which didn't seem appropriate.
[3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls.
Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having `Catalog.getPageDict` be `async` would simplify things a great deal.
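A minimal sketch of the pausing approach (the threshold value and all names are illustrative, not the actual viewer code):

```js
const PAUSE_EAGER_PAGE_INIT_THRESHOLD = 250; // Hypothetical value.

async function eagerlyInitializePages(pdfDocument) {
  let pending = [];
  for (let pageNum = 1; pageNum <= pdfDocument.numPages; pageNum++) {
    pending.push(pdfDocument.getPage(pageNum));

    if (pending.length >= PAUSE_EAGER_PAGE_INIT_THRESHOLD) {
      // "Pause" the eager getPage-calls, giving the worker-thread a chance
      // to handle other requests (e.g. operatorList parsing) in between.
      await Promise.all(pending);
      pending = [];
    }
  }
  await Promise.all(pending);
}
```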
Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
Note that we purposely handle `XRefEntryException` specially, to make it possible to fallback to indexing all XRef objects.
Rather than trying, and failing, to fetch the entire /Pages-tree for documents with corrupt XRef tables, let's fallback to indexing all objects *before* trying to invoke the `Catalog.getAllPageDicts` method.
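A hedged sketch of the intended control-flow (heavily simplified, but `XRefEntryException` and `XRef.indexObjects` are existing names):

```js
let pageDicts;
try {
  pageDicts = await catalog.getAllPageDicts();
} catch (reason) {
  if (!(reason instanceof XRefEntryException)) {
    throw reason; // Propagate the actual Error to the call-site.
  }
  xref.indexObjects(); // Fallback: index all objects in the document.
  pageDicts = await catalog.getAllPageDicts();
}
```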
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
Finally, also changes the `pagePromises` to a *private* property since it's not supposed to be accessed from the "outside".
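A minimal sketch of the data-structure change (the class name and helper are illustrative): a private `Map`, keyed on the page index, naturally supports sparse and out-of-order access.

```js
class TransportLike {
  #pagePromises = new Map(); // pageIndex -> Promise<PDFPageProxy>

  getPage(pageNumber) {
    const pageIndex = pageNumber - 1;
    const cached = this.#pagePromises.get(pageIndex);
    if (cached) {
      return cached;
    }
    const promise = this._fetchPage(pageIndex); // Hypothetical helper.
    this.#pagePromises.set(pageIndex, promise);
    return promise;
  }
}
```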
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
For one thing, this simplifies iteration since we no longer have to worry about/check if `pageCache`-entries are undefined (which will happen for *sparse* `Array`s).
Of particular note is that we're no longer attempting to "null" the `pageCache`-entry from within the `PDFPageProxy._destroy`-method. Given that *synchronous* JavaScript will always run to completion[1] and that we're looping through all pages in `WorkerTransport.destroy` and immediately clear the cache afterwards, that code did/does not really make a lot of sense (as far as I can tell).
Finally, also changes the `pageCache` to a *private* property since it's not supposed to be accessed from the "outside".
---
[1] Unless there are errors, of course.
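To illustrate the simpler iteration (variable names are made up): with a sparse `Array` every entry needs an undefined-check, whereas a `Map` only ever holds pages that were actually inserted.

```js
// Before: a sparse Array, with holes for never-accessed pages.
for (const page of pageCacheArray) {
  if (!page) {
    continue; // Skip the holes/undefined entries.
  }
  page._destroy();
}

// After: a Map only contains entries that were explicitly set.
for (const page of pageCacheMap.values()) {
  page._destroy();
}
```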
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of *another* oversight on my part, we're only caching /Page references once we've found the correct page. As long as all pages are loaded *in order* this doesn't really matter (happens by default in the viewer), but when `disableAutoFetch` is used the pages may be fetched in a more random order (this patch reduces the asynchronicity of `Catalog.getPageDict` slightly in that case).
Given that [bug 1336591](https://bugzilla.mozilla.org/show_bug.cgi?id=1336591) was just closed as fixed, thus fixing issue 8022 in Firefox, let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of annoying off-by-one errors[1] the caching became less efficient than it could/should be.[2] Note here that the /Pages-tree is zero-indexed, and that e.g. `pageIndex = 5` thus corresponds to the *sixth* page of the document.
---
[1] In particular the `currentPageIndex + count < pageIndex` part.
[2] For example, even when loading a relatively small/simple document such as `tracemonkey.pdf` in the viewer, the number of `xref.fetchAsync(currentNode)` calls are reduced from `56` to `44` with this patch.
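To spell out the off-by-one (this is my reading of the quoted condition): a /Pages-node starting at `currentPageIndex` with `count` leaves covers the page indices `[currentPageIndex, currentPageIndex + count - 1]`, so it can only be skipped when `currentPageIndex + count <= pageIndex`.

```js
// Skip-test for a node covering `count` pages from `currentPageIndex`;
// using `<` instead of `<=` misses the boundary case below.
const canSkipNode = (currentPageIndex, count, pageIndex) =>
  currentPageIndex + count <= pageIndex;

canSkipNode(0, 5, 5); // true: pages 0..4 cannot contain pageIndex 5.
```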