Commit Graph

15164 Commits

Author SHA1 Message Date
Jonas Jenwald
472bbf4592 Unblock the "load" event in inactive windows/tabs (bug 1746213, PR 11646 follow-up)
Given that `requestAnimationFrame` is being used, see the `src/display/api.js` file, an inactive window/tab means that rendering will not run and we'll thus not fetch all pages. The latter is a requirement for the "load" event to be unblocked in the MOZCENTRAL-version of the default viewer.

This patch is a *partial* solution, since it only addresses the following situations:
 - A *background* tab (containing the viewer) is reloaded, e.g. via the tab-bar context menu.
 - The viewer is loaded in an active tab, but the user switches away from it (or switches to another program window) *before* rendering has started.
2021-12-19 10:39:48 +01:00
Tim van der Meij
a2ae56f394
Merge pull request #14387 from timvandermeij/test-utils
Modernize the test utilities
2021-12-18 16:40:56 +01:00
Tim van der Meij
71326c6a1c
Enable the no-var linting rule in test/testutils.js
This is done automatically with the `gulp lint --fix` command with the
only exception of the `parts` variable.
2021-12-18 15:58:47 +01:00
Tim van der Meij
a24982a733
Drop custom confirmation logic in favor of using the built-in Node.js readline module
Most likely this code predates our use of Node.js, but in Node.js asking
for user confirmation is a solved problem, so we can remove the custom
logic we have for this, which overall makes things much simpler.
2021-12-18 15:52:04 +01:00
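A minimal sketch of what asking for user confirmation with the built-in `readline` module can look like. The `isAffirmative` and `confirm` helpers and the `[y/N]` prompt are assumptions made for illustration; this is not the code from the commit above.

```javascript
const readline = require("readline");

// Hypothetical helper: treat "y" or "yes" (case-insensitive) as confirmation.
function isAffirmative(answer) {
  return /^(y|yes)$/i.test(answer.trim());
}

// Hypothetical helper: ask a yes/no question on the terminal and pass the
// result (a boolean) to `callback`.
function confirm(message, callback) {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  rl.question(`${message} [y/N] `, answer => {
    rl.close();
    callback(isAffirmative(answer));
  });
}
```

A caller would use it as `confirm("Overwrite the reference images?", ok => { ... })`, where the question text is again only an example.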
Tim van der Meij
869b396011
Merge pull request #14373 from Snuffleupagus/update-TypeScript
[api-minor] Fix broken/missing JSDocs and `typedef`s, to allow updating TypeScript to the latest version (issue 14342)
2021-12-18 13:35:54 +01:00
Tim van der Meij
afa43d3af0
Merge pull request #14386 from Snuffleupagus/issue-14385
Ignore *negative* /FitH parameters in the viewer (issue 14385)
2021-12-18 13:24:42 +01:00
Jonas Jenwald
6b75e46d11 Ignore *negative* /FitH parameters in the viewer (issue 14385)
This provides a work-around for badly generated PDF documents that contain *negative* /FitH parameters (in the referenced issue the value `-32768` is used).
2021-12-18 11:35:21 +01:00
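The work-around in the commit above can be pictured roughly as follows. This is a sketch under assumptions: `normalizeFitHTop` is a hypothetical helper, not a PDF.js API, and treating an ignored value as `null` ("keep the current position") is an illustrative choice.

```javascript
// Hypothetical helper (not a PDF.js API): sanitize the /FitH "top" parameter.
function normalizeFitHTop(top) {
  // Non-numeric values, e.g. `null`, are passed on as "no position given".
  if (typeof top !== "number" || Number.isNaN(top)) {
    return null;
  }
  // Negative values, such as -32768, are treated as bogus and ignored.
  return top < 0 ? null : top;
}
```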
Jonas Jenwald
e19020c028 Move the Default{...}LayerFactory into a new web/default_factory.js file
This patch, first of all, removes circular dependencies in the TypeScript definitions. Secondly, it also moves `RenderingStates` into `web/ui_utils.js` to break another type-dependency and directly use the `XfaLayerBuilder` during XFA-printing.
Finally, note that this patch *slightly* reduces the size of the default viewer (e.g. in the `MOZCENTRAL` build) by not having to bundle code which is completely unused.
2021-12-15 23:17:08 +01:00
Jonas Jenwald
e0dba504d2 Fix broken/missing JSDocs and typedefs, to allow updating TypeScript to the latest version (issue 14342)
This patch circumvents the issues seen when trying to update TypeScript to version `4.5`, by "simply" fixing the broken/missing JSDocs and `typedef`s such that `gulp typestest` now passes.
As always, given that I don't really know anything about TypeScript, I cannot tell if this is a "correct" and/or proper way of doing things; we'll need TypeScript users to help out with testing!

*Please note:* I'm sorry about the size of this patch, but given how intertwined all of this unfortunately is it just didn't seem easy to split this into smaller parts.
However, one good thing about this TypeScript update is that it helped uncover a number of pre-existing bugs in our JSDocs comments.
2021-12-15 23:14:25 +01:00
Tim van der Meij
d3e1d7090a
Merge pull request #14370 from Snuffleupagus/getPageDict-sync-Pages
Slightly reduce asynchronicity in the `Catalog.getPageDict` method (PR 14338 follow-up)
2021-12-15 19:40:39 +01:00
Tim van der Meij
274989ab56
Merge pull request #14372 from Snuffleupagus/BaseViewer-Lang
Move the /Lang handling into the `BaseViewer` (PR 14114 follow-up)
2021-12-15 19:37:50 +01:00
Tim van der Meij
21aea0b1a2
Merge pull request #14380 from Snuffleupagus/event-utils
Move the `EventBus`, and related functionality, into its own file
2021-12-15 19:34:43 +01:00
Jonas Jenwald
0a19ef6864 Move the EventBus, and related functionality, into its own file
The size of the `web/ui_utils.js` file has increased over time, as more code has been added to (or moved into) that file. To reduce its size slightly, this patch moves the event-related functionality into a separate file.
2021-12-15 17:18:57 +01:00
Jonas Jenwald
760f765e56 Move the /Lang handling into the BaseViewer (PR 14114 follow-up)
In PR 14114 this was only added to the default viewer, which means that in the viewer components the user would need to *manually* implement /Lang handling. This was (obviously) a bad choice, since the viewer components already support e.g. structTrees by default; sorry about overlooking this!

To avoid having to make *two* `getMetadata` API-calls[1] very early during initialization, in the default viewer, the API will now cache its result. This will also come in handy elsewhere in the default viewer, e.g. by reducing parsing when opening the "document properties" dialog.

---
[1] This not only includes a round-trip to the worker-thread, but also having to re-parse the /Metadata-entry when it exists.
2021-12-14 13:19:05 +01:00
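The caching idea from the commit above can be sketched by storing the promise itself, so that repeated callers share one request. `DocumentProxySketch` and `fetchMetadata` are hypothetical stand-ins; the real API class is not shown in this log.

```javascript
// Hypothetical stand-in for an API object; `fetchMetadata` simulates the
// round-trip to the worker-thread.
class DocumentProxySketch {
  #metadataPromise = null;

  constructor(fetchMetadata) {
    this.fetchMetadata = fetchMetadata;
  }

  getMetadata() {
    // Cache the promise, so that later callers reuse the first request.
    if (!this.#metadataPromise) {
      this.#metadataPromise = this.fetchMetadata();
    }
    return this.#metadataPromise;
  }
}

let calls = 0;
const doc = new DocumentProxySketch(() => {
  calls++;
  return Promise.resolve({ info: { Title: "Example" } });
});
const first = doc.getMetadata();
const second = doc.getMetadata();
```

Here both calls return the same promise and the simulated worker round-trip happens only once.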
Jonas Jenwald
a425c9cfa5
Merge pull request #14368 from timvandermeij/puppeteer
Consistently use string arguments for page.waitForFunction calls and upgrade to Puppeteer 13.0.0
2021-12-14 10:36:06 +01:00
Jonas Jenwald
fa51fd9428 Slightly reduce asynchronicity in the Catalog.getPageDict method (PR 14338 follow-up)
After the changes in PR 14338, specifically in the `XRef.parse`-method, the /Pages-entry will now always have been fetched/validated when the `Catalog`-instance is created.
Hence we can directly access the /Pages-entry in `Catalog.getPageDict` and thus avoid *one* asynchronous data-lookup per page in the document. (In practice this is unlikely to show up in e.g. benchmarks, but it really cannot hurt.)

Finally, make sure that the `getPageDict`/`getAllPageDicts`-methods track the /Pages-tree reference correctly to prevent circular references in corrupt documents.
2021-12-13 21:18:06 +01:00
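Tracking visited references while walking a /Pages-tree, as mentioned in the commit above, can be sketched like this. The `nodes` map and the node shapes are a hypothetical, simplified model and not the actual `Catalog` data structures.

```javascript
// Hypothetical, simplified page tree: `nodes` maps a reference string to
// either a page ({ type: "Page" }) or a /Pages node ({ type: "Pages", kids }).
function collectPages(nodes, rootRef) {
  const visited = new Set();
  const pages = [];
  const queue = [rootRef];

  while (queue.length > 0) {
    const ref = queue.shift();
    // Seeing the same reference twice is treated as a sign of corruption.
    if (visited.has(ref)) {
      throw new Error(`Pages tree contains circular reference: ${ref}`);
    }
    visited.add(ref);

    const node = nodes.get(ref);
    if (!node) {
      throw new Error(`Missing node: ${ref}`);
    }
    if (node.type === "Pages") {
      queue.unshift(...node.kids); // Depth-first, in document order.
    } else {
      pages.push(ref);
    }
  }
  return pages;
}
```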
Tim van der Meij
da2b3dd3be
Upgrade to Puppeteer 13.0.0
2021-12-12 19:52:11 +01:00
Tim van der Meij
1bc6b846b6
Consistently use string arguments for page.waitForFunction calls
We use string arguments in all other places, so these two places are a
bit inconsistent in that sense. Moreover, it's just one argument now,
which makes it a bit easier to read and see what it does because we
don't have to pass the always-empty options argument anymore. Finally,
doing it like this ensures it works in all Puppeteer versions given
https://github.com/puppeteer/puppeteer/issues/7836.
2021-12-12 19:45:34 +01:00
Tim van der Meij
e638a84afe
Merge pull request #14367 from timvandermeij/integration-tests
Disable failing print actions integration test in Firefox
2021-12-12 16:20:34 +01:00
Tim van der Meij
2643e6a823
Disable failing print actions integration test in Firefox
Once the upstream bug is fixed it can be enabled again because it's
causing way too much noise now. This is tracked in issue #14293. Note
that I deliberately added a new block so we can easily remove it later
on and because the other block is about another bug.
2021-12-12 16:10:50 +01:00
Tim van der Meij
d47b6735b4
Merge pull request #14364 from Snuffleupagus/BaseViewer-conditional-getPermissions
Only call `PDFDocumentProxy.getPermissions`, in the viewer, when `pdfjs.enablePermissions` is set (PR 14362 follow-up)
2021-12-12 14:00:04 +01:00
Jonas Jenwald
63af15eb8f Only call PDFDocumentProxy.getPermissions, in the viewer, when pdfjs.enablePermissions is set (PR 14362 follow-up)
By making this API-call *unconditionally*, we introduce a (slight) delay in the initialization of *all* documents.
That seems quite unfortunate, since `pdfjs.enablePermissions` is off by default, and it thus seems better to only do the API-call when actually needed; sorry about this!
2021-12-11 20:46:19 +01:00
Tim van der Meij
6d8d37e93d
Merge pull request #14362 from Snuffleupagus/issue-14356
Support disabling of form editing when `pdfjs.enablePermissions` is set (issue 14356)
2021-12-11 20:02:23 +01:00
Tim van der Meij
fefb9ed5b4
Merge pull request #14360 from timvandermeij/updates
Update packages and translations
2021-12-11 19:51:38 +01:00
Tim van der Meij
c5847141b4
Update translations to the most recent versions
2021-12-11 19:44:52 +01:00
Tim van der Meij
2757000bb2
Fix some dependency vulnerabilities reported by npm audit
This is done automatically using the `npm audit fix` command.
2021-12-11 19:44:52 +01:00
Tim van der Meij
d3d8141372
Update packages to the most recent versions
2021-12-11 19:44:48 +01:00
Jonas Jenwald
b1d3e7f121 Support disabling of form editing when pdfjs.enablePermissions is set (issue 14356)
For encrypted PDF documents without the required permissions set, this patch adds support for disabling of form editing. However, please note that it also requires that the `pdfjs.enablePermissions` preference is set to `true`[1] (since PDF document permissions could be seen as user hostile).

Based on https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G6.1942134, this condition hopefully makes sense.

---
[1] Either manually with `about:config`, or using e.g. a [Group Policy](https://github.com/mozilla/policy-templates).
2021-12-11 18:26:13 +01:00
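One possible reading of the condition described in the commit above, based on the user access permission bits in the PDF specification (bit 6 allows filling in form fields, bit 9 allows it even when bit 6 is clear). The `isFormEditingAllowed` helper is hypothetical, and the assumption that permissions arrive as an array of granted flags (or `null`) is made for this sketch only.

```javascript
// Permission bits from the PDF specification, written as bit masks.
const PermissionFlag = {
  MODIFY_ANNOTATIONS: 0x20, // bit 6
  FILL_INTERACTIVE_FORMS: 0x100, // bit 9
};

// Hypothetical helper: `permissions` is either `null` (nothing restricted)
// or an array of the flags granted by the document.
function isFormEditingAllowed(permissions, enablePermissions) {
  // Permissions are only honoured when the preference is enabled.
  if (!enablePermissions || !permissions) {
    return true;
  }
  return (
    permissions.includes(PermissionFlag.MODIFY_ANNOTATIONS) ||
    permissions.includes(PermissionFlag.FILL_INTERACTIVE_FORMS)
  );
}
```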
Jonas Jenwald
b03281de18 Move the permissions handling into the BaseViewer (PR 11789 follow-up)
Besides making the permissions-functionality directly available in the viewer-components, these changes are also necessary for the next patch.
2021-12-11 17:13:41 +01:00
Jonas Jenwald
d856ed9395
Merge pull request #14361 from timvandermeij/nodejs
Upgrade Node.js to version 16 in the CI workflow
2021-12-11 15:58:00 +01:00
Tim van der Meij
4269148d3d
Upgrade Node.js to version 16 in the CI workflow
Version 14 that we used before is now in maintenance mode, so we should
upgrade to the most recent LTS version.

Moreover, use the most recent `setup-node` workflow version and syntax;
see https://github.com/actions/setup-node#usage.
2021-12-11 15:50:23 +01:00
Tim van der Meij
3a8318aa1c
Merge pull request #14359 from Snuffleupagus/PAUSE_EAGER_PAGE_INIT
Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up)
2021-12-11 13:28:35 +01:00
Tim van der Meij
a6dd39b645
Merge pull request #14358 from Snuffleupagus/checkLastPage-improvements
Improve `PDFDocument.checkLastPage`/`Catalog.getAllPageDicts` for documents with corrupt XRef tables (PR 14311, 14335 follow-up)
2021-12-11 13:07:54 +01:00
Tim van der Meij
70809a80ce
Merge pull request #14355 from Snuffleupagus/api-page-caches-Map
Change `WorkerTransport.{pageCache, pagePromises}` from an Array to a Map
2021-12-11 13:00:11 +01:00
Tim van der Meij
2b8a5dce70
Merge pull request #14354 from Snuffleupagus/improve-pageKidsCountCache-further
Further improve caching in `Catalog.getPageDict`, for `disableAutoFetch` mode (PR 8207 follow-up)
2021-12-11 12:54:39 +01:00
Jonas Jenwald
90472e5130 Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up)
This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.

For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents, having all pages loaded/initialized early provides overall better performance/UX in the viewer; however, there are cases where it can instead *hurt* performance.
For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands of `getPage`-calls and leaving very little time for other parsing (such as e.g. of operatorLists).

To address this situation, this patch thus proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a chance to handle other requests.[2]

Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3]

---
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep.

[2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that would have slowed things down for all documents, which didn't seem appropriate.

[3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls.
Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having `Catalog.getPageDict` be `async` would simplify things a great deal.
2021-12-10 20:44:06 +01:00
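The "pausing" idea from the commit above, in simplified form: issue the eager `getPage`-calls in batches and yield to the event loop between batches. The function names and the default batch size are assumptions; the thresholds used by the actual patch are not stated in the commit message, and this sketch pauses after every batch rather than only past an initial threshold.

```javascript
// Split page numbers 2..pagesCount into batches (page 1 is assumed to be
// requested separately, before eager initialization starts).
function* eagerPageBatches(pagesCount, batchSize) {
  for (let start = 2; start <= pagesCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pagesCount);
    const batch = [];
    for (let pageNum = start; pageNum <= end; pageNum++) {
      batch.push(pageNum);
    }
    yield batch;
  }
}

// Hypothetical driver: `getPage` stands in for an eager page request; a real
// caller would also handle the value it returns.
async function initPagesEagerly(getPage, pagesCount, batchSize = 100) {
  for (const batch of eagerPageBatches(pagesCount, batchSize)) {
    for (const pageNum of batch) {
      getPage(pageNum);
    }
    // Pause, so that other pending requests get a chance to be handled.
    await new Promise(resolve => setTimeout(resolve, 0));
  }
}
```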
Jonas Jenwald
70ac6b1694 Update Catalog.getAllPageDicts to always propagate the actual Errors (PR 14335 follow-up)
Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
Note that we purposely handle `XRefEntryException` specially, to make it possible to fallback to indexing all XRef objects.
2021-12-10 15:22:36 +01:00
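The error handling described in the commit above, reduced to a synchronous sketch (the real methods are asynchronous). `XRefEntryException` is re-declared here as a stand-in class, and `loadPageDicts`, `fetchPageDicts` and `indexAllObjects` are hypothetical names.

```javascript
// Stand-in for the exception type named in the commit message.
class XRefEntryException extends Error {}

// Hypothetical helper: fall back to `indexAllObjects` only for XRef errors,
// and re-throw every other Error unchanged.
function loadPageDicts(fetchPageDicts, indexAllObjects) {
  try {
    return fetchPageDicts();
  } catch (ex) {
    if (ex instanceof XRefEntryException) {
      return indexAllObjects();
    }
    throw ex; // Propagate the actual Error to the call-site.
  }
}
```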
Jonas Jenwald
47f9eef584 Improve PDFDocument.checkLastPage for documents with corrupt XRef tables (PR 14311, 14335 follow-up)
Rather than trying, and failing, to fetch the entire /Pages-tree for documents with corrupt XRef tables, let's fallback to indexing all objects *before* trying to invoke the `Catalog.getAllPageDicts` method.
2021-12-10 11:45:09 +01:00
Jonas Jenwald
f39536a30b Change WorkerTransport.pagePromises from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.

Finally, this also changes `pagePromises` to a *private* property, since it's not supposed to be accessed from the "outside".
2021-12-09 15:30:10 +01:00
Jonas Jenwald
c5525dcb69 Change WorkerTransport.pageCache from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
For one thing, this simplifies iteration since we no longer have to worry about/check if `pageCache`-entries are undefined (which will happen for *sparse* `Array`s).

Of particular note is that we're no longer attempting to "null" the `pageCache`-entry from within the `PDFPageProxy._destroy`-method. Given that *synchronous* JavaScript will always run to completion[1] and that we're looping through all pages in `WorkerTransport.destroy` and immediately clear the cache afterwards, that code did/does not really make a lot of sense (as far as I can tell).

Finally, this also changes `pageCache` to a *private* property, since it's not supposed to be accessed from the "outside".

---
[1] Unless there are errors, of course.
2021-12-09 15:29:47 +01:00
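A small illustration of the sparse-`Array` problem mentioned in the commit above. The cached values are plain strings here for brevity; in the real code they would be page objects.

```javascript
// Pages 5 and 1 were accessed, out of order; the rest were never requested.
const arrayCache = [];
arrayCache[4] = "page 5";
arrayCache[0] = "page 1";

const mapCache = new Map();
mapCache.set(4, "page 5");
mapCache.set(0, "page 1");

// Iterating the Array also visits the holes, which must be skipped.
const fromArray = [];
for (const page of arrayCache) {
  if (!page) {
    continue; // `undefined` for the indices 1 to 3.
  }
  fromArray.push(page);
}

// The Map only contains what was actually stored, in insertion order.
const fromMap = [...mapCache.values()];
```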
Jonas Jenwald
8a05db230e Further improve caching in Catalog.getPageDict, for disableAutoFetch mode (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of *another* oversight on my part, we're only caching /Page references once we've found the correct page. As long as all pages are loaded *in order* this doesn't really matter (happens by default in the viewer), but when `disableAutoFetch` is used the pages may be fetched in a more random order (this patch reduces the asynchronicity of `Catalog.getPageDict` slightly in that case).
2021-12-09 12:54:49 +01:00
Tim van der Meij
97dc048e56
Merge pull request #14350 from Snuffleupagus/ccitt-infinite-loop
Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)
2021-12-08 20:01:21 +01:00
Tim van der Meij
b178985615
Merge pull request #14347 from Snuffleupagus/improve-pageKidsCountCache
Improve caching in `Catalog.getPageDict` (PR 8207 follow-up)
2021-12-08 19:58:46 +01:00
Jonas Jenwald
e8562173b8 Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)
Fixes one of the documents in issue 14305.
2021-12-07 13:57:25 +01:00
Jonas Jenwald
c42b19f26a
Merge pull request #14348 from Snuffleupagus/issue-8022-reftest
Add a (linked) test-case for issue 8022
2021-12-06 16:17:28 +01:00
Jonas Jenwald
909f012fb8 Add a (linked) test-case for issue 8022
Given that [bug 1336591](https://bugzilla.mozilla.org/show_bug.cgi?id=1336591) was just closed as fixed, thus fixing issue 8022 in Firefox, let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.
2021-12-06 15:27:40 +01:00
Jonas Jenwald
5f295ba280 Improve caching in Catalog.getPageDict (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of annoying off-by-one errors[1] the caching became less efficient than it could/should be.[2] Note here that the /Pages-tree is zero-indexed, and that e.g. `pageIndex = 5` thus corresponds to the *sixth* page of the document.

---
[1] In particular the `currentPageIndex + count < pageIndex` part.

[2] For example, even when loading a relatively small/simple document such as `tracemonkey.pdf` in the viewer, the number of `xref.fetchAsync(currentNode)` calls is reduced from `56` to `44` with this patch.
2021-12-06 11:49:31 +01:00
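To see why the strict comparison in footnote [1] is off by one: with zero-based indices, a subtree that starts at `currentPageIndex` and holds `count` pages covers the indices `currentPageIndex` up to and including `currentPageIndex + count - 1`. The `canSkipSubtree` helper below is hypothetical, and using `<=` is an assumption about what the corrected check looks like.

```javascript
// Hypothetical helper: can the whole subtree be skipped when looking for
// the page at `pageIndex`?
function canSkipSubtree(currentPageIndex, count, pageIndex) {
  // The subtree ends at index `currentPageIndex + count - 1`, so it can be
  // skipped as soon as `pageIndex` is at or beyond `currentPageIndex + count`.
  return currentPageIndex + count <= pageIndex;
}

// A subtree with five pages starting at index 0 covers the indices 0 to 4.
// For `pageIndex = 5` (the sixth page) it can be skipped, whereas the strict
// `0 + 5 < 5` comparison would evaluate to false and force a traversal.
const skipForSixthPage = canSkipSubtree(0, 5, 5);
```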
Jonas Jenwald
034b870c4a
Merge pull request #14344 from timvandermeij/test-driver
Modernize the test driver
2021-12-05 23:52:46 +01:00
Tim van der Meij
911a9d34b1
Fix code duplication in the rasterization logic in test/driver.js
Now that the rasterization logic is encapsulated in a class, we can
easily move the container creation into a separate static method.
2021-12-05 19:29:39 +01:00
Tim van der Meij
03506f25c0
Move the rasterization logic into one single class
This refactoring ensures that we can get rid of the closures and
encapsulate the logic in a nicer way with e.g., getters for the style
promises.
2021-12-05 19:28:51 +01:00