Commit Graph

15120 Commits

Author SHA1 Message Date
Jonas Jenwald
e8562173b8 Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)
Fixes one of the documents in issue 14305.
2021-12-07 13:57:25 +01:00
Jonas Jenwald
c42b19f26a
Merge pull request #14348 from Snuffleupagus/issue-8022-reftest
Add a (linked) test-case for issue 8022
2021-12-06 16:17:28 +01:00
Jonas Jenwald
909f012fb8 Add a (linked) test-case for issue 8022
Given that [bug 1336591](https://bugzilla.mozilla.org/show_bug.cgi?id=1336591) was just closed as fixed, thus fixing issue 8022 in Firefox, let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.
2021-12-06 15:27:40 +01:00
Jonas Jenwald
034b870c4a
Merge pull request #14344 from timvandermeij/test-driver
Modernize the test driver
2021-12-05 23:52:46 +01:00
Tim van der Meij
911a9d34b1
Fix code duplication in the rasterization logic in test/driver.js
Now that the rasterization logic is encapsulated in a class, we can
easily move the container creation into a separate static method.
2021-12-05 19:29:39 +01:00
Tim van der Meij
03506f25c0
Move the rasterization logic into one single class
This refactoring ensures that we can get rid of the closures and
encapsulate the logic in a nicer way with e.g., getters for the style
promises.
2021-12-05 19:28:51 +01:00
Tim van der Meij
33dc0628a0
Enable the no-var linting rule in test/driver.js
This is done automatically with the `gulp lint --fix` command with the
only exception of the `annotationLayerContext` variable.
2021-12-05 15:41:36 +01:00
Tim van der Meij
5fd4276dcf
Use async/await in the rasterization classes in test/driver.js
This is achieved by letting the `writeSVG` function return a promise so
we don't need callback passing anymore.
2021-12-05 14:11:09 +01:00
Tim van der Meij
13786ef806
Use arrow functions instead of self variables in test/driver.js 2021-12-05 14:11:08 +01:00
Tim van der Meij
1d1f713bfc
Inline loadStyles calls in the rasterization classes in test/driver.js
The wrapper functions in this case only really added indirection, so
this commit simplifies the code a bit.
2021-12-05 13:49:04 +01:00
Tim van der Meij
a58700b0dc
Convert the Driver class to ES6 syntax in test/driver.js 2021-12-05 13:43:02 +01:00
Tim van der Meij
3264d72e60
Merge pull request #14345 from Snuffleupagus/viewer-pagesPromise-reject
Ensure that the viewer handles `BaseViewer` initialization failures
2021-12-05 13:40:24 +01:00
Jonas Jenwald
e027178356 Tweak the "pagesloaded" event handler in PDFOutlineViewer
These changes improves the consistency ever so slightly in the `PDFOutlineViewer._dispatchEvent` method, by making sure that we can tell the following two cases apart:
 - The "pagesloaded" event has *not yet* been fired.
 - The "pagesloaded" event has been fired, but no pages were available.
2021-12-05 11:04:17 +01:00
Jonas Jenwald
9de30c4ff0 Ensure that the viewer handles BaseViewer initialization failures
*This patch can be tested e.g. with the `poppler-85140-0.pdf` document from the test-suite.*

For some sufficiently corrupt documents the `getDocument` call will succeed, but fetching even the very first page fails. Currently we only print error messages (in the console) from the `{BaseViewer, PDFThumbnailViewer}.setDocument` methods, but don't actually provide these errors to allow the viewer to handle them properly.
In practice this means that the GENERIC viewer won't display the `errorWrapper`, and in the MOZCENTRAL viewer the *browser* loading indicator is never hidden (since we never unblock the "load" event).
2021-12-05 10:55:47 +01:00
Tim van der Meij
dc455c836e
Merge pull request #14339 from Snuffleupagus/issue-8019-reftest
Add a (linked) test-case for issue 8019
2021-12-04 13:26:47 +01:00
Tim van der Meij
335c4c8a43
Merge pull request #14338 from Snuffleupagus/XRef-more-Pages-validation
[api-minor] Clear all caches in `XRef.indexObjects`, and improve /Root dictionary validation in `XRef.parse` (issue 14303)
2021-12-04 13:23:40 +01:00
Tim van der Meij
3117985c55
Merge pull request #14340 from Snuffleupagus/Metadata-fetch-error
Handle errors when fetching the raw /Metadata (issue 14305)
2021-12-04 13:19:37 +01:00
Tim van der Meij
bceed26e67
Merge pull request #14341 from Snuffleupagus/shadow-prop-assert
Ensure that the `shadow` helper function is passed a valid property (PR 14152 follow-up)
2021-12-04 13:17:14 +01:00
Jonas Jenwald
d9fac34596 Ensure that the shadow helper function is passed a valid property (PR 14152 follow-up)
Trying to shadow a non-existent property is always an implementation mistake, since it leads to the `shadow`-call not having any effect.

In PR 14152 I overlooked the fact that it's fairly easy to enforce this during development/testing, since that can help catch e.g. simple spelling bugs.
2021-12-04 10:07:21 +01:00
Jonas Jenwald
40291d1943 Handle errors when fetching the raw /Metadata (issue 14305)
Currently the `Catalog.metadata` getter only handles errors during parsing, however in a *corrupt* PDF document fetching of the raw /Metadata can obviously fail as well.
Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail which it *never* should and this will cause the viewer to not initialize all state as expected.

Fixes one of the documents in issue 14305.
2021-12-04 09:41:42 +01:00
Jonas Jenwald
ca82e1832f Add a (linked) test-case for issue 8019
Given that [bug 1336572](https://bugzilla.mozilla.org/show_bug.cgi?id=1336572) was just closed as fixed, thus fixing issue 8019 in Firefox[1], let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.

---
[1] It also seems to be working in Google Chrome, although I'm having a slightly difficult time deciphering *exactly* what configurations were affected when looking through issue 8019.
2021-12-04 08:56:04 +01:00
Jonas Jenwald
ad3a271fc4 [api-minor] Clear all caches in XRef.indexObjects, and improve /Root dictionary validation in XRef.parse (issue 14303)
*This patch improves handling of a couple of PDF documents from issue 14303.*

 - Update `XRef.indexObjects` to actually clear *all* XRef-caches. Invalid XRef tables *usually* cause issues early enough during parsing that we've not populated the XRef-cache, however to prevent any issues we obviously need to clear that one as well.

 - Improve the /Root dictionary validation in `XRef.parse` (PR 9827 follow-up). In addition to checking that a /Pages entry exists, we'll now also check that it can be successfully fetched *and* that it's of the correct type. There's really no point trying to use a /Root dictionary that e.g. `Catalog.toplevelPagesDict` will reject, and this way we'll be able to fallback to indexing the objects in corrupt documents.

 - Throw an `InvalidPDFException`, rather than a general `FormatError`, in `XRef.parse` when no usable /Root dictionary could be found. That really seems more appropriate overall, since all attempts at parsing/recovery have failed. (This part of the patch is API-observable, hence the tag.)

With these changes, two existing test-cases are improved and the unit-tests are updated/re-factored to highlight that. In particular `GHOSTSCRIPT-698804-1-fuzzed.pdf` will now both load and "render" correctly, whereas `poppler-395-0-fuzzed.pdf` will now fail immediately upon loading (rather than *appearing* to work).
2021-12-03 11:57:38 +01:00
Tim van der Meij
e9e4b913c0
Merge pull request #14324 from Snuffleupagus/force-PAGE-scrolling
Enforce PAGE-scrolling for *very* large/long documents (bug 1588435, PR 11263 follow-up)
2021-12-02 20:01:13 +01:00
Tim van der Meij
4c145fc9c4
Merge pull request #14335 from Snuffleupagus/Catalog-getAllPageDicts
[Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up)
2021-12-02 19:54:30 +01:00
Tim van der Meij
aee4b7c73f
Merge pull request #14328 from Snuffleupagus/node-examples-page-cleanup
Update (primarily) the Node.js examples to release page resources
2021-12-02 19:44:17 +01:00
Jonas Jenwald
1fac6371d3 [Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up)
*Please note:* This is similar to the method that existed prior to PR 3848, but the new method will *only* be used as a fallback when parsing of corrupt PDF documents.

The implementation in PR 14311 unfortunately turned out to be *way* too simplistic, as evident by the recently added test-files in issue 14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fallback to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.

Fixes *at least* two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.

---
[1] The whole point of PR 14311 was obviously to *get rid of* infinte loops during document initialization, not to introduce any more of those.
2021-12-02 14:31:04 +01:00
Jonas Jenwald
f61b74e38e
Merge pull request #14325 from Snuffleupagus/getPageDict-rm-skipCount
Remove the unused `skipCount` parameter from `Catalog.getPageDict` (PR 14311 follow-up)
2021-12-02 13:16:26 +01:00
Jonas Jenwald
8ea740c800 Slightly extend the "creates pdf doc from PDF file with bad XRef table" unit-test (PR 14304 follow-up)
Given that we're able to "render" this document, let's extend the unit-test to actually check that we're able to obtain the operatorList; although given the overall issues in the document it'll be empty.
2021-12-02 11:51:40 +01:00
Jonas Jenwald
e045cd4520 Remove the unused skipCount parameter from Catalog.getPageDict (PR 14311 follow-up)
This was added in PR 14311, but given that I completely missed to update the `PDFDocument.getPage` signature accordingly it's completely unused.
Given that things work just as fine as-is, let's simply remove that optional parameter for now; sorry about the churn here!
2021-12-02 11:51:38 +01:00
Jonas Jenwald
d9e0de8515
Merge pull request #14333 from Snuffleupagus/Pages-tree-corruption
Handle errors correctly when data lookup fails during /Pages-tree parsing (issue 14303)
2021-12-02 11:49:11 +01:00
Jonas Jenwald
63be23f05b Handle errors correctly when data lookup fails during /Pages-tree parsing (issue 14303)
This only applies to severely corrupt documents, where it's possible that the `Parser` throws when we try to access e.g. a /Kids-entry in the /Pages-tree.

Fixes two of the issues listed in issue 14303, namely the `poppler-742-0.pdf...` and `poppler-937-0.pdf...` documents.
2021-12-02 10:54:40 +01:00
Jonas Jenwald
487a7ddc7d Update (primarily) the Node.js examples to release page resources
Given that Node.js doesn't support Workers, general PDF.js performance will be worse when compared to browsers. In an attempt to improve at least memory usage a little bit, update the Node.js examples to release page resources once parsing is done for that page.
2021-11-30 13:11:50 +01:00
Jonas Jenwald
6dfe4a9140 Enforce PAGE-scrolling for *very* large/long documents (bug 1588435, PR 11263 follow-up)
This patch is essentially a continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.

Note that browsers, in general, don't handle a huge amount of DOM-elements very well, with really poor (e.g. sluggish scrolling) performance once the number gets "large". Furthermore, at least in Firefox, it seems that DOM-elements towards the bottom of a HTML-page can effectively be ignored; for the PDF.js viewer that means that pages at the end of the document can become impossible to access.

Hence, in order to improve things for these *very* large/long documents, this patch will now enforce usage of the (recently added) PAGE-scrolling mode for these documents. As implemented, this will only happen once the number of pages *exceed* 15000 (which is hopefully rare in practice).
While this might feel a bit jarring to users being *forced* to use PAGE-scrolling, it seems all things considered like a better idea to ensure that the entire document actually remains accessible and with (hopefully) more acceptable performance.

Fixes [bug 1588435](https://bugzilla.mozilla.org/show_bug.cgi?id=1588435), to the extent that doing so is possible since the document contains 25560 pages (and is 197 MB large).
2021-11-29 13:54:24 +01:00
Jonas Jenwald
f15eb63ed5 Remove the PDFSinglePageViewer-specific code from web/secondary_toolbar.js (PR 9877 follow-up)
This was added on the assumption that the viewer would (eventually) start using the `PDFSinglePageViewer` for e.g. PAGE-scrolling mode and PresentationMode. However, having both a `PDFViewer` and a `PDFSinglePageViewer` side-by-side in the viewer would've been tricky to implement well, which is why PR 14112 implemented PAGE-scrolling for the general `BaseViewer` instead.

Given that the default viewer is no longer (potentially) going to use `PDFSinglePageViewer`, there's code in the `SecondaryToolbar` (and related CSS rules) which is now unnecessary.
2021-11-29 13:13:17 +01:00
Jonas Jenwald
700eaecddd
Merge pull request #14321 from timvandermeij/puppeteer
Upgrade to Puppeteer 12
2021-11-29 11:38:24 +01:00
Tim van der Meij
d5b5b665e4
Upgrade to Puppeteer 12 2021-11-28 19:24:11 +01:00
Tim van der Meij
96bb3c6217
Convert package-lock.json to lock file version 2
Since NPM 7, which is over a year old now since it released in October
2020, NPM automatically transforms lock files from version 1 to version
2. In the NPM 7 release notes they reported:

"One change to take note of is the new lockfile format, which is
backwards compatible with npm 6 users. The lockfile v2 unlocks the
ability to do deterministic and reproducible builds to produce a
package tree."

Not only is this change backwards compatible (so older versions of NPM
will still be able to install everything as expected), reproducability
is also a nice property to have and modern NPM versions will otherwise
constantly do the conversion anyway, causing contributors to explicitly
have to revert the change. Therefore, I believe we should do this now
since it doesn't break backwards compatibility for consumers of this
file. It only means that producers of this file (i.e., us contributors)
need to use at least NPM 7 or higher (as of writing NPM 8 is even
available). According to https://nodejs.org/en/download/releases/ this
means contributors should at least run Node.js 15.0.0, while 17.1.0 is
the most recent as of writing, so to me that sounds reasonable to ask.
2021-11-28 19:23:25 +01:00
Tim van der Meij
0d2cdff6c5
Fix browser page navigation for Puppeteer 11+ in test/test.js
In Puppeteer 11 we noticed that Firefox doesn't shut down once the tests
are done anymore. I tracked this down to the `page.goto` call, in
`startBrowser`, never resolving anymore. I can only assume that
something changed in Puppeteer, possibly in combination with recent
Firefox Nightly versions, that caused this, but haven't been able to
fully track it down.

However, I did find that the problem is that the `load` event no longer
triggers, so fortunately we can fix the problem by explicitly waiting
for the `domcontentloaded` event instead. In general this change might
even be better since we now wait until the test framework is fully
loaded before we continue. Note that this also still works for the
current Puppeteer version.

I did find two upstream references that appear to track this issue, both
on the Puppeteer side and on the Firefox side, making me further suspect
that the issue is partly on both sides:

- https://github.com/puppeteer/puppeteer/issues/5806
- https://bugzilla.mozilla.org/show_bug.cgi?id=1706353
2021-11-28 18:58:22 +01:00
Tim van der Meij
60ed3cd297
Fix compatibility with Node.js 17 in test/test.js
Node.js 17, which as of writing is the most recent version, contains a
breaking change in its DNS resolver, causing Firefox not to start
anymore in our test framework. The inline comment together with the
following resources provide more background:

- https://github.com/nodejs/node/issues/40702
- https://github.com/nodejs/node/pull/39987
- https://github.com/cyrus-and/chrome-remote-interface/issues/467
- https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V17.md#other-notable-changes
- https://github.com/DeviceFarmer/adbkit/issues/209
- https://nodejs.org/api/dns.html#dnssetdefaultresultorderorder

This commit ensures that versions both older and newer than Node.js 17
work as expected. This is mainly necessary since the bots as of writing
run Node.js 14.17.0 which is from before this API got introduced and for
example Node.js 12 LTS is only end-of-life in April 2022, so we have to
keep support for those older versions unfortunately.
2021-11-28 18:52:51 +01:00
Tim van der Meij
5309133a9d
Fix browser error logging in test/test.js
If a browser cannot be started, we currently get the following log:
`Error while starting firefox: [object Object]`. This is simply an
oversight from the initial Puppeteer integration work since we never got
into this code path before. With this fix the error log becomes more
useful: `Error while starting firefox: connect ECONNREFUSED ::1:45387`
2021-11-28 18:08:08 +01:00
Tim van der Meij
c14552874b
Merge pull request #14312 from Snuffleupagus/XRef-circular-reference
Prevent circular references in XRef tables from hanging the worker-thread (issue 14303)
2021-11-28 14:07:02 +01:00
Tim van der Meij
7613bb5522
Merge pull request #14320 from Snuffleupagus/update-packages
Update packages and translations
2021-11-28 13:58:53 +01:00
Jonas Jenwald
d62f847db2 Update Stylelint to version 14 (along with related packages)
Based on https://github.com/stylelint/stylelint/blob/14.0.0/docs/migration-guide/to-14.md none of the changes look directly relevant for us.
2021-11-28 12:36:16 +01:00
Jonas Jenwald
37bed4ea45 Update l10n files 2021-11-28 11:14:03 +01:00
Jonas Jenwald
fdf3f03985 Update the eslint-plugin-unicorn package to the latest version
Please see https://github.com/sindresorhus/eslint-plugin-unicorn/releases/tag/v39.0.0
2021-11-28 11:14:02 +01:00
Jonas Jenwald
0c301dfa8b Update the dommatrix package to the latest version 2021-11-28 11:14:02 +01:00
Jonas Jenwald
cfc55b044e Update npm packages 2021-11-28 11:13:58 +01:00
Jonas Jenwald
a807ffe907 Prevent circular references in XRef tables from hanging the worker-thread (issue 14303)
*Please note:* While this patch on its own is sufficient to prevent the worker-thread from hanging, however in combination with PR 14311 these PDF documents will both load *and* render correctly.

Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table.
To avoid a solution that required tracking the references manually everywhere, the implementation settled on here instead handles that internally in the `XRef.fetch`-method. This should work, since that method *and* the `Parser`/`Lexer`-implementations are completely synchronous.

Note also that the existing `XRef`-caching, used for all data-types *except* Streams, should hopefully help to lessen the performance impact of these changes.
One *potential* problem with these changes could be certain *browser* exceptions, since those are generally not catchable in JavaScript code, however those would most likely "stop" worker-thread parsing anyway (at least I hope so).

Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing, for the rest of the document, to continue such that *one* bad reference doesn't prevent an entire document from loading.

Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.
2021-11-27 23:50:26 +01:00
Jonas Jenwald
a669fce762 Inline the isDict, isRef, and isStream checks in the src/core/xref.js file 2021-11-27 23:49:17 +01:00
Jonas Jenwald
680e0efb9d Use Array-destructuring in the XRef.readXRefStream-method 2021-11-27 23:49:17 +01:00