These changes improve the consistency ever so slightly in the `PDFOutlineViewer._dispatchEvent` method, by making sure that we can tell the following two cases apart, as sketched below:
- The "pagesloaded" event has *not yet* been fired.
- The "pagesloaded" event has been fired, but no pages were available.
*This patch can be tested e.g. with the `poppler-85140-0.pdf` document from the test-suite.*
For some sufficiently corrupt documents the `getDocument` call will succeed, but fetching even the very first page fails. Currently we only print error messages (in the console) from the `{BaseViewer, PDFThumbnailViewer}.setDocument` methods, but we don't actually surface these errors such that the viewer can handle them properly.
In practice this means that the GENERIC viewer won't display the `errorWrapper`, and in the MOZCENTRAL viewer the *browser* loading indicator is never hidden (since we never unblock the "load" event).
Trying to shadow a non-existent property is always an implementation mistake, since it leads to the `shadow`-call not having any effect.
In PR 14152 I overlooked the fact that it's fairly easy to enforce this during development/testing, since that can help catch e.g. simple spelling bugs.
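A sketch of what such a development/testing-only check could look like in the `shadow` helper; the `PDFJSDev`/`assert` usage mirrors other checks in `src/shared/util.js`, but treat the exact condition as an assumption.

```js
function shadow(obj, prop, value) {
  if (
    typeof PDFJSDev === "undefined" ||
    PDFJSDev.test("!PRODUCTION || TESTING")
  ) {
    // Catch e.g. simple spelling bugs: shadowing a property that doesn't
    // exist anywhere on the prototype chain cannot have any effect.
    assert(prop in obj, `shadow: Property "${prop}" not found in object.`);
  }
  Object.defineProperty(obj, prop, {
    value,
    enumerable: true,
    configurable: true,
    writable: false,
  });
  return value;
}
```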
Currently the `Catalog.metadata` getter only handles errors during parsing; however, in a *corrupt* PDF document, fetching of the raw /Metadata can obviously fail as well.
Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail, which it *never* should, and this will cause the viewer to not initialize all state as expected.
Fixes one of the documents in issue 14305.
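Conceptually, the getter needs a try/catch around the *fetching* step too, not just the parsing. A simplified sketch, where `_parseMetadata` stands in for the existing parsing code and the error/logging helpers are the ones from `src/core/` and `src/shared/util.js`:

```js
get metadata() {
  const streamRef = this._catDict.getRaw("Metadata");
  let metadata = null;

  if (streamRef instanceof Ref) {
    try {
      // In corrupt documents this fetch, and not only the parsing below,
      // can throw; catch it so that `getMetadata` never rejects in the API.
      const stream = this.xref.fetch(streamRef, /* suppressEncryption = */ true);
      metadata = this._parseMetadata(stream);
    } catch (ex) {
      if (ex instanceof MissingDataException) {
        throw ex; // Required for progressive loading to keep working.
      }
      info(`Skipping invalid Metadata: "${ex}".`);
    }
  }
  return shadow(this, "metadata", metadata);
}
```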
Given that [bug 1336572](https://bugzilla.mozilla.org/show_bug.cgi?id=1336572) was just closed as fixed, thus fixing issue 8019 in Firefox[1], let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.
---
[1] It also seems to be working in Google Chrome, although I'm having a slightly difficult time deciphering *exactly* what configurations were affected when looking through issue 8019.
*This patch improves handling of a couple of PDF documents from issue 14303.*
- Update `XRef.indexObjects` to actually clear *all* XRef-caches. Invalid XRef tables *usually* cause issues early enough during parsing that we've not populated the XRef-cache; however, to prevent any issues we obviously need to clear that one as well.
- Improve the /Root dictionary validation in `XRef.parse` (PR 9827 follow-up). In addition to checking that a /Pages entry exists, we'll now also check that it can be successfully fetched *and* that it's of the correct type; see the sketch after this list. There's really no point trying to use a /Root dictionary that e.g. `Catalog.toplevelPagesDict` will reject, and this way we'll be able to fall back to indexing the objects in corrupt documents.
- Throw an `InvalidPDFException`, rather than a general `FormatError`, in `XRef.parse` when no usable /Root dictionary could be found. That really seems more appropriate overall, since all attempts at parsing/recovery have failed. (This part of the patch is API-observable, hence the tag.)
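A rough sketch of the stricter validation in `XRef.parse`; `Dict`, `XRefParseException` and `InvalidPDFException` are the existing helpers in `src/core/` and `src/shared/util.js`, while the surrounding control flow is simplified.

```js
try {
  const root = trailerDict.get("Root");
  // Fetching /Pages, not just checking for its existence, may throw in
  // corrupt documents; it must also be of the correct (dictionary) type.
  if (root instanceof Dict && root.get("Pages") instanceof Dict) {
    this.root = root;
    return;
  }
} catch (ex) {
  if (ex instanceof MissingDataException) {
    throw ex;
  }
  // Otherwise fall through to the recovery handling below.
}
if (!recoveryMode) {
  // Fall back to indexing the objects in the document.
  throw new XRefParseException();
}
throw new InvalidPDFException("Invalid Root reference.");
```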
With these changes, two existing test-cases are improved and the unit-tests are updated/refactored to highlight that. In particular `GHOSTSCRIPT-698804-1-fuzzed.pdf` will now both load and "render" correctly, whereas `poppler-395-0-fuzzed.pdf` will now fail immediately upon loading (rather than *appearing* to work).
*Please note:* This is similar to the method that existed prior to PR 3848, but the new method will *only* be used as a fallback when parsing corrupt PDF documents.
The implementation in PR 14311 unfortunately turned out to be *way* too simplistic, as evidenced by the recently added test-files in issue 14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fall back to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.
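A condensed sketch of the fallback; the method name follows the worker code, but the details are illustrative. `RefSet` and `Ref` are the existing helpers from `src/core/primitives.js`.

```js
function getAllPageDicts(catalog) {
  const pageDicts = [];
  const visited = new RefSet(); // Guards against circular /Kids entries.
  const stack = [catalog.toplevelPagesDict];

  while (stack.length > 0) {
    const node = stack.pop();
    const kids = node.getRaw("Kids");

    if (!Array.isArray(kids)) {
      pageDicts.push(node); // A leaf node, i.e. an actual /Page.
      continue;
    }
    // Push in reverse order, so the pages end up in document order.
    for (let i = kids.length - 1; i >= 0; i--) {
      const kidRef = kids[i];
      if (kidRef instanceof Ref) {
        if (visited.has(kidRef)) {
          continue; // Ignore the circular reference, don't loop on it.
        }
        visited.put(kidRef);
      }
      stack.push(catalog.xref.fetchIfRef(kidRef));
    }
  }
  return pageDicts;
}
```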
Fixes *at least* two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.
---
[1] The whole point of PR 14311 was obviously to *get rid of* infinite loops during document initialization, not to introduce any more of those.
Given that we're able to "render" this document, let's extend the unit-test to actually check that we're able to obtain the operatorList, although given the overall issues in the document it'll be empty.
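Something along these lines in the unit-test (Jasmine-style, matching the existing API tests; the exact assertions are an assumption):

```js
const page = await pdfDocument.getPage(1);
const opList = await page.getOperatorList();

// Given the overall issues in the document, the operatorList is empty.
expect(opList.fnArray.length).toEqual(0);
expect(opList.argsArray.length).toEqual(0);
expect(opList.lastChunk).toEqual(true);
```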
This was added in PR 14311, but given that I forgot to update the `PDFDocument.getPage` signature accordingly, it's completely unused.
Given that things work just fine as-is, let's simply remove that optional parameter for now; sorry about the churn here!
This only applies to severely corrupt documents, where it's possible that the `Parser` throws when we try to access e.g. a /Kids-entry in the /Pages-tree.
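Conceptually the fix just wraps the lookup (a simplified sketch; note that `MissingDataException` must still propagate, for progressive loading to keep working):

```js
let kids;
try {
  kids = currentNode.get("Kids");
} catch (ex) {
  if (ex instanceof MissingDataException) {
    throw ex;
  }
  // In severely corrupt documents the `Parser` itself may throw here;
  // treat the node as having no children rather than failing outright.
  kids = null;
}
```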
Fixes two of the issues listed in issue 14303, namely the `poppler-742-0.pdf...` and `poppler-937-0.pdf...` documents.
Given that Node.js doesn't support Workers, general PDF.js performance will be worse when compared to browsers. In an attempt to at least improve memory usage a little bit, update the Node.js examples to release page resources once parsing is done for each page.
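The pattern, roughly, in the `examples/node/` scripts; `page.cleanup()` is the real `pdfjs-dist` API, while the text-extraction loop here is just for illustration.

```js
const fs = require("fs");
const pdfjsLib = require("pdfjs-dist/legacy/build/pdf.js");

async function main(filePath) {
  const data = new Uint8Array(fs.readFileSync(filePath));
  const doc = await pdfjsLib.getDocument({ data }).promise;

  for (let pageNum = 1; pageNum <= doc.numPages; pageNum++) {
    const page = await doc.getPage(pageNum);
    const content = await page.getTextContent();
    console.log(content.items.map(item => item.str).join(" "));

    // Release the page resources once parsing is done for this page, to
    // reduce (peak) memory usage given the lack of Worker support.
    page.cleanup();
  }
}

main("./document.pdf").catch(console.error);
```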
This patch is essentially a continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.
Note that browsers, in general, don't handle a huge amount of DOM-elements very well, with really poor (e.g. sluggish scrolling) performance once the number gets "large". Furthermore, at least in Firefox, it seems that DOM-elements towards the bottom of an HTML-page can effectively be ignored; for the PDF.js viewer that means that pages at the end of the document can become impossible to access.
Hence, in order to improve things for these *very* large/long documents, this patch will now enforce usage of the (recently added) PAGE-scrolling mode for these documents. As implemented, this will only happen once the number of pages *exceeds* 15000 (which is hopefully rare in practice).
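A sketch of the enforcement in `BaseViewer.setDocument`; the constant and enum names follow the viewer code, but treat the details as illustrative.

```js
const PagesCountLimit = {
  FORCE_SCROLL_MODE_PAGE: 15000,
};

// In `BaseViewer.setDocument`, once `pagesCount` is known:
if (pagesCount > PagesCountLimit.FORCE_SCROLL_MODE_PAGE) {
  console.warn(
    "Forcing PAGE-scrolling for performance reasons, given the length of the document."
  );
  this._scrollMode = ScrollMode.PAGE;
}
```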
While this might feel a bit jarring to users being *forced* to use PAGE-scrolling, it seems all things considered like a better idea to ensure that the entire document actually remains accessible and with (hopefully) more acceptable performance.
Fixes [bug 1588435](https://bugzilla.mozilla.org/show_bug.cgi?id=1588435), to the extent that doing so is possible since the document contains 25560 pages (and is 197 MB in size).
This was added on the assumption that the viewer would (eventually) start using the `PDFSinglePageViewer` for e.g. PAGE-scrolling mode and PresentationMode. However, having both a `PDFViewer` and a `PDFSinglePageViewer` side-by-side in the viewer would've been tricky to implement well, which is why PR 14112 implemented PAGE-scrolling for the general `BaseViewer` instead.
Given that the default viewer is no longer (potentially) going to use `PDFSinglePageViewer`, there's code in the `SecondaryToolbar` (and related CSS rules) which is now unnecessary.
Since version 7, released over a year ago in October 2020, NPM
automatically transforms lock files from version 1 to version 2. The
NPM 7 release notes reported:
"One change to take note of is the new lockfile format, which is
backwards compatible with npm 6 users. The lockfile v2 unlocks the
ability to do deterministic and reproducible builds to produce a
package tree."
Not only is this change backwards compatible (so older versions of NPM
will still be able to install everything as expected), reproducibility
is also a nice property to have, and modern NPM versions will otherwise
constantly do the conversion anyway, causing contributors to explicitly
have to revert the change. Therefore, I believe we should do this now
since it doesn't break backwards compatibility for consumers of this
file. It only means that producers of this file (i.e., us contributors)
need to use NPM 7 or higher (as of writing NPM 8 is even available).
According to https://nodejs.org/en/download/releases/ this means
contributors should run at least Node.js 15.0.0, while 17.1.0 is the
most recent as of writing, so to me that sounds reasonable to ask.
In Puppeteer 11 we noticed that Firefox no longer shuts down once the
tests are done. I tracked this down to the `page.goto` call, in
`startBrowser`, no longer resolving. I can only assume that something
changed in Puppeteer, possibly in combination with recent Firefox
Nightly versions, that caused this, but I haven't been able to fully
track it down.
However, I did find that the root cause is that the `load` event no
longer triggers, so fortunately we can fix the problem by explicitly
waiting for the `domcontentloaded` event instead. In general this
change might even be better, since we now wait until the test framework
is fully loaded before we continue. Note that this also still works for
the current Puppeteer version.
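In `startBrowser` that amounts to, roughly, the following (a sketch; the surrounding code is elided):

```js
// Wait until the test framework's page has been parsed, rather than
// waiting for the `load` event that no longer triggers reliably.
await page.goto(url, { waitUntil: "domcontentloaded" });
```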
I did find two upstream references that appear to track this issue, one
on the Puppeteer side and one on the Firefox side, which further
suggests that the issue is partly on both sides:
- https://github.com/puppeteer/puppeteer/issues/5806
- https://bugzilla.mozilla.org/show_bug.cgi?id=1706353
If a browser cannot be started, we currently get the following log:
`Error while starting firefox: [object Object]`. This is simply an
oversight from the initial Puppeteer integration work since we never got
into this code path before. With this fix the error log becomes more
useful: `Error while starting firefox: connect ECONNREFUSED ::1:45387`
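In other words, roughly the following (a sketch, with the variable names assumed): log the error's message, rather than letting the error object be coerced to a string.

```js
try {
  browser = await puppeteer.launch(launchOptions);
} catch (error) {
  // Interpolating the error object itself yields "[object Object]";
  // its message contains the actually useful information.
  console.log(`Error while starting ${browserName}: ${error.message}`);
}
```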
*Please note:* While this patch on its own is sufficient to prevent the worker-thread from hanging, in combination with PR 14311 these PDF documents will both load *and* render correctly.
Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table.
To avoid a solution that required tracking the references manually everywhere, the implementation settled on here instead handles that internally in the `XRef.fetch`-method. This should work, since that method *and* the `Parser`/`Lexer`-implementations are completely synchronous.
Note also that the existing `XRef`-caching, used for all data-types *except* Streams, should hopefully help to lessen the performance impact of these changes.
One *potential* problem with these changes could be certain *browser* exceptions, since those are generally not catchable in JavaScript code, however those would most likely "stop" worker-thread parsing anyway (at least I hope so).
Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing, for the rest of the document, to continue such that *one* bad reference doesn't prevent an entire document from loading.
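A condensed sketch of the internal tracking; `RefSet` and `warn` are the existing helpers from `src/core/primitives.js` and `src/shared/util.js`, while `_actuallyFetch` and the dummy return value are simplifications of the real method.

```js
class XRef {
  constructor(stream, pdfManager) {
    this._pendingRefs = new RefSet();
    // ...
  }

  fetch(ref, suppressEncryption = false) {
    if (this._pendingRefs.has(ref)) {
      // A circular reference: return dummy-data rather than throwing, so
      // that *one* bad reference doesn't prevent the entire document
      // from loading.
      this._pendingRefs.remove(ref);
      warn(`Ignoring circular reference: ${ref}.`);
      return null;
    }
    this._pendingRefs.put(ref);
    try {
      return this._actuallyFetch(ref, suppressEncryption);
    } finally {
      // Fetching/parsing is completely synchronous, hence the tracking
      // can be handled right here in the method.
      this._pendingRefs.remove(ref);
    }
  }
}
```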
Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.
*This patch basically extends the approach from PR 10392, by also checking the last page.*
Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (and hangs the browser).
Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially count the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
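In rough outline (method names such as `checkLastPage` and `setActualNumPages` follow the worker code, but treat this as a sketch):

```js
async function checkLastPage(pdfDocument, catalog) {
  const numPages = catalog.numPages; // Based on the raw /Count entry.
  try {
    // Short-cut: if the last page can be fetched, assume that the
    // /Count entry is correct and use it as-is.
    await pdfDocument.getPage(numPages - 1); // Zero-based index.
  } catch (reason) {
    // The /Count entry cannot be trusted; determine the real number of
    // pages by fetching them, one at a time, until that starts failing.
    let pagesCount = 1; // The *first* page is known to be available.
    while (true) {
      try {
        await pdfDocument.getPage(pagesCount);
        pagesCount++;
      } catch {
        break;
      }
    }
    catalog.setActualNumPages(pagesCount);
  }
}
```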
Unfortunately these changes will have a number of *somewhat* negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug.
- This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
- For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents.
- This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.
As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).
Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
Furthermore, this patch also adds (currently missing) caching for XFA-documents. Loading a couple of such documents in the viewer, with logging added, shows that we're currently re-creating `Page`-instances unnecessarily for XFA-documents.
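Sketched, for the worker-thread `PDFDocument`; `_loadPage` stands in for the real page-creation code.

```js
class PDFDocument {
  constructor(pdfManager, stream) {
    // Keyed on `pageIndex`, a `Map` handles out-of-order (and sparse)
    // page access naturally, unlike an `Array`.
    this._pagePromises = new Map();
    // ...
  }

  getPage(pageIndex) {
    const cachedPromise = this._pagePromises.get(pageIndex);
    if (cachedPromise) {
      return cachedPromise;
    }
    const promise = this._loadPage(pageIndex);
    this._pagePromises.set(pageIndex, promise);
    return promise;
  }
}
```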