Jonas Jenwald d0c4bbd828 [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*

Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (and hangs the browser).

Rather than hoping that the /Count entry is correct we obviously need to validate it, just like all other data found in PDF documents. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially count the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
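
The short-cut described above can be sketched roughly as follows. This is only an illustrative model, not the code from the patch: the plain-object tree shape and the function names (`getPageAt`, `countPages`, `resolveNumPages`) are hypothetical, and real /Pages-tree nodes are PDF dictionaries rather than plain objects.

```javascript
// Hypothetical in-memory model of a /Pages-tree (not PDF.js internals):
//   intermediate node: { count: <claimed /Count>, kids: [...] }
//   leaf page:         { page: <label> }

function isLeaf(node) {
  return !Array.isArray(node.kids);
}

// Find the page at the 0-based `index`. Subtrees below the root are skipped
// based on their claimed count (a simplification); throws if no page is found.
function getPageAt(root, index) {
  let remaining = index;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (isLeaf(node)) {
      if (remaining === 0) {
        return node;
      }
      remaining--;
      continue;
    }
    const canSkip =
      node !== root &&
      Number.isInteger(node.count) &&
      node.count >= 0 &&
      remaining >= node.count;
    if (canSkip) {
      remaining -= node.count;
      continue;
    }
    // Push the kids in reverse so that they are visited in document order.
    for (let i = node.kids.length - 1; i >= 0; i--) {
      stack.push(node.kids[i]);
    }
  }
  throw new Error(`Page index ${index} not found.`);
}

// Slow path: visit (potentially) the entire tree and count the leaf pages.
function countPages(root) {
  let total = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (isLeaf(node)) {
      total++;
    } else {
      stack.push(...node.kids);
    }
  }
  return total;
}

// Fast path first: trust the root count only if the last page it implies
// can actually be fetched, otherwise fall back to counting.
function resolveNumPages(root) {
  const claimed = root.count;
  if (Number.isInteger(claimed) && claimed > 0) {
    try {
      getPageAt(root, claimed - 1);
      return claimed;
    } catch {
      // The claimed count is bogus; fall through to the full traversal.
    }
  }
  return countPages(root);
}
```

Note that, as in the description above, a count that is too *small* still passes the last-page check in this sketch; only a count that points past the end of the tree (or is not a positive integer) triggers the full traversal.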

Unfortunately these changes will have a number of *somewhat* negative side-effects (please see a possibly incomplete list below); however, I cannot see a better way to address this bug.
 - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
 - For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents.
 - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.
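
For reference, `disableAutoFetch` is one of the options accepted by `getDocument` in the PDF.js API. A minimal sketch of opting in is shown here; `pdfjsLib` is passed in as a parameter so that nothing is assumed about how the library was loaded, and the helper name is hypothetical. The API documentation notes that streaming also needs to be disabled for `disableAutoFetch` to work correctly, which is why `disableStream` is set as well.

```javascript
// Hypothetical helper: open a document without pre-fetching data that
// isn't needed yet. Returns the loading task created by `getDocument`.
function openWithoutAutoFetch(pdfjsLib, url) {
  return pdfjsLib.getDocument({
    url,
    disableAutoFetch: true,
    // Needed alongside `disableAutoFetch`, per the API documentation.
    disableStream: true,
  });
}

// Possible usage (the URL is a placeholder):
//   const pdf = await openWithoutAutoFetch(pdfjsLib, "/docs/report.pdf").promise;
//   console.log(pdf.numPages);
```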

As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).

Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
2021-11-27 21:57:35 +01:00
.github Remove the npm test-command 2021-08-27 16:29:55 +02:00
docs docs: Fix grammatical error 2021-10-15 01:09:09 +05:30
examples Convert examples/components/pageviewer.js to await/async (issue 14127) 2021-11-24 15:22:21 +05:30
extensions Replace the remaining Node.removeChild() instances with Element.remove() 2021-11-16 17:52:50 +01:00
external Add support for modern ECMAScript class features 2021-10-22 22:01:17 +02:00
l10n Update l10n files 2021-11-14 10:48:50 +01:00
src [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) 2021-11-27 21:57:35 +01:00
test [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) 2021-11-27 21:57:35 +01:00
web [Regression] Prevent errors, during loading, in the viewer for XFA-documents (PR 14295 follow-up) 2021-11-26 20:21:12 +01:00
.editorconfig Ensure that the EditorConfig rules apply to *.json and *.pdf.link files as well 2021-10-15 13:46:11 +02:00
.eslintignore Include the test/resources/ folder when running ESLint/Stylelint 2021-08-04 13:50:44 +02:00
.eslintrc Enable the unicorn/prefer-dom-node-remove ESLint plugin rule 2021-11-16 17:52:50 +01:00
.gitattributes Fixing C++,PHP and Pascal presence in the repo 2015-10-29 13:03:51 -05:00
.gitignore Include package-lock.json for reproducible builds 2018-06-02 20:29:47 +02:00
.gitmodules Update fonttools location and version (issue 6223) 2015-07-17 12:51:09 +02:00
.gitpod.Dockerfile Simplifies code contributions by automating the dev setup with gitpod.io 2019-11-06 04:12:19 +00:00
.gitpod.yml Simplifies code contributions by automating the dev setup with gitpod.io 2019-11-06 04:12:19 +00:00
.mailmap Add mgol's name to AUTHORS, add .mailmap 2017-11-22 10:46:11 +01:00
.prettierrc Update Prettier to version 2.0 2020-04-14 12:28:14 +02:00
.stylelintignore Include the test/resources/ folder when running ESLint/Stylelint 2021-08-04 13:50:44 +02:00
.stylelintrc Enable the Stylelint shorthand-property-no-redundant-values rule 2021-01-22 14:36:02 +01:00
AUTHORS Add SehyunPark to AUTHORS 2017-11-29 22:24:08 +09:00
CODE_OF_CONDUCT.md Add Mozilla Code of Conduct file 2019-03-27 21:00:01 -07:00
EXPORT Adds ECCN response statement 2017-10-23 13:31:36 -05:00
gulpfile.js Add support for modern ECMAScript class features 2021-10-22 22:01:17 +02:00
LICENSE cleaned whitespace 2015-02-17 11:07:37 -05:00
package-lock.json Update the eslint-plugin-unicorn package to the latest version 2021-11-14 10:27:29 +01:00
package.json Update the eslint-plugin-unicorn package to the latest version 2021-11-14 10:27:29 +01:00
pdfjs.config Bump versions in pdfjs.config 2021-10-02 14:56:14 +02:00
README.md Add support for modern ECMAScript class features 2021-10-22 22:01:17 +02:00
systemjs.config.js Enable the ESLint no-var rule globally 2021-03-13 16:12:53 +01:00

PDF.js

PDF.js is a Portable Document Format (PDF) viewer that is built with HTML5.

PDF.js is community-driven and supported by Mozilla. Our goal is to create a general-purpose, web standards-based platform for parsing and rendering PDFs.

Contributing

PDF.js is an open source project and always looking for more contributors. To get involved, visit:

Feel free to stop by our Matrix room for questions or guidance.

Getting Started

Online demo

Please note that the "Modern browsers" version assumes native support for features such as async/await, ReadableStream, optional chaining, nullish coalescing, and private class fields/methods.

Browser Extensions

Firefox

PDF.js is built into version 19+ of Firefox.

Chrome

  • The official extension for Chrome can be installed from the Chrome Web Store. This extension is maintained by @Rob--W.
  • Build Your Own - Get the code as explained below and issue gulp chromium. Then open Chrome, go to Tools > Extension and load the (unpackaged) extension from the directory build/chromium.

Getting the Code

To get a local copy of the current code, clone it using git:

$ git clone https://github.com/mozilla/pdf.js.git
$ cd pdf.js

Next, install Node.js via the official package or via nvm. You need to install the gulp package globally (see also gulp's getting started):

$ npm install -g gulp-cli

If everything worked out, install all dependencies for PDF.js:

$ npm install

Finally, you need to start a local web server as some browsers do not allow opening PDF files using a file:// URL. Run:

$ gulp server

and then you can open:

Please keep in mind that this requires a modern and fully up-to-date browser; refer to Building PDF.js for non-development usage of the PDF.js library.

It is also possible to view all test PDF files on the right side by opening:

Building PDF.js

In order to bundle all src/ files into two production scripts and build the generic viewer, run:

$ gulp generic

If you need to support older browsers, run:

$ gulp generic-legacy

This will generate pdf.js and pdf.worker.js in the build/generic/build/ directory (respectively build/generic-legacy/build/). Both scripts are needed but only pdf.js needs to be included since pdf.worker.js will be loaded by pdf.js. The PDF.js files are large and should be minified for production.
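
As a rough illustration of how the two generated scripts relate at runtime, the sketch below points the library at the worker script and then opens a document. It assumes pdf.js has already been loaded and is available as pdfjsLib, which is passed in as a parameter here; the helper name and the paths in the usage comment are placeholders, not part of the build output.

```javascript
// Hypothetical helper: tell pdf.js where pdf.worker.js lives, open a
// document, and report how many pages it has.
async function loadPageCount(pdfjsLib, pdfUrl, workerSrc) {
  // pdf.js loads the worker script itself, from this location.
  pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrc;

  const loadingTask = pdfjsLib.getDocument(pdfUrl);
  const pdfDocument = await loadingTask.promise;
  return pdfDocument.numPages;
}

// Possible usage (placeholder paths):
//   const n = await loadPageCount(
//     pdfjsLib,
//     "example.pdf",
//     "build/generic/build/pdf.worker.js"
//   );
```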

Using PDF.js in a web application

To use PDF.js in a web application you can choose to use a pre-built version of the library or to build it from source. We supply pre-built versions for usage with NPM and Bower under the pdfjs-dist name. For more information and examples please refer to the wiki page on this subject.

Including via a CDN

PDF.js is hosted on several free CDNs:

Learning

You can play with the PDF.js API directly from your browser using the live demos below:

More examples can be found in the examples folder. Some of them use the pdfjs-dist package, which can be built and installed in this repo directory via the gulp dist-install command.

For an introduction to the PDF.js code, check out the presentation by our contributor Julian Viereck:

More learning resources can be found at:

The API documentation can be found at:

Questions

Check out our FAQs and get answers to common questions:

Talk to us on Matrix:

File an issue:

Follow us on Twitter: @pdfjs