
Jonas Jenwald d62c9181bd Improve the *local* image caching in PartialEvaluator.getOperatorList

Currently the local `imageCache`, as used in `PartialEvaluator.getOperatorList`, will miss certain cases of repeated images because the caching is *only* done by name (usually using a format such as "Im0", "Im1", ...).
However, in some PDF documents the `/XObject` dictionaries may contain hundreds (or even thousands) of distinctly named images, despite them referring to only a handful of actual image objects (via the XRef table).

With these changes we'll now cache *local* images using both name and (where applicable) reference, thus improving re-use of image resources even further.
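
To illustrate the idea of looking images up by both name and reference, here is a minimal sketch. It is not the PDF.js implementation: the class name `NameAndRefImageCache`, its methods, and the string keys such as "12R" standing in for XRef references are assumptions made purely for illustration.

```javascript
// Hypothetical sketch of a cache keyed by name and, where available,
// by reference. None of these names are taken from the PDF.js source.
class NameAndRefImageCache {
  constructor() {
    this._nameToRef = new Map(); // e.g. "Im0" -> "12R"
    this._byRef = new Map(); // images that have a reference
    this._byName = new Map(); // images without a reference
  }

  getByRef(ref) {
    return this._byRef.get(ref) || null;
  }

  getByName(name) {
    const ref = this._nameToRef.get(name);
    if (ref !== undefined) {
      return this.getByRef(ref);
    }
    return this._byName.get(name) || null;
  }

  set(name, ref, data) {
    if (!name) {
      throw new Error("NameAndRefImageCache.set: a name is required.");
    }
    if (ref) {
      // Always remember which reference the name points at, but only
      // store the image data the first time the reference is seen.
      this._nameToRef.set(name, ref);
      if (!this._byRef.has(ref)) {
        this._byRef.set(ref, data);
      }
      return;
    }
    if (!this._byName.has(name)) {
      this._byName.set(name, data);
    }
  }
}

// Two differently named images that resolve to the same (assumed) reference.
const cache = new NameAndRefImageCache();
const image = { width: 8, height: 8 };
cache.set("Im0", "12R", image);
cache.set("Im1", "12R", { width: 8, height: 8 });

// Both names resolve to the one object stored for the reference.
const sameImage = cache.getByName("Im1") === image;
```

With a name-only cache, "Im1" would be treated as a new image and parsed again; keying on the reference as well lets the second name resolve to the data already stored for the first.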

This patch was tested using the PDF file from [bug 857031](https://bugzilla.mozilla.org/show_bug.cgi?id=857031), i.e. https://bug857031.bmoattachments.org/attachment.cgi?id=732270, with the following manifest file:
```
[
    {  "id": "bug857031",
       "file": "../web/pdfs/bug857031.pdf",
       "md5": "",
       "rounds": 250,
       "lastPage": 1,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
firefox | 0    | Overall      |   250 |         2749 |        2656 | -93 | -3.38 |        faster
firefox | 0    | Page Request |   250 |            3 |           4 |   1 | 50.14 |        slower
firefox | 0    | Rendering    |   250 |         2746 |        2652 | -94 | -3.44 |        faster
```

While this is certainly an improvement, since we now avoid re-parsing ~1000 images on the first page, all of the image resources are small enough that the total rendering time doesn't improve that much in this particular case.

In pathological cases, such as the PDF document in issue 4958, the improvements with this patch can be very significant. Looking for example at page 2 of the document from issue 4958, the rendering time drops from ~60 seconds with `master` to ~30 seconds with this patch (obviously still slow, but it showcases the potential of this patch nicely).

Finally, note that there's also potential for additional improvements by re-using `LocalImageCache` instances for e.g. /XObject data of the `Form`-type. However, given the recent changes in this area I purposely didn't want to complicate *this* patch more than necessary.

2020-05-25 15:14:14 +02:00

.github

Update links from IRC to Matrix.

2020-02-27 16:26:17 -08:00

docs

Update the getting started page of the website for the new release

2020-03-19 23:07:45 +01:00

examples

[api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js

2020-05-22 00:22:48 +02:00

extensions

Update Prettier to version 2.0

2020-04-14 12:28:14 +02:00

external

Reduce usage of SystemJS, in the development viewer, even further

2020-05-20 13:36:52 +02:00

l10n

Update l10n files

2020-05-16 11:47:08 +02:00

src

Improve the *local* image caching in PartialEvaluator.getOperatorList

2020-05-25 15:14:14 +02:00

test

Merge pull request #11601 from Snuffleupagus/rm-nativeImageDecoderSupport

2020-05-23 15:33:46 +02:00

web

[api-minor] Remove the disableCreateObjectURL option from the getDocument parameters, since it's now unused in the API

2020-05-22 00:22:48 +02:00

.editorconfig

Uses editorconfig to maintain consistent coding styles

2015-11-14 07:32:18 +05:30

.eslintignore

Replace the bundled ReadableStream polyfill with the web-streams-polyfill npm package (issue 11157)

2019-09-23 22:16:59 +02:00

.eslintrc

Reduce usage of SystemJS, in the development viewer, even further

2020-05-20 13:36:52 +02:00

.gitattributes

Fixing C++,PHP and Pascal presence in the repo

2015-10-29 13:03:51 -05:00

.gitignore

Include package-lock.json for reproducible builds

2018-06-02 20:29:47 +02:00

.gitmodules

Update fonttools location and version (issue 6223)

2015-07-17 12:51:09 +02:00

.gitpod.Dockerfile

Simplifies code contributions by automating the dev setup with gitpod.io

2019-11-06 04:12:19 +00:00

.gitpod.yml

Simplifies code contributions by automating the dev setup with gitpod.io

2019-11-06 04:12:19 +00:00

.mailmap

Add mgol's name to AUTHORS, add .mailmap

2017-11-22 10:46:11 +01:00

.prettierrc

Update Prettier to version 2.0

2020-04-14 12:28:14 +02:00

.travis.yml

Use Node LTS releases to fix Travis CI builds (issue 10790)

2020-04-22 00:06:27 +02:00

AUTHORS

Add SehyunPark to AUTHORS

2017-11-29 22:24:08 +09:00

CODE_OF_CONDUCT.md

Add Mozilla Code of Conduct file

2019-03-27 21:00:01 -07:00

EXPORT

Adds ECCN response statement

2017-10-23 13:31:36 -05:00

gulpfile.js

Stop building any src/ files during the gulp default_preferences task

2020-05-22 00:22:48 +02:00

LICENSE

cleaned whitespace

2015-02-17 11:07:37 -05:00

package-lock.json

Reduce usage of SystemJS, in the development viewer, even further

2020-05-20 13:36:52 +02:00

package.json

Reduce usage of SystemJS, in the development viewer, even further

2020-05-20 13:36:52 +02:00

pdfjs.config

Bump versions in pdfjs.config

2020-03-19 23:01:17 +01:00

README.md

Remove any mention of Gitpod from the README (issue 11732)

2020-04-11 16:47:27 +02:00

systemjs.config.js

docs: Fix simple typo, occurences -> occurrences

2020-04-18 07:53:18 +10:00

README.md

PDF.js

PDF.js is a Portable Document Format (PDF) viewer that is built with HTML5.

PDF.js is community-driven and supported by Mozilla Labs. Our goal is to create a general-purpose, web standards-based platform for parsing and rendering PDFs.

Contributing

PDF.js is an open source project and always looking for more contributors. To get involved, visit:

Feel free to stop by our Matrix room for questions or guidance.

Getting Started

Online demo

Please note that the "Modern browsers" version assumes native support for features such as async/await, Promise, and ReadableStream.

Modern browsers: https://mozilla.github.io/pdf.js/web/viewer.html
Older browsers: https://mozilla.github.io/pdf.js/es5/web/viewer.html

Browser Extensions

Firefox

PDF.js is built into version 19+ of Firefox.

Chrome

The official extension for Chrome can be installed from the Chrome Web Store. This extension is maintained by @Rob--W.
Build Your Own - Get the code as explained below and issue gulp chromium. Then open Chrome, go to Tools > Extension and load the (unpackaged) extension from the directory build/chromium.

Getting the Code

To get a local copy of the current code, clone it using git:

$ git clone https://github.com/mozilla/pdf.js.git
$ cd pdf.js

Next, install Node.js via the official package or via nvm. You need to install the gulp package globally (see also gulp's getting started):

$ npm install -g gulp-cli

If everything worked out, install all dependencies for PDF.js:

$ npm install

Finally, you need to start a local web server as some browsers do not allow opening PDF files using a file:// URL. Run:

$ gulp server

and then you can open:

http://localhost:8888/web/viewer.html

Please keep in mind that this requires an ES6-compatible browser; refer to Building PDF.js for usage with older browsers.

It is also possible to view all test PDF files on the right side by opening:

http://localhost:8888/test/pdfs/?frame

Building PDF.js

In order to bundle all src/ files into two production scripts and build the generic viewer, run:

$ gulp generic

This will generate pdf.js and pdf.worker.js in the build/generic/build/ directory. Both scripts are needed but only pdf.js needs to be included since pdf.worker.js will be loaded by pdf.js. The PDF.js files are large and should be minified for production.

Using PDF.js in a web application

To use PDF.js in a web application you can choose to use a pre-built version of the library or to build it from source. We supply pre-built versions for usage with NPM and Bower under the pdfjs-dist name. For more information and examples please refer to the wiki page on this subject.

Including via a CDN

PDF.js is hosted on several free CDNs:

Learning

You can play with the PDF.js API directly from your browser using the live demos below:

Interactive examples

More examples can be found in the examples folder. Some of them use the pdfjs-dist package, which can be built and installed in this repo directory via the gulp dist-install command.

For an introduction to the PDF.js code, check out the presentation by our contributor Julian Viereck:

https://www.youtube.com/watch?v=Iv15UY-4Fg8

More learning resources can be found at:

https://github.com/mozilla/pdf.js/wiki/Additional-Learning-Resources

The API documentation can be found at:

https://mozilla.github.io/pdf.js/api/

Questions

Check out our FAQs and get answers to common questions:

https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions

Talk to us on Matrix:

https://chat.mozilla.org/#/room/#pdfjs:mozilla.org

File an issue:

https://github.com/mozilla/pdf.js/issues/new

https://twitter.com/pdfjs