
Jonas Jenwald f05e5c5460 Take the dictionary, and not just the image data, into account when caching inline images (issue 9398)

The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces.

There are obviously a couple of different ways that we could compute a hash/checksum of the dictionary.
Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be *way* too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regress quite badly with this solution.

The solution that is instead implemented in this patch is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution, there's one drawback associated with it:
If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach, which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of *not* caching given that too aggressive caching can easily lead to rendering bugs.
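
As a rough illustration of the idea described above (not the code from this patch), the sketch below builds a cache key from a checksum of the image bytes plus a checksum of the dictionary bytes. Adler-32 is an assumption picked for the example, since the text only says "checksum"; `adler32`, `inlineImageCacheKey` and `toBytes` are hypothetical names, and the dictionary strings are made-up sample data.

```javascript
// Adler-32 over a byte array. The checksum algorithm is an assumption
// made for this sketch; the commit message does not name one.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (let i = 0; i < bytes.length; i++) {
    a = (a + bytes[i]) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

// Hypothetical helper: combine the checksum of the image data with the
// checksum of the dictionary bytes as they appear in the stream.
function inlineImageCacheKey(dictBytes, imageBytes) {
  return adler32(imageBytes) + "_" + adler32(dictBytes);
}

// Hypothetical helper: turn an ASCII string into bytes for the demo.
function toBytes(str) {
  return Uint8Array.from(str, (ch) => ch.charCodeAt(0));
}

const imageData = new Uint8Array([1, 2, 3, 4]);
const dictGray = toBytes("/W 2 /H 2 /CS /G /BPC 8");
const dictRgb = toBytes("/W 2 /H 2 /CS /RGB /BPC 8");
const dictReordered = toBytes("/H 2 /W 2 /CS /G /BPC 8");

const keyGray = inlineImageCacheKey(dictGray, imageData);
const keyRgb = inlineImageCacheKey(dictRgb, imageData);
const keyReordered = inlineImageCacheKey(dictReordered, imageData);

// Same image data, different ColourSpace: the keys differ.
console.log(keyGray !== keyRgb);
// Same entries in a different order: the keys also differ, which is the
// drawback mentioned above (a missed cache hit, not a wrong one).
console.log(keyGray !== keyReordered);
```

Because the key depends on the dictionary bytes and not on the parsed keys/values, it is cheap to compute, at the cost of treating reordered but equivalent dictionaries as different images.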

One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the *exact* stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1]
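
To make the position-tracking idea concrete, here is a toy sketch that is not the PDF.js `Lexer`: a whitespace tokenizer that remembers where it was when it saw the `BI` operator, so that a later step can recover the exact dictionary text. `ToyLexer`, `beginInlineImagePos` and `inlineImageDictText` are hypothetical names chosen for this illustration.

```javascript
// Toy whitespace tokenizer (an assumption for illustration only).
class ToyLexer {
  constructor(text) {
    this.text = text;
    this.pos = 0;
    // Position right after the most recent "BI" operator, or -1 if none.
    this.beginInlineImagePos = -1;
  }

  nextToken() {
    const text = this.text;
    // Skip leading whitespace.
    while (this.pos < text.length && /\s/.test(text[this.pos])) {
      this.pos++;
    }
    if (this.pos >= text.length) {
      return null;
    }
    const start = this.pos;
    while (this.pos < text.length && !/\s/.test(text[this.pos])) {
      this.pos++;
    }
    const token = text.slice(start, this.pos);
    if (token === "BI") {
      // The inline image dictionary starts right after this operator.
      this.beginInlineImagePos = this.pos;
    }
    return token;
  }
}

// Hypothetical consumer: reads tokens until "ID" and returns the exact
// dictionary text between "BI" and "ID", using the recorded position.
function inlineImageDictText(lexer) {
  let token;
  while ((token = lexer.nextToken()) !== null) {
    if (token === "ID" && lexer.beginInlineImagePos >= 0) {
      const end = lexer.pos - "ID".length;
      return lexer.text.slice(lexer.beginInlineImagePos, end).trim();
    }
  }
  return null;
}

const lexer = new ToyLexer("q BI /W 2 /H 2 /CS /G ID xxxx EI Q");
const dictText = inlineImageDictText(lexer);
console.log(dictText);
```

The consumer never has to re-scan backwards for the start of the dictionary; it only reads a position that the tokenizer already passed over.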

With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618.

Fixes 9398; also improves/fixes the `issue8823` reference test.

---

[1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.

2018-02-12 16:43:47 +01:00

.github

Attempt to clarify the meaning of "extension" in the ISSUE_TEMPLATE

2017-10-21 11:32:03 +02:00

docs

Update index.md

2017-11-30 09:50:00 +09:00

examples

Use setPDFNetworkStreamFactory in the helloworld and svgviewer examples

2018-02-04 19:32:47 +01:00

extensions

Enable the mozilla/use-includes-instead-of-indexOf ESLint rule globally

2018-02-10 23:24:50 +01:00

external

Enable the mozilla/use-includes-instead-of-indexOf ESLint rule globally

2018-02-10 23:24:50 +01:00

l10n

Update l10n files

2018-02-05 12:23:54 +01:00

src

Take the dictionary, and not just the image data, into account when caching inline images (issue 9398)

2018-02-12 16:43:47 +01:00

test

Take the dictionary, and not just the image data, into account when caching inline images (issue 9398)

2018-02-12 16:43:47 +01:00

web

Move public methods above private methods in web/pdf_find_controller.js

2018-02-11 20:31:59 +01:00

.editorconfig

Uses editorconfig to maintain consistent coding styles

2015-11-14 07:32:18 +05:30

.eslintignore

Adds streams-lib polyfill and exports ReadableStream from shared/util.

2017-05-20 00:26:34 +05:30

.eslintrc

Enable the mozilla/use-includes-instead-of-indexOf ESLint rule globally

2018-02-10 23:24:50 +01:00

.gitattributes

Fixing C++,PHP and Pascal presence in the repo

2015-10-29 13:03:51 -05:00

.gitignore

Update all packages to the most recent version

2017-09-30 16:26:24 +02:00

.gitmodules

Update fonttools location and version (issue 6223)

2015-07-17 12:51:09 +02:00

.mailmap

Add mgol's name to AUTHORS, add .mailmap

2017-11-22 10:46:11 +01:00

.travis.yml

Travis CI: use most recent version of NPM

2016-10-27 21:10:19 +02:00

AUTHORS

Add SehyunPark to AUTHORS

2017-11-29 22:24:08 +09:00

EXPORT

Adds ECCN response statement

2017-10-23 13:31:36 -05:00

gulpfile.js

[api-major] Remove the SINGLE_FILE build target

2018-01-29 14:44:44 +01:00

LICENSE

cleaned whitespace

2015-02-17 11:07:37 -05:00

package.json

Upstream the changes from: Bug 1432992 - Remove definitions of Ci, Cr, Cc, and Cu

2018-02-09 12:49:02 +01:00

pdfjs.config

Version 2.0

2017-10-30 08:18:25 -05:00

README.md

[Firefox addon] Change the minimum supported version to (the current) Firefox Nightly release

2018-02-04 14:07:17 +01:00

systemjs.config.js

Update all packages to the most recent version

2017-09-30 16:26:24 +02:00

README.md

PDF.js

PDF.js is a Portable Document Format (PDF) viewer that is built with HTML5.

PDF.js is community-driven and supported by Mozilla Labs. Our goal is to create a general-purpose, web standards-based platform for parsing and rendering PDFs.

Contributing

PDF.js is an open source project and is always looking for more contributors. To get involved, visit:

Feel free to stop by #pdfjs on irc.mozilla.org for questions or guidance.

Getting Started

Online demo

https://mozilla.github.io/pdf.js/web/viewer.html

Browser Extensions

Firefox (and Seamonkey)

PDF.js is built into version 19+ of Firefox; however, one extension is still available:

Development Version - This extension is mainly intended for developers/testers, and it is updated every time new code is merged into the PDF.js codebase. It should be quite stable but might break from time to time.
- Please note that the extension is not guaranteed to be compatible with Firefox versions that are older than the current Nightly version, see the Release Calendar.
- The extension may also work in Seamonkey, provided that it is based on a Firefox version as above (see Which version of Firefox does SeaMonkey 2.x correspond with?), but we do not guarantee compatibility.

Chrome

The official extension for Chrome can be installed from the Chrome Web Store. This extension is maintained by @Rob--W.
Build Your Own - Get the code as explained below and run gulp chromium. Then open Chrome, go to Tools > Extensions and load the (unpacked) extension from the directory build/chromium.

Getting the Code

To get a local copy of the current code, clone it using git:

$ git clone https://github.com/mozilla/pdf.js.git
$ cd pdf.js

Next, install Node.js via the official package or via nvm. You need to install the gulp package globally (see also gulp's getting started):

$ npm install -g gulp-cli

If everything worked out, install all dependencies for PDF.js:

$ npm install

Finally, you need to start a local web server as some browsers do not allow opening PDF files using a file:// URL. Run:

$ gulp server

and then you can open:

http://localhost:8888/web/viewer.html

Please keep in mind that this requires an ES6-compatible browser; refer to Building PDF.js for usage with older browsers.

It is also possible to view all test PDF files on the right side by opening:

http://localhost:8888/test/pdfs/?frame

Building PDF.js

In order to bundle all src/ files into two production scripts and build the generic viewer, run:

$ gulp generic

This will generate pdf.js and pdf.worker.js in the build/generic/build/ directory. Both scripts are needed, but only pdf.js needs to be included, since pdf.worker.js will be loaded by pdf.js. The PDF.js files are large and should be minified for production.

Using PDF.js in a web application

To use PDF.js in a web application you can choose to use a pre-built version of the library or to build it from source. We supply pre-built versions for usage with NPM and Bower under the pdfjs-dist name. For more information and examples please refer to the wiki page on this subject.

Including via a CDN

PDF.js is hosted on several free CDNs:

Learning

You can play with the PDF.js API directly from your browser using the live demos below:

Interactive examples

The repository contains a hello world example that you can run locally:

examples/helloworld/

More examples can be found in the examples folder. Some of them use the pdfjs-dist package, which can be built and installed in this repo directory via the gulp dist-install command.

For an introduction to the PDF.js code, check out the presentation by our contributor Julian Viereck:

http://www.youtube.com/watch?v=Iv15UY-4Fg8

More learning resources can be found at:

https://github.com/mozilla/pdf.js/wiki/Additional-Learning-Resources

Questions

Check out our FAQs and get answers to common questions:

https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions

Talk to us on IRC (Internet Relay Chat):

#pdfjs on irc.mozilla.org

File an issue:

https://github.com/mozilla/pdf.js/issues/new

Follow us on Twitter:

http://twitter.com/#!/pdfjs