pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	69d7191034	Move the `disableAutoFetch` option from the global `PDFJS` object and into `getDocument` instead One additional complication with removing this option from the global `PDFJS` object, is that the viewer currently needs to check `disableAutoFetch` in a couple of places. To address this I'm thus proposing adding a getter in `PDFDocumentProxy`, to allow checking the actually used values for a particular `getDocument` invocation.	2018-03-01 18:11:16 +01:00
Jonas Jenwald	3c2fbdffe6	Move the `cMapUrl` and `cMapPacked` options from the global `PDFJS` object and into `getDocument` instead	2018-03-01 18:11:16 +01:00
Jonas Jenwald	83d52518da	[api-major] Refactor `PDFWorker` to be initialized with a parameter object, rather than a bunch of regular parameters	2018-02-16 13:22:35 +01:00
Jonas Jenwald	c3c1fc511d	Move the `workerSrc` option from the global `PDFJS` object and into `GlobalWorkerOptions` instead	2018-02-16 13:22:35 +01:00
Rob Wu	a89071bdef	Merge pull request #9470 from Snuffleupagus/issue-4888 Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888)	2018-02-16 13:14:21 +01:00
Brendan Dahl	4f5fb78237	Merge pull request #9401 from brendandahl/svg-fail Make the test framework more resilient to errors.	2018-02-14 17:05:40 -08:00
Brendan Dahl	53d19b619d	Make the test framework more resilient to errors.	2018-02-14 17:02:19 -08:00
Tim van der Meij	538dda1096	Merge pull request #9479 from Snuffleupagus/refactor-viewer-options [api-major] Refactor viewer components initialization to reduce their dependency on the global `PDFJS` object	2018-02-14 22:47:33 +01:00
Jonas Jenwald	11ab3b5c00	Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888) With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now. Fixes 4888.	2018-02-13 20:44:21 +01:00
Jonas Jenwald	e95c11a7f0	Remove the undocumented `PDFJS.enableStats` option In order to simplify things, the undocumented `enableStats` option was removed and `pdfBug` is now instead used to enable general debugging and page request/rendering stats. Considering that in the default viewer the `stats` was only used when debugging was also enabled, this simplification (code wise) definitely seems worthwhile to me.	2018-02-13 16:56:57 +01:00
Jonas Jenwald	3a6f6d23d6	Move the `externalLinkTarget` and `externalLinkRel` options to `PDFLinkService` options This removes the `PDFJS.externalLinkTarget`/`PDFJS.externalLinkRel` dependency from the viewer components, but please note that as a temporary solution the default viewer still uses it.	2018-02-13 14:28:40 +01:00
Jonas Jenwald	c45c394364	Move the `imageResourcesPath` option to a `BaseViewer`/`PDFPageView`/`AnnotationLayerBuilder` option This removes the `PDFJS.imageResourcesPath` dependency from the viewer components and the test-suite, but please note that as a temporary solution the default viewer still uses it.	2018-02-13 14:28:38 +01:00
Jonas Jenwald	f05e5c5460	Take the dictionary, and not just the image data, into account when caching inline images (issue 9398) The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces. There's obviously a couple of different ways that we could compute a hash/checksum of the dictionary. Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be way too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regress quite badly with this solution. The solution that is instead implemented in this patch, is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution there's one drawback associated with it: If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of not caching given that too aggressive caching can easily lead to rendering bugs. One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the exact stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1] With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618. Fixes 9398; also improves/fixes the `issue8823` reference test. --- [1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.	2018-02-12 16:43:47 +01:00
Jonas Jenwald	1cf116ab88	Enable the `mozilla/use-includes-instead-of-indexOf` ESLint rule globally This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have the necessary `Array`/`String` polyfills and that most cases have already been fixed, see PRs 9032 and 9434.	2018-02-10 23:24:50 +01:00
Jonas Jenwald	2eb29409bc	Enable the `mozilla/avoid-removeChild` ESLint rule globally This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have a polyfill for `ChildNode.remove()` and that most cases have already been fixed, see PRs 8056 and 8138.	2018-02-10 23:24:50 +01:00
Tim van der Meij	7bb066494f	Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)	2018-02-09 21:36:08 +01:00
Jonas Jenwald	a18c65ae9f	Use the correct stream position when reading `maxSizeOfInstructions` from the `maxp` table (issue 9458) Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html. Fixes 9458.	2018-02-07 21:57:43 +01:00
Jonas Jenwald	bf4166e6c9	Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614) Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49 Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) before parsing of the SOS (Start of Scan) data begins. Hence the best solution I could come up with here, is to re-parse the image in the hopefully rare case of JPEG images that include a DNL (Define Number of Lines) marker. Fixes 8614.	2018-02-05 21:05:32 +01:00
Rob Wu	911659cd70	Add tests for file names with spaces and semicolons	2018-02-04 17:58:10 +01:00
Jonas Jenwald	f4a95de694	Attempt to find the next valid marker when encountering invalid image data in `JpegImage.parse` (issue 9425) In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter. Fixes 9425.	2018-02-03 16:01:19 +01:00
Jonas Jenwald	56a8c934dd	[api-major] Remove the `PDFJS.disableWorker` option Despite this patch removing the `disableWorker` option itself, please note that we'll still fall back to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support. Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1] That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed). Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification. --- [1] This approach is used in the default viewer, since certain kinds of debugging may be easier if the code is running directly on the main-thread.	2018-01-31 12:52:10 +01:00
Jonas Jenwald	a5aaf62754	[api-minor] Add a (static) `PDFWorker.getWorkerSrc` method that returns the current `workerSrc` This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).	2018-01-31 12:52:07 +01:00
Jonas Jenwald	42c71cd99f	Utilize `PDFNodeStream` to run more API unit-tests on Node.js/Travis	2018-01-28 17:14:08 +01:00
Jani Pehkonen	5593c970e0	Implement Huffman coding in JBIG2	2018-01-23 17:04:07 +02:00
Jonas Jenwald	f0216484bc	Merge pull request #9383 from Rob--W/better-content-disposition-parser Better content disposition parser	2018-01-21 15:08:14 +01:00
Rob Wu	a4e907169e	Improve correctness of Content-Disposition parser Re-uses logic from `9f5fcae11c/extension/content-disposition.js` which is already covered by tests: `6f3bbb8bbf`	2018-01-21 13:31:12 +01:00
Jonas Jenwald	fe5102a27f	Merge pull request #9363 from Rob--W/fetch-http/s-only Limit PDFFetchStream to http(s) in the Chrome extension	2018-01-21 11:45:09 +01:00
Rob Wu	0ffe9b9289	Remove useless test from network_utils_spec.js Remove "returns null when content disposition is form-data". The name of the test is already misleading: It suggests that the return value is null if the Content-Disposition starts with "form-data". This is not the case, anything with the "filename" parameter is accepted. So, to correct this, one would have to rephrase the test description to "returns null when content disposition has no filename". But this is already tested by the test called "gets the filename from the response header". So, remove the test.	2018-01-19 17:28:47 +01:00
Jonas Jenwald	69a8336cf1	Address the final round of review comments for Content-Disposition filename extraction This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly. The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations. Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.	2018-01-18 17:39:22 +01:00
Juan Salvador Perez Garcia	eb1f6f4c24	Content disposition filename File name is extracted from headers.	2018-01-18 17:38:44 +01:00
Jonas Jenwald	f774abc8d3	Merge pull request #9368 from acchou/9362 end() is the official way to release a Writable stream	2018-01-17 12:45:23 +01:00
Andy Chou	b867c3299b	end() is the official way to release a Writable stream	2018-01-16 12:01:33 -08:00
Rob Wu	1c8cacd6b9	Limit PDFFetchStream to http(s) in the Chrome extension The `fetch` API is only supported for http(s), even in Chrome extensions. Because of this limitation, we should use the XMLHttpRequest API when the requested URL is not a http(s) URL. Fixes #9361	2018-01-14 00:34:46 +01:00
Jonas Jenwald	0e1b5589e7	Restore the `btoa`/`atob` polyfills for Node.js These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`. However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there are warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).	2018-01-13 01:31:05 +01:00
Jonas Jenwald	d0c8992e8a	Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285) Please refer to the PDF specification, in particular http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3801570 > A colour space shall be specified in one of two ways: > - Within a content stream, the CS or cs operator establishes the current colour space parameter in the graphics state. The operand shall always be name object, which either identifies one of the colour spaces that need no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, or some cases of Pattern) or shall be used as a key in the ColorSpace subdictionary of the current resource dictionary (see 7.8.3, "Resource Dictionaries"). In the latter case, the value of the dictionary entry in turn shall be a colour space array or name. A colour space array shall never be inline within a content stream. > > - Outside a content stream, certain objects, such as image XObjects, shall specify a colour space as an explicit parameter, often associated with the key ColorSpace. In this case, the colour space array or name shall always be defined directly as a PDF object, not by an entry in the ColorSpace resource subdictionary. This convention also applies when colour spaces are defined in terms of other colour spaces.	2018-01-10 20:20:43 +01:00
Brendan Dahl	3925aab010	Merge pull request #9282 from Snuffleupagus/TrueType-Collection Add support for TrueType Collection fonts (issue 9262)	2018-01-09 11:22:52 -08:00
Jonas Jenwald	915e3f4c5f	Merge pull request #9099 from tiriana/allow-dontFlip-in-PDFPageProxy-getViewport Allows 'dontFlip' as third arg in PDFPageProxy.getViewport	2018-01-09 18:27:26 +01:00
Radomir Wojtera	3dfc540d04	Allows 'dontFlip' as third argument in PDFPageProxy.getViewport	2018-01-09 13:08:24 +01:00
Jonas Jenwald	d6c028b946	Add support for TrueType Collection fonts (issue 9262) The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading. Fixes 9262.	2018-01-08 22:31:08 +01:00
Jonas Jenwald	2db75a2a3a	Update the ESLint dependencies, and also tweak the `no-multiple-empty-lines` rules Since multiple empty lines is virtually unused in the code-base, and the few cases that do exist look like "typos", let's enforce greater consistency here; please see https://eslint.org/docs/rules/no-multiple-empty-lines.	2018-01-03 13:32:57 +01:00
Jonas Jenwald	c5700211d6	Adjust `decodeACSuccessive` in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images. As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question. Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).	2017-12-30 15:24:09 +01:00
Jonas Jenwald	8c4b7d0439	Avoid truncating JPEG images with DeviceGray ColourSpaces when using the `src/core/jpg.js` built-in decoder The bug that this patch fixes is limited to the built-in JPEG decoder, and was unearthed by PR 9260. The underlying issue has existed since PR 6984, where the contents of this patch ought to have been included (if it weren't for the fact that we had no easy way to test `src/core/jpg.js` back then). Please note: The slight movement in the reference test is a result of using the `src/core/jpg.js` decoder, rather than the native browser one.	2017-12-29 18:44:07 +01:00
Tim van der Meij	25bbff4692	Merge pull request #9320 from Snuffleupagus/pr-9095-followup Avoid rendering errors by passing in the `webGLContext` when creating a new `CanvasGraphics` in `getColorN_Pattern` (PR 9095 follow-up)	2017-12-28 23:17:30 +01:00
Jonas Jenwald	ec21bd9626	Merge pull request #9314 from timvandermeij/encodings Implement unit tests for the encodings and fix missing items	2017-12-27 22:02:38 +01:00
Jonas Jenwald	06605abbc2	Avoid rendering errors by passing in the `webGLContext` when creating a new `CanvasGraphics` in `getColorN_Pattern` (PR 9095 follow-up) This was an oversight in PR 9095, which unfortunately breaks rendering in some PDF files (e.g. the one from issue 6737). It thus appears that we don't have any test-coverage for this code-path, and given the relative complexity of the PDF files affected by this bug I wasn't able to easily create a reduced test-case. Please note: The linked test-case included in this patch is currently not rendered correctly (that'd be the PR 6606), but it at least gives us some test-coverage here.	2017-12-27 13:50:53 +01:00
Tim van der Meij	c7af2db2ec	Implement unit tests for the encodings and fix missing items Initially I just implemented the unit tests, but quickly found that they were failing my expectation of having a size of 256 items. Some of them did contain 256 items and some did not. I looked up various resources and figured that they indeed all need to have 256 items. One of the good resources is https://github.com/davidben/poppler/blob/master/poppler/FontEncodingTables.cc Aside from some missing `notdef` (empty string) entries at the end of the arrays, which I assume causes issues since it may cause out-of-bounds array access which in JavaScript gives `undefined`, there was a `notdef` entry missing in the `MacExpertEncoding`, causing the entries after that to be shifted. The fix for this is similar to the one in #8589. The unit tests verify that, for known encoding names, the return value is not only an array, but that it is also of the right length and contains only strings.	2017-12-24 18:14:40 +01:00
Jonas Jenwald	d4cd44fd16	Add a fallback for non-embedded LucidaSans-Demi fonts (issue 9291) The PDF file in the issue uses a number of embedded versions of Lucida fonts, but for some reason does not embed the LucidaSans-Demi font. According to https://en.wikipedia.org/wiki/Lucida#Usages that one should be bold, so we can at least improve rendering here (even though it won't look perfect). Fixes 9291.	2017-12-24 17:36:58 +01:00
Tim van der Meij	957e2d420d	Implement unit tests for the network utility code This should provide 100% coverage for the file.	2017-12-23 19:24:11 +01:00
Jonas Jenwald	1dc54ddb40	Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in `XRef.indexObjects` (issue 9105) This patch refactors the searching for 'endobj', to try and find the next occurrence of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise. This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrarily long strings. Fixes 9105.	2017-12-18 13:17:45 +01:00
Jonas Jenwald	6515b91118	Merge pull request #9276 from mozilla/loca-fix Fix loca table when offsets aren't in ascending order.	2017-12-15 20:59:42 +01:00


1675 Commits