pdf.js

Author	SHA1	Message	Date
Tim van der Meij	98ea39f9d0	Merge pull request #9827 from Snuffleupagus/misc-corrupt-pdf-fixes Fix various corrupt PDF files (issue 9252, issue 9418)	2018-06-21 22:35:00 +02:00
Brendan Dahl	a278c5a8dc	Merge pull request #9795 from timvandermeij/object-assign Replace `Util.extendObj` by `Object.assign`	2018-06-20 10:50:40 -07:00
Jonas Jenwald	56e3648b65	Add basic validation of the 'trailer' dictionary candidates in `XRef.indexObjects` (issue 9418) This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway. Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.	2018-06-20 13:41:22 +02:00
Jonas Jenwald	346810e02a	Add basic validation of the 'Root' dictionary in `XRef.parse` and try to recover when possible Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.	2018-06-20 13:41:22 +02:00
Jonas Jenwald	e84813e7cc	Prevent hard errors if fetching the `Encrypt` dictionary fails in `XRef.parse`	2018-06-20 13:41:22 +02:00
Jonas Jenwald	30ad62a86a	Use the correct `startPos` when repeating the search for 'endobj' operators in `XRef.indexObjects` (PR 9288 follow-up)	2018-06-20 13:41:22 +02:00
Jonas Jenwald	6bbcafcd26	Let `Lexer.getNumber` treat a single decimal point as zero (issue 9252) This is consistent with the behaviour in Adobe Reader.	2018-06-20 13:41:21 +02:00
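The rule from 6bbcafcd26 can be illustrated with a small standalone sketch. `parseNumberToken` is a hypothetical helper written for this note, not the actual `Lexer.getNumber` code (which reads from a stream rather than a string); it only shows a lone decimal point being evaluated as zero.

```javascript
// Hypothetical helper, NOT the real `Lexer.getNumber` from PDF.js.
// Parses a simple PDF-style numeric token given as a string.
function parseNumberToken(token) {
  // Optional sign, optional integer digits, optional decimal point,
  // optional fractional digits.
  const match = /^([+-]?)(\d*)(\.?)(\d*)$/.exec(token);
  if (!match) {
    throw new Error(`Invalid number: ${token}`);
  }
  const [, sign, intPart, point, fracPart] = match;
  if (intPart === '' && fracPart === '') {
    if (point === '.') {
      return 0; // A single decimal point is treated as zero.
    }
    throw new Error(`Invalid number: ${token}`);
  }
  const value = parseFloat(`${intPart || '0'}.${fracPart || '0'}`);
  return sign === '-' ? -value : value;
}

console.log(parseNumberToken('.'));   // 0
console.log(parseNumberToken('-.5')); // -0.5
console.log(parseNumberToken('12.')); // 12
```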
Jonas Jenwald	df4799a12a	Ensure that line-breaks are only skipped after operators in `Lexer.getNumber` (PR 8359 follow-up) With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches. Hence this code is adjusted to actually do what the original commit message says, and nothing more.	2018-06-20 13:41:15 +02:00
Jonas Jenwald	bfc88ead66	Expose a `Jbig2Image.parse` method, by re-instating the `parseJbig2` function The purpose of this patch is to hopefully provide slightly better user ergonomics, if/when the PDF.js image decoders are used standalone. This implementation is (basically) reverting the changes in PR 9386, in conjunction with code from the `parse` method found at https://github.com/notmasteryet/jpgjs/blob/master/src/pdfjs.js	2018-06-16 17:56:54 +02:00
Jonas Jenwald	682672db8e	Change the signature of the `JpegImage` constructor, to allow passing in various options directly	2018-06-16 17:56:54 +02:00
Jonas Jenwald	d4ff541b78	Enforce the use, in non-production/test-only mode, of `Uint8ClampedArray` in all relevant methods in `ColorSpace` and `PDFImage` Since `ColorSpace` now depends on the native clamping of `Uint8ClampedArray`, this patch adds non-production/test-only `assert`s to enforce that the expected TypedArray is used for the output. These `assert`s are purposely not included in PRODUCTION builds since that would break rendering completely, as opposed to "only" displaying some weird colours, when a `Uint8Array` was used. Furthermore, these are mostly added to help catch explicit developer errors when working with the `ColorSpace` and `PDFImage` code.	2018-06-12 11:01:32 +02:00
Jonas Jenwald	4b69bb7fe9	Add a TESTING build option, to enable using non-production/test-only code-paths Since the tests (currently) run with the `pdf.worker.js` file built, i.e. with `PRODUCTION = true` set, there's no simple way to add e.g. `assert` calls for both non-production and test-only builds without also affecting PRODUCTION builds.	2018-06-12 11:01:32 +02:00
Jonas Jenwald	f01e54eae1	Improve the warning messages printed by `PartialEvaluator.{getOperatorList, getTextContent}` when errors are being ignored Currently the actual errors aren't printed, which can make debugging harder than necessary.	2018-06-12 11:01:32 +02:00
Jonas Jenwald	731f2e6dfc	Remove manual clamping/rounding from `ColorSpace` and `PDFImage`, by having their methods use `Uint8ClampedArray`s The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code. As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern. Please note: Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceptible to the naked eye, given that the absolute difference is at most `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).	2018-06-12 11:01:32 +02:00
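The rounding difference mentioned in 731f2e6dfc can be seen in a short sketch. `manualClamp` is a hypothetical stand-in for the old floor-based clamping, written from the commit message's description rather than copied from the removed code.

```javascript
// Hypothetical stand-in for the old manual clamping (floor-based),
// based only on the description in the commit message.
function manualClamp(value) {
  return value < 0 ? 0 : value > 255 ? 255 : Math.floor(value);
}

const samples = [-20, 0.4, 127.6, 254.7, 300];

// Native clamping: values are limited to [0, 255] and rounded to the
// nearest integer (exact halves round to the nearest even integer).
const nativeClamped = Array.from(new Uint8ClampedArray(samples));
const manuallyClamped = samples.map(manualClamp);

console.log(nativeClamped);   // [0, 0, 128, 255, 255]
console.log(manuallyClamped); // [0, 0, 127, 254, 255]
```

For these samples the two results differ by at most 1 per component, which matches the bound stated in the commit message.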
Jonas Jenwald	55199aa281	Remove the unused `bpc` parameter from, and update the signature of, the `resizeRgbImage` function in `src/core/colorspace.js`	2018-06-12 11:01:32 +02:00
Jonas Jenwald	d1637056b3	Use shorthand method signatures in `src/core/colorspace.js`	2018-06-12 11:01:32 +02:00
Jonas Jenwald	32367c5968	Make the `getBytes`/`peekBytes` methods of `Stream`/`DecodeStream`/`ChunkedStream` able to return `Uint8ClampedArray`s The built-in image decoders are already returning data as `Uint8ClampedArray`, and subsequently the JPEG/JBIG2/JPX streams are as well. However, for general streams we obviously don't want to force the use of `Uint8ClampedArray` unless an "Image" is actually being decoded. Hence this patch, which adds a parameter that allows the caller of the `getBytes`/`peekBytes` methods to force a `Uint8ClampedArray` (rather than a `Uint8Array`) to be returned.	2018-06-12 11:01:32 +02:00
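A minimal sketch of the opt-in parameter described in 32367c5968. `SketchStream` and the parameter name `forceClamped` are assumptions made for illustration; this is not the PDF.js `Stream` class.

```javascript
// Minimal stand-in for a byte stream; NOT the PDF.js `Stream` class.
class SketchStream {
  constructor(bytes) {
    this.bytes = bytes; // Expected to be a Uint8Array.
    this.pos = 0;
  }

  // Returns up to `length` bytes and advances the position. The result is
  // a Uint8Array view unless the caller explicitly asks for a clamped copy.
  getBytes(length, forceClamped = false) {
    const end = Math.min(this.pos + length, this.bytes.length);
    const chunk = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return forceClamped ? new Uint8ClampedArray(chunk) : chunk;
  }

  // Same as `getBytes`, but leaves the stream position unchanged.
  peekBytes(length, forceClamped = false) {
    const bytes = this.getBytes(length, forceClamped);
    this.pos -= bytes.length;
    return bytes;
  }
}

const sketchStream = new SketchStream(new Uint8Array([1, 2, 3, 4]));
const peeked = sketchStream.peekBytes(2);          // Uint8Array, position unchanged
const imageBytes = sketchStream.getBytes(2, true); // Uint8ClampedArray
```

General callers keep getting a `Uint8Array`; only image-decoding call-sites would pass the extra argument.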
Tim van der Meij	af8e88d00b	Replace `Util.extendObj` by `Object.assign`	2018-06-10 20:11:03 +02:00
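A short sketch of the replacement in af8e88d00b. The `extendObj` function below is an assumed reconstruction of what such a helper typically looks like, not a copy of the removed `Util.extendObj`.

```javascript
// Assumed shape of a hand-written "extend" helper (illustration only).
function extendObj(target, source) {
  for (const key in source) {
    target[key] = source[key];
  }
}

const viaHelper = { scale: 1 };
extendObj(viaHelper, { rotation: 90 });

// Built-in equivalent: copies own enumerable properties onto the target.
const viaAssign = Object.assign({ scale: 1 }, { rotation: 90 });

console.log(viaHelper); // { scale: 1, rotation: 90 }
console.log(viaAssign); // { scale: 1, rotation: 90 }
```

One difference worth keeping in mind: a `for...in` loop also visits inherited enumerable properties, whereas `Object.assign` copies own properties only.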
Tim van der Meij	903bad1906	Remove `Util.appendToArray` and `Util.prependToArray` The former may be replaced by regular JavaScript array concatenation and the latter is unused. This avoids unnecessary function calls/imports.	2018-06-10 15:24:09 +02:00
Jonas Jenwald	2fdaa3d54c	Update the `postMessageTransfers` comment in `createDocumentHandler` in the `src/core/worker.js` file Since the old comment mentions a now unsupported browser, let's update it such that someone won't accidentally conclude that the code in question can be removed.	2018-06-06 08:52:43 +02:00
Jonas Jenwald	b263b702e8	Rename `PDFPageProxy.pageInfo` to `PDFPageProxy._pageInfo` to indicate that the property should be considered "private" Since `PDFPageProxy` already provides getters for all the data returned by `GetPage` (in the Worker), there isn't any compelling reason for accessing the `pageInfo` directly on `PDFPageProxy`. The patch also changes the `GetPage` handler, in `src/core/worker.js`, to use modern JavaScript features.	2018-06-06 08:52:42 +02:00
Jonas Jenwald	e89afa5899	Stop sending the `PDFManagerReady` message from the Worker, since it's unused in the API After PR 8617 the `PDFManagerReady` message handler function, in `src/display/api.js`, is now a no-op. Hence it seems completely unnecessary to keep sending this message from `src/core/worker.js`.	2018-06-06 08:52:42 +02:00
Jonas Jenwald	eef53347fe	Ensure that the correct data is sent, with the `test` message, from the worker if typed arrays aren't properly supported With native typed array support now being mandatory in PDF.js, since version 2.0, this probably isn't a huge problem even though the current code seems wrong (it was changed in PR 6571). Note how in the `!(data instanceof Uint8Array)` case we're currently attempting to send `handler.send('test', 'main', false);` to the main-thread, which doesn't really make any sense since the signature of the method reads `send(actionName, data, transfers) {`. Hence the data that's actually being sent here is `'main'`, with `false` as the transferList, which just seems weird. On the main-thread, this means that we're in this case checking `data && data.supportTypedArray`, where `data` contains the string `'main'` rather than being falsy. Since a string doesn't have a `supportTypedArray` property, that check still fails as expected but it doesn't seem great nonetheless.	2018-06-06 08:52:42 +02:00
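The argument mix-up described in eef53347fe can be reproduced with a tiny recording stub. `RecordingHandler` is hypothetical; only the `send(actionName, data, transfers)` signature and the `data && data.supportTypedArray` check are taken from the commit message.

```javascript
// Hypothetical stub that records what `send` receives; NOT `MessageHandler`.
class RecordingHandler {
  constructor() {
    this.sent = [];
  }
  send(actionName, data, transfers) {
    this.sent.push({ actionName, data, transfers });
  }
}

const recorder = new RecordingHandler();
// The call quoted in the commit message: 'main' lands in the `data` slot
// and `false` in the `transfers` slot.
recorder.send('test', 'main', false);

const received = recorder.sent[0].data;
// Main-thread style check: a string has no `supportTypedArray` property,
// so the check fails, but only by accident.
const supported = !!(received && received.supportTypedArray);

console.log(typeof received); // "string"
console.log(supported);       // false
```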
Jonas Jenwald	44d8afd46b	Move `MessageHandler` into a separate `src/shared/message_handler.js` file The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.	2018-06-04 12:53:08 +02:00
Jonas Jenwald	ef081a0531	Ensure that the `WorkerTransport._passwordCapability` is always rejected, even when errors are thrown in `PDFDocumentLoadingTask.onPassword` callback Please note that while the current code works, both in the viewer and the unit-tests, it can leave the `WorkerTransport._passwordCapability` Promise in a pending state. In the `PasswordRequest` handler, in src/display/api.js, we're returning the Promise from a `capability` object (rather than just a "plain" Promise). While an error thrown anywhere within this handler was fortunately enough to propagate it to the Worker side, it won't cause the Promise (in `WorkerTransport._passwordCapability`) to actually be rejected. Finally note that while we're now catching errors in the `PasswordRequest` handler, those errors are still propagated to the Worker side via the (now) rejected Promise and the existing `return this._passwordCapability.promise;` line. This prevents warnings about uncaught Promises, with messages such as "Error: Worker was destroyed during onPassword callback", when running the unit-tests both in browsers and in Node.js/Travis.	2018-06-03 00:28:40 +02:00
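The pattern described in ef081a0531 can be sketched as follows. `createCapability` and `handlePasswordRequest` are simplified, hypothetical stand-ins; the real handler in `src/display/api.js` differs in detail.

```javascript
// Simplified stand-in for a "capability": a Promise plus its settle functions.
function createCapability() {
  const capability = {};
  capability.promise = new Promise((resolve, reject) => {
    capability.resolve = resolve;
    capability.reject = reject;
  });
  return capability;
}

// Hypothetical handler: if the user-supplied callback throws, the capability
// is rejected explicitly so that its Promise cannot remain pending.
function handlePasswordRequest(onPassword) {
  const capability = createCapability();
  try {
    onPassword((password) => capability.resolve({ password }));
  } catch (ex) {
    capability.reject(ex);
  }
  // The rejection still propagates to whoever awaits the returned Promise.
  return capability.promise;
}

handlePasswordRequest(() => {
  throw new Error('callback failed');
}).catch((reason) => console.log('rejected:', reason.message));
```

Without the `try...catch`, a throwing callback would leave `capability.promise` unsettled in this sketch, which is the pending state the commit message warns about.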
Tim van der Meij	36af85db92	Merge pull request #9740 from pedrotp/replace-get-getArray Use Dict.getArray, instead of Dict.get, when getting the 'Size' in constructSampled in src/core/function.js	2018-06-02 19:50:09 +02:00
pedrotp	a190d21dd7	Use Dict.getArray, instead of Dict.get, when getting the 'Size' in constructSampled in src/core/function.js (PR 7295 follow-up)	2018-06-02 11:16:05 -04:00
Jonas Jenwald	83ff7d9de9	Simplify the DNL (Define Number of Lines) marker warning in `JpegImage.parse`	2018-05-30 22:40:11 +02:00
Jonas Jenwald	620f65488b	Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679)	2018-05-30 22:40:11 +02:00
Jonas Jenwald	f68f60099e	Remove usage of `makeSubStream` from `Type1Parser.extractFontProgram` in src/core/type1_parser.js (issue 9735) This avoids the initialization of, potentially thousands of, unnecessary `Stream` objects, by getting the required number of bytes directly instead. Given the special behaviour, when `length === 0`, of the `getBytes`/`skip` methods, it's also necessary to handle that particular case to prevent errors when encountering empty CharStrings.	2018-05-28 14:32:20 +02:00
Brendan Dahl	2dc4af525d	Merge pull request #9659 from yurydelendik/rm-createFromIR Remove createFromIR from PDFFunctionFactory	2018-04-12 14:22:43 -07:00
Yury Delendik	20085aaa5e	Remove createFromIR from PDFFunctionFactory; forgive invalid Dict values.	2018-04-10 18:49:31 -05:00
Jani Pehkonen	8ea505545a	Use FDSelect and FDArray when converting CFF CID font to paths	2018-04-10 16:44:42 +03:00
Wojciech Maj	ea2850e9a7	Fix typos	2018-04-01 23:20:41 +02:00
Jonas Jenwald	374d074f6e	Add stricter validation in `Catalog.readPageLabels` The current PageLabel dictionary validation code won't catch some (unlikely) forms of corruption. For example: a `Type`/`S` entry being `null`/`0`/empty string, a `P`/`St` entry being `null`/`0`. Please note: I'm not aware of any bugs caused by the old code, but I've had this patch sitting locally for some time and figured it couldn't hurt to submit it.	2018-03-21 14:36:05 +01:00
Jonas Jenwald	d431ae069d	Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540) According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.	2018-03-12 14:13:23 +01:00
Tim van der Meij	f308d73d40	Implement a single `getInheritableProperty` utility function This function combines the logic of two separate methods into one. The loop limit is also a good thing to have for the calls in `src/core/annotation.js`. Moreover, since this is important functionality, a set of unit tests and documentation is added.	2018-03-03 19:19:39 +01:00
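A minimal sketch of an inheritable-property lookup with a loop limit, in the spirit of f308d73d40. It uses plain objects with a `Parent` key instead of PDF.js `Dict` instances, and the function name, signature and default limit are assumptions.

```javascript
// Hypothetical lookup over plain objects; NOT the PDF.js implementation.
// Walks up the `Parent` chain until `key` is found or the limit is hit.
function findInheritable(node, key, maxDepth = 100) {
  let depth = 0;
  while (node) {
    if (Object.prototype.hasOwnProperty.call(node, key)) {
      return node[key];
    }
    if (++depth > maxDepth) {
      break; // Guards against cyclic or absurdly deep parent chains.
    }
    node = node.Parent;
  }
  return undefined;
}

const rootNode = { MediaBox: [0, 0, 612, 792] };
const pageNode = { Parent: { Parent: rootNode } };
console.log(findInheritable(pageNode, 'MediaBox')); // [0, 0, 612, 792]

// A corrupt, cyclic chain terminates instead of looping forever.
const cycleA = {};
const cycleB = { Parent: cycleA };
cycleA.Parent = cycleB;
console.log(findInheritable(cycleA, 'MediaBox')); // undefined
```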
Tim van der Meij	4e5eb59a33	Remove the `getPageProp` method in `src/core/document.js` It's only used in two places in the class and those callsites can directly get the information from the dictionary, which is more readable and avoids an additional method call.	2018-03-03 14:57:42 +01:00
Jonas Jenwald	b674409397	Move the `maxImageSize` option from the global `PDFJS` object and into `getDocument` instead	2018-03-01 18:11:16 +01:00
Rob Wu	a89071bdef	Merge pull request #9470 from Snuffleupagus/issue-4888 Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888)	2018-02-16 13:14:21 +01:00
Jonas Jenwald	11ab3b5c00	Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888) With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now. Fixes 4888.	2018-02-13 20:44:21 +01:00
Jonas Jenwald	f05e5c5460	Take the dictionary, and not just the image data, into account when caching inline images (issue 9398) The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces. There's obviously a couple of different ways that we could compute a hash/checksum of the dictionary. Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be way too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regress quite badly with this solution. The solution that is instead implemented in this patch, is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution there's one drawback associated with it: If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of not caching given that too aggressive caching can easily lead to rendering bugs. One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the exact stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1] With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618. Fixes 9398; also improves/fixes the `issue8823` reference test. 
--- [1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.	2018-02-12 16:43:47 +01:00
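The caching idea from f05e5c5460, including its ordering drawback, can be sketched as below. The Adler-32 style checksum, the key format and the helper names are assumptions chosen for illustration; the commit message does not say which checksum the patch uses.

```javascript
// Adler-32 style checksum over a byte sequence (illustrative choice only).
function simpleChecksum(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

const textEncoder = new TextEncoder();

// Hypothetical cache key: checksum of the raw dictionary bytes combined
// with a checksum of the image data.
function inlineImageKey(dictSource, imageData) {
  return `${simpleChecksum(textEncoder.encode(dictSource))}_${simpleChecksum(imageData)}`;
}

const pixels = new Uint8Array([255, 0, 0]);
const grayKey = inlineImageKey('/CS /G /W 1 /H 1', pixels);
const rgbKey = inlineImageKey('/CS /RGB /W 1 /H 1', pixels);
const reorderedKey = inlineImageKey('/W 1 /H 1 /CS /G', pixels);

// Same image data, different colour space: the keys differ, as intended.
console.log(grayKey !== rgbKey);       // true
// Same entries in a different order: the keys also differ, so such an
// image would simply not be served from the cache.
console.log(grayKey !== reorderedKey); // true
```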
Tim van der Meij	7bb066494f	Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)	2018-02-09 21:36:08 +01:00
Jonas Jenwald	a18c65ae9f	Use the correct stream position when reading `maxSizeOfInstructions` from the `maxp` table (issue 9458) Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html. Fixes 9458.	2018-02-07 21:57:43 +01:00
Jonas Jenwald	bf4166e6c9	Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614) Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49 Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) before parsing of the SOS (Start of Scan) data begins. Hence the best solution I could come up with here, is to re-parse the image in the hopefully rare case of JPEG images that include a DNL (Define Number of Lines) marker. Fixes 8614.	2018-02-05 21:05:32 +01:00
Jonas Jenwald	80441346a3	Fallback to the built-in JPEG decoder if 'JpegStream', in `src/display/api.js`, fails to load the image This works by making `PartialEvaluator.buildPaintImageXObject` wait for the success/failure of `loadJpegStream` on the API side before parsing continues. Please note that in practice, it should be quite rare for the browser to fail loading/decoding of a JPEG image. In the general case, it should thus not be completely surprising if even `src/core/jpg.js` will fail to decode the image.	2018-02-05 21:05:31 +01:00
Jonas Jenwald	76afe1018b	Fallback to built-in image decoding if the `NativeImageDecoder` fails In particular this means that if 'JpegDecode', in `src/display/api.js`, fails we'll fallback to the built-in JPEG decoder.	2018-02-05 17:01:35 +01:00
Jonas Jenwald	7f73fc9ace	Re-factor `PartialEvaluator.buildPaintImageXObject` to make it asynchronous This is necessary for upcoming changes, which will add fallback code-paths to allow graceful handling of native image decoding failures.	2018-02-05 17:01:35 +01:00
Jonas Jenwald	ec85d5c625	Change the signature of `PartialEvaluator.buildPaintImageXObject` to take a parameter object This method currently requires a fair number of parameters, which creates quite unwieldy call-sites. When invoking `buildPaintImageXObject`, you have to remember not only which arguments to supply, but also the correct order, to prevent run-time errors.	2018-02-05 17:01:35 +01:00
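The motivation in ec85d5c625 can be illustrated with two hypothetical functions. Neither is the real `buildPaintImageXObject`, and the parameter names are chosen only for this example.

```javascript
// Positional version: the caller must remember both the arguments and
// their order (hypothetical function, for illustration only).
function describeImagePositional(resources, image, isInline, cacheKey) {
  return { resources, image, isInline, cacheKey };
}

// Parameter-object version: arguments are named at the call-site, order is
// irrelevant, and defaults can be declared in one place.
function describeImage({ resources, image, isInline = false, cacheKey = null }) {
  return { resources, image, isInline, cacheKey };
}

// Swapping two positional arguments goes unnoticed until run-time.
const swapped = describeImagePositional('res', 'img', 'key-1', true);
console.log(swapped.isInline); // "key-1" (wrong slot)

// With a parameter object the same call cannot be mis-ordered.
const named = describeImage({ cacheKey: 'key-1', image: 'img', resources: 'res', isInline: true });
console.log(named.isInline);   // true
```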
Jonas Jenwald	712090eff8	Upstream the changes from: Bug 1339461 - Convert foo.indexOf(...) == -1 to foo.includes() and implement an eslint rule to enforce this Yet another case where PDF.js code was modified in `mozilla-central` without the changes happening in the GitHub repo first; sigh. If we don't upstream at least the changes in `extensions/firefox/`, any future update of PDF.js in `mozilla-central` will be blocked. Please see: - https://bugzilla.mozilla.org/show_bug.cgi?id=1339461 - https://hg.mozilla.org/mozilla-central/rev/d5a5ad1dbbf2	2018-02-04 14:59:27 +01:00
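The conversion upstreamed in 712090eff8 amounts to the following rewrite. The array contents here are made up for the example.

```javascript
const allowedSchemes = ['http', 'https', 'file'];

// Before: membership test via `indexOf`.
const missingOld = allowedSchemes.indexOf('ftp') == -1;
// After: the same test expressed with `includes`.
const missingNew = !allowedSchemes.includes('ftp');

console.log(missingOld, missingNew); // true true

// The two are not identical in every case: `includes` finds NaN,
// whereas `indexOf` (strict equality) does not.
console.log([NaN].indexOf(NaN));  // -1
console.log([NaN].includes(NaN)); // true
```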

1523 Commits