pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	e89afa5899	Stop sending the `PDFManagerReady` message from the Worker, since it's unused in the API After PR 8617 the `PDFManagerReady` message handler function, in `src/display/api.js`, is now a no-op. Hence it seems completely unnecessary to keep sending this message from `src/core/worker.js`.	2018-06-06 08:52:42 +02:00
Jonas Jenwald	eef53347fe	Ensure that the correct data is sent, with the `test` message, from the worker if typed arrays aren't properly supported With native typed array support now being mandatory in PDF.js, since version 2.0, this probably isn't a huge problem even though the current code seems wrong (it was changed in PR 6571). Note how in the `!(data instanceof Uint8Array)` case we're currently attempting to send `handler.send('test', 'main', false);` to the main-thread, which doesn't really make any sense since the signature of the method reads `send(actionName, data, transfers) {`. Hence the data that's actually being sent here is `'main'`, with `false` as the transferList, which just seems weird. On the main-thread, this means that we're in this case checking `data && data.supportTypedArray`, where `data` contains the string `'main'` rather than being falsy. Since a string doesn't have a `supportTypedArray` property, that check still fails as expected but it doesn't seem great nonetheless.	2018-06-06 08:52:42 +02:00
Jonas Jenwald	44d8afd46b	Move `MessageHandler` into a separate `src/shared/message_handler.js` file The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.	2018-06-04 12:53:08 +02:00
Jonas Jenwald	ef081a0531	Ensure that the `WorkerTransport._passwordCapability` is always rejected, even when errors are thrown in `PDFDocumentLoadingTask.onPassword` callback Please note that while the current code works, both in the viewer and the unit-tests, it can leave the `WorkerTransport._passwordCapability` Promise in a pending state. In the `PasswordRequest` handler, in src/display/api.js, we're returning the Promise from a `capability` object (rather than just a "plain" Promise). While an error thrown anywhere within this handler was fortunately enough to propagate it to the Worker side, it won't cause the Promise (in `WorkerTransport._passwordCapability`) to actually be rejected. Finally note that while we're now catching errors in the `PasswordRequest` handler, those errors are still propagated to the Worker side via the (now) rejected Promise and the existing `return this._passwordCapability.promise;` line. This prevents warnings about uncaught Promises, with messages such as "Error: Worker was destroyed during onPassword callback", when running the unit-tests both in browsers and in Node.js/Travis.	2018-06-03 00:28:40 +02:00
Tim van der Meij	36af85db92	Merge pull request #9740 from pedrotp/replace-get-getArray Use Dict.getArray, instead of Dict.get, when getting the 'Size' in constructSampled in src/core/function.js	2018-06-02 19:50:09 +02:00
pedrotp	a190d21dd7	Use Dict.getArray, instead of Dict.get, when getting the 'Size' in constructSampled in src/core/function.js (PR 7295 follow-up)	2018-06-02 11:16:05 -04:00
Jonas Jenwald	83ff7d9de9	Simplify the DNL (Define Number of Lines) marker warning in `JpegImage.parse`	2018-05-30 22:40:11 +02:00
Jonas Jenwald	620f65488b	Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679)	2018-05-30 22:40:11 +02:00
Jonas Jenwald	f68f60099e	Remove usage of `makeSubStream` from `Type1Parser.extractFontProgram` in src/core/type1_parser.js (issue 9735) This avoids the initialization of, potentially thousands of, unnecessary `Stream` objects, by getting the required number of bytes directly instead. Given the special behaviour, when `length === 0`, of the `getBytes`/`skip` methods, it's also necessary to handle that particular case to prevent errors when encountering empty CharStrings.	2018-05-28 14:32:20 +02:00
Brendan Dahl	2dc4af525d	Merge pull request #9659 from yurydelendik/rm-createFromIR Remove createFromIR from PDFFunctionFactory	2018-04-12 14:22:43 -07:00
Yury Delendik	20085aaa5e	Remove createFromIR from PDFFunctionFactory; forgive invalid Dict values.	2018-04-10 18:49:31 -05:00
Jani Pehkonen	8ea505545a	Use FDSelect and FDArray when converting CFF CID font to paths	2018-04-10 16:44:42 +03:00
Wojciech Maj	ea2850e9a7	Fix typos	2018-04-01 23:20:41 +02:00
Jonas Jenwald	374d074f6e	Add stricter validation in `Catalog.readPageLabels` The current PageLabel dictionary validation code won't catch some (unlikely) forms of corruption. For example: a `Type`/`S` entry being `null`/`0`/empty string, a `P`/`St` entry being `null`/`0`. Please note: I'm not aware of any bugs caused by the old code, but I've had this patch sitting locally for some time and figured it couldn't hurt to submit it.	2018-03-21 14:36:05 +01:00
Jonas Jenwald	d431ae069d	Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540) According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.	2018-03-12 14:13:23 +01:00
Tim van der Meij	f308d73d40	Implement a single `getInheritableProperty` utility function This function combines the logic of two separate methods into one. The loop limit is also a good thing to have for the calls in `src/core/annotation.js`. Moreover, since this is important functionality, a set of unit tests and documentation is added.	2018-03-03 19:19:39 +01:00
Tim van der Meij	4e5eb59a33	Remove the `getPageProp` method in `src/core/document.js` It's only used in two places in the class and those callsites can directly get the information from the dictionary, which is more readable and avoids an additional method call.	2018-03-03 14:57:42 +01:00
Jonas Jenwald	b674409397	Move the `maxImageSize` option from the global `PDFJS` object and into `getDocument` instead	2018-03-01 18:11:16 +01:00
Rob Wu	a89071bdef	Merge pull request #9470 from Snuffleupagus/issue-4888 Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888)	2018-02-16 13:14:21 +01:00
Jonas Jenwald	11ab3b5c00	Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888) With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now. Fixes 4888.	2018-02-13 20:44:21 +01:00
Jonas Jenwald	f05e5c5460	Take the dictionary, and not just the image data, into account when caching inline images (issue 9398) The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces. There's obviously a couple of different ways that we could compute a hash/checksum of the dictionary. Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be way too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regress quite badly with this solution. The solution that is instead implemented in this patch, is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution there's one drawback associated with it: If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of not caching given that too aggressive caching can easily lead to rendering bugs. One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the exact stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1] With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618. Fixes 9398; also improves/fixes the `issue8823` reference test.
--- [1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.	2018-02-12 16:43:47 +01:00
Tim van der Meij	7bb066494f	Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)	2018-02-09 21:36:08 +01:00
Jonas Jenwald	a18c65ae9f	Use the correct stream position when reading `maxSizeOfInstructions` from the `maxp` table (issue 9458) Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html. Fixes 9458.	2018-02-07 21:57:43 +01:00
Jonas Jenwald	bf4166e6c9	Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614) Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49 Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) before parsing of the SOS (Start of Scan) data begins. Hence the best solution I could come up with here, is to re-parse the image in the hopefully rare case of JPEG images that include a DNL (Define Number of Lines) marker. Fixes 8614.	2018-02-05 21:05:32 +01:00
Jonas Jenwald	80441346a3	Fallback to the built-in JPEG decoder if 'JpegStream', in `src/display/api.js`, fails to load the image This works by making `PartialEvaluator.buildPaintImageXObject` wait for the success/failure of `loadJpegStream` on the API side before parsing continues. Please note that in practice, it should be quite rare for the browser to fail loading/decoding of a JPEG image. In the general case, it should thus not be completely surprising if even `src/core/jpg.js` will fail to decode the image.	2018-02-05 21:05:31 +01:00
Jonas Jenwald	76afe1018b	Fallback to built-in image decoding if the `NativeImageDecoder` fails In particular this means that if 'JpegDecode', in `src/display/api.js`, fails we'll fallback to the built-in JPEG decoder.	2018-02-05 17:01:35 +01:00
Jonas Jenwald	7f73fc9ace	Re-factor `PartialEvaluator.buildPaintImageXObject` to make it asynchronous This is necessary for upcoming changes, which will add fallback code-paths to allow graceful handling of native image decoding failures.	2018-02-05 17:01:35 +01:00
Jonas Jenwald	ec85d5c625	Change the signature of `PartialEvaluator.buildPaintImageXObject` to take a parameter object This method currently requires a fair number of parameters, which creates quite unwieldy call-sites. When invoking `buildPaintImageXObject`, you have to remember not only which arguments to supply, but also the correct order, to prevent run-time errors.	2018-02-05 17:01:35 +01:00
Jonas Jenwald	712090eff8	Upstream the changes from: Bug 1339461 - Convert foo.indexOf(...) == -1 to foo.includes() and implement an eslint rule to enforce this Yet another case where PDF.js code was modified in `mozilla-central` without the changes happening in the GitHub repo first; sigh. If we don't upstream at least the changes in `extensions/firefox/`, any future update of PDF.js in `mozilla-central` will be blocked. Please see: - https://bugzilla.mozilla.org/show_bug.cgi?id=1339461 - https://hg.mozilla.org/mozilla-central/rev/d5a5ad1dbbf2	2018-02-04 14:59:27 +01:00
Tim van der Meij	73436c0d12	Implement the `AESBaseCipher` class and let the `AES128Cipher` and `AES256Cipher` classes extend it	2018-02-03 20:16:33 +01:00
Tim van der Meij	9a959e4df7	Update the `AES128Cipher` and `AES256Cipher` implementations to be more similar This commit is the first step for extracting a base class for the `AES128Cipher` and the `AES256Cipher` classes. The objective here is to make code changes (not altering the logic) to make the implementations as similar as possible as found by creating a diff of both classes. In particular, we extract the key size and cycles of repetitions constants since they are different for AES-128 and AES-256. Moreover, we rename functions to be similar. In the `AES256Cipher` class, there was an additional assignment to `this` in the decryption function. However, this was unnecessary because the assignment would also be done when the loop was exited.	2018-02-03 20:16:29 +01:00
Jonas Jenwald	f4a95de694	Attempt to find the next valid marker when encountering invalid image data in `JpegImage.parse` (issue 9425) In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter. Fixes 9425.	2018-02-03 16:01:19 +01:00
Jani Pehkonen	5593c970e0	Implement Huffman coding in JBIG2	2018-01-23 17:04:07 +02:00
Tim van der Meij	9746646511	Merge pull request #9386 from shikhar-scs/remove-parsejbig2-function removed parseJbig2 function	2018-01-21 14:51:59 +01:00
Shikhar Agnihotri	43e003cf5c	removed parseJbig2 function	2018-01-20 19:49:06 +05:30
Jonas Jenwald	69a8336cf1	Address the final round of review comments for Content-Disposition filename extraction This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly. The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations. Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.	2018-01-18 17:39:22 +01:00
Jonas Jenwald	0e1b5589e7	Restore the `btoa`/`atob` polyfills for Node.js These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`. However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there are warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).	2018-01-13 01:31:05 +01:00
Jonas Jenwald	d0c8992e8a	Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285) Please refer to the PDF specification, in particular http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3801570 > A colour space shall be specified in one of two ways: > - Within a content stream, the CS or cs operator establishes the current colour space parameter in the graphics state. The operand shall always be name object, which either identifies one of the colour spaces that need no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, or some cases of Pattern) or shall be used as a key in the ColorSpace subdictionary of the current resource dictionary (see 7.8.3, "Resource Dictionaries"). In the latter case, the value of the dictionary entry in turn shall be a colour space array or name. A colour space array shall never be inline within a content stream. > > - Outside a content stream, certain objects, such as image XObjects, shall specify a colour space as an explicit parameter, often associated with the key ColorSpace. In this case, the colour space array or name shall always be defined directly as a PDF object, not by an entry in the ColorSpace resource subdictionary. This convention also applies when colour spaces are defined in terms of other colour spaces.	2018-01-10 20:20:43 +01:00
Jonas Jenwald	d6c028b946	Add support for TrueType Collection fonts (issue 9262) The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading. Fixes 9262.	2018-01-08 22:31:08 +01:00
Tim van der Meij	6b2ed504b7	Merge pull request #9336 from Snuffleupagus/jpx-SIZ Correctly extract component data from "Image and tile size" (SIZ) markers in JPEG 2000 images	2018-01-03 23:34:34 +01:00
Jonas Jenwald	873556865b	Correctly extract component data from "Image and tile size" (SIZ) markers in JPEG 2000 images This is something that I noticed while attempting to debug https://bugzilla.mozilla.org/show_bug.cgi?id=1374945. Just looking at the code, the `YRsiz` parameter seemed immediately wrong and the fact that every component used the same data also looked strange. Comparing with the specification, see https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.800-200208-S!!PDF-E&type=items#page=37, confirmed that this is indeed incorrect. Note that I haven't got any example of a PDF file that is fixed by this patch, but that might be more luck than anything else. Manually checking a couple of files with included JPEG 2000 images, the `Csiz`/`XRsiz`/`YRsiz` parameters were `1` which could explain why this hasn't been an issue before. Obviously we shouldn't generally make changes to `core` code without adding tests, but in this case I'm simply not sure how to obtain/create one. However, since the existing code doesn't make sense this patch could hopefully be deemed acceptable anyway.	2018-01-03 16:26:28 +01:00
Jonas Jenwald	2db75a2a3a	Update the ESLint dependencies, and also tweak the `no-multiple-empty-lines` rules Since multiple empty lines are virtually unused in the code-base, and the few cases that do exist look like "typos", let's enforce greater consistency here; please see https://eslint.org/docs/rules/no-multiple-empty-lines.	2018-01-03 13:32:57 +01:00
Jonas Jenwald	c5700211d6	Adjust `decodeACSuccessive` in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images. As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question. Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).	2017-12-30 15:24:09 +01:00
Jonas Jenwald	d6eed132e5	Correct the indentation in the `switch` statement in `decodeACSuccessive` in src/core/jpg.js	2017-12-30 15:22:30 +01:00
Jonas Jenwald	8c4b7d0439	Avoid truncating JPEG images with DeviceGray ColourSpaces when using the `src/core/jpg.js` built-in decoder The bug that this patch fixes is limited to the built-in JPEG decoder, and was unearthed by PR 9260. The underlying issue has existed since PR 6984, where the contents of this patch ought to have been included (if it weren't for the fact that we had no easy way to test `src/core/jpg.js` back then). Please note: The slight movement in the reference test is a result of using the `src/core/jpg.js` decoder, rather than the native browser one.	2017-12-29 18:44:07 +01:00
Jonas Jenwald	ec21bd9626	Merge pull request #9314 from timvandermeij/encodings Implement unit tests for the encodings and fix missing items	2017-12-27 22:02:38 +01:00
Tim van der Meij	c7af2db2ec	Implement unit tests for the encodings and fix missing items Initially I just implemented the unit tests, but quickly found that they were failing my expectation of having a size of 256 items. Some of them did contain 256 items and some did not. I looked up various resources and figured that they indeed all need to have 256 items. One of the good resources is https://github.com/davidben/poppler/blob/master/poppler/FontEncodingTables.cc Aside from some missing `notdef` (empty string) entries at the end of the arrays, which I assume causes issues since it may cause out-of-bounds array access which in JavaScript gives `undefined`, there was a `notdef` entry missing in the `MacExpertEncoding`, causing the entries after that to be shifted. The fix for this is similar to the one in #8589. The unit tests verify that, for known encoding names, the return value is not only an array, but that it is also of the right length and contains only strings.	2017-12-24 18:14:40 +01:00
Jonas Jenwald	d4cd44fd16	Add a fallback for non-embedded LucidaSans-Demi fonts (issue 9291) The PDF file in the issue uses a number of embedded versions of Lucida fonts, but for some reason does not embed the LucidaSans-Demi font. According to https://en.wikipedia.org/wiki/Lucida#Usages that one should be bold, so we can at least improve rendering here (even though it won't look perfect). Fixes 9291.	2017-12-24 17:36:58 +01:00
Jonas Jenwald	e58f2f513a	[api-major] Remove the unused `encrypted` property from the `pdfInfo` object sent from the worker via the `GetDoc` message I recall being confused as to the purpose of the `encrypted` property all the way back when working on PR 4750. Looking at the history, this property was added in PR 1698 when password support was added to the API/viewer. However, its only purpose seems to have been to facilitate the addition of an `isEncrypted` function in the API. That function never, as far as I can tell, saw any use and was unceremoniously removed in PR 4144. Since we want to avoid sending all non-essential data early during initial document loading (e.g. PR 4750), it seems correct to get rid of the `encrypted` property. Especially since it hasn't even been exposed in the API for over three years, with no complaints that I'm aware of. Finally note that the `encrypt` property on the `XRef` instance isn't tied to the code that's being removed here, given that we're calling `PDFDocument.parse` during `createDocumentHandler` in the worker which, via `PDFDocument.setup`, calls `XRef.parse` where the `Encrypt` data (if it exists) is always parsed.	2017-12-21 13:10:23 +01:00
Jonas Jenwald	1dc54ddb40	Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in `XRef.indexObjects` (issue 9105) This patch refactors the searching for 'endobj', to try and find the next occurrence of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise. This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrarily long strings. Fixes 9105.	2017-12-18 13:17:45 +01:00
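
The search strategy described in the last commit above (1dc54ddb40) can be illustrated with a minimal sketch. This is not the code from `XRef.indexObjects`: the helper name `findObjectEnd`, its return shape, and the 24-character look-behind window are all assumptions made for illustration, and it works on a plain string rather than on the byte buffer the real parser handles. The idea is to look for the next "obj", check whether it is the tail of an 'endobj', and otherwise test only a short window before it for an object header.

```javascript
// Illustrative sketch only; `findObjectEnd` is a hypothetical helper and
// not the code used in `XRef.indexObjects`.
function findObjectEnd(text, start) {
  let pos = start;
  while (pos < text.length) {
    const idx = text.indexOf('obj', pos);
    if (idx < 0) {
      break; // No further "obj" strings; fall through to the EOF case.
    }
    // Case 1: the "obj" is the tail of a regular 'endobj' operator.
    if (text.substring(idx - 3, idx) === 'end') {
      return { end: idx + 3, terminator: 'endobj', };
    }
    // Case 2: the "obj" belongs to the header of the *next* object
    // ("<num> <gen> obj"), i.e. the 'endobj' operator is missing.
    // Only a short, bounded window is tested with the regular expression.
    const windowStart = Math.max(start, idx - 24);
    const match = /\d+\s+\d+\s+$/.exec(text.substring(windowStart, idx));
    if (match) {
      return { end: idx - match[0].length, terminator: 'obj', };
    }
    // Case 3: some unrelated "obj" string; continue searching after it.
    pos = idx + 3;
  }
  return { end: text.length, terminator: 'eof', };
}

// The first object lacks its 'endobj'; the second one is well-formed.
const sample = '1 0 obj\n<< /Type /Catalog >>\n2 0 obj\n<< >>\nendobj\n';
const first = findObjectEnd(sample, '1 0 obj'.length);
const second = findObjectEnd(sample, sample.indexOf('2 0 obj') + 7);
console.log(first.terminator, second.terminator);
```

Because the regular expression only ever sees a bounded window, the cost per candidate stays constant regardless of how large the object body is, which is the property the commit message is after.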

1302 Commits