pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	690bcc8c8a	Add a reduced, `eq`, test-case for issue 9915	2018-07-29 23:06:15 +02:00
Jonas Jenwald	2b25deb84c	Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809) I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug. The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a negative length when parsing the CALL functions. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.	2018-07-10 09:45:55 +02:00
Jonas Jenwald	7f21e38787	Error, rather than warn, once a number of invalid path operators are encountered in `EvaluatorPreprocessor.read` (bug 1443140) Incomplete path operators, in particular, can result in fairly chaotic rendering artifacts, as can be observed on page four of the referenced PDF file. The initial (naive) solution that was attempted, was to simply throw a `FormatError` as soon as any invalid (i.e. too short) operator was found and rely on the existing `ignoreErrors` code-paths. However, doing so would have caused regressions in some files; see the existing `issue2391-1` test-case, which was promoted to an `eq` test to help prevent future bugs. Hence this patch, which adds special handling for invalid path operators since those may cause quite bad rendering artifacts. You could, in all fairness, argue that the patch is a handwavy solution and I wouldn't object. However, given that this only concerns corrupt PDF files, the way that PDF viewers (PDF.js included) try to gracefully deal with those could probably be described as a best-effort solution anyway. This patch also adjusts the existing `warn`/`info` messages to print the command name according to the PDF specification, rather than an internal PDF.js enumeration value. The former should be much more useful for debugging purposes. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1443140.	2018-06-24 16:05:08 +02:00
Jonas Jenwald	56e3648b65	Add basic validation of the 'trailer' dictionary candidates in `XRef.indexObjects` (issue 9418) This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway. Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.	2018-06-20 13:41:22 +02:00
Jonas Jenwald	6bbcafcd26	Let `Lexer.getNumber` treat a single decimal point as zero (issue 9252) This is consistent with the behaviour in Adobe Reader.	2018-06-20 13:41:21 +02:00
Jonas Jenwald	bf0db0fb72	Pass the `ignoreErrors` API option to the `FontFaceObject` constructor, and utilize it in `getPathGenerator` to ignore missing glyphs Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering. Please note: Including the 14 standard fonts in PDF.js probably wouldn't be that difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.) However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)	2018-06-13 11:02:06 +02:00
Jonas Jenwald	620f65488b	Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679)	2018-05-30 22:40:11 +02:00
Tim van der Meij	2f3b05fd5a	Convert all PDF links from HTTP to HTTPS	2018-05-27 16:02:04 +02:00
Tim van der Meij	4958b59369	Fix broken links to Bugzilla PDF attachments	2018-05-27 15:40:33 +02:00
Jani Pehkonen	8ea505545a	Use FDSelect and FDArray when converting CFF CID font to paths	2018-04-10 16:44:42 +03:00
Jonas Jenwald	d431ae069d	Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540) According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.	2018-03-12 14:13:23 +01:00
Tim van der Meij	7bb066494f	Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)	2018-02-09 21:36:08 +01:00
Jonas Jenwald	a18c65ae9f	Use the correct stream position when reading `maxSizeOfInstructions` from the `maxp` table (issue 9458) Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html. Fixes 9458.	2018-02-07 21:57:43 +01:00
Jonas Jenwald	bf4166e6c9	Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614) Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49 Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) before parsing of the SOS (Start of Scan) data begins. Hence the best solution I could come up with here, is to re-parse the image in the hopefully rare case of JPEG images that include a DNL (Define Number of Lines) marker. Fixes 8614.	2018-02-05 21:05:32 +01:00
Jonas Jenwald	f4a95de694	Attempt to find the next valid marker when encountering invalid image data in `JpegImage.parse` (issue 9425) In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter. Fixes 9425.	2018-02-03 16:01:19 +01:00
Jani Pehkonen	5593c970e0	Implement Huffman coding in JBIG2	2018-01-23 17:04:07 +02:00
Jonas Jenwald	d0c8992e8a	Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285) Please refer to the PDF specification, in particular http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3801570 > A colour space shall be specified in one of two ways: > - Within a content stream, the CS or cs operator establishes the current colour space parameter in the graphics state. The operand shall always be name object, which either identifies one of the colour spaces that need no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, or some cases of Pattern) or shall be used as a key in the ColorSpace subdictionary of the current resource dictionary (see 7.8.3, "Resource Dictionaries"). In the latter case, the value of the dictionary entry in turn shall be a colour space array or name. A colour space array shall never be inline within a content stream. > > - Outside a content stream, certain objects, such as image XObjects, shall specify a colour space as an explicit parameter, often associated with the key ColorSpace. In this case, the colour space array or name shall always be defined directly as a PDF object, not by an entry in the ColorSpace resource subdictionary. This convention also applies when colour spaces are defined in terms of other colour spaces.	2018-01-10 20:20:43 +01:00
Jonas Jenwald	d6c028b946	Add support for TrueType Collection fonts (issue 9262) The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading. Fixes 9262.	2018-01-08 22:31:08 +01:00
Jonas Jenwald	c5700211d6	Adjust `decodeACSuccessive` in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images. As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question. Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).	2017-12-30 15:24:09 +01:00
Jonas Jenwald	06605abbc2	Avoid rendering errors by passing in the `webGLContext` when creating a new `CanvasGraphics` in `getColorN_Pattern` (PR 9095 follow-up) This was an oversight in PR 9095, which unfortunately breaks rendering in some PDF files (e.g. the one from issue 6737). It thus appears that we don't have any test-coverage for this code-path, and given the relative complexity of the PDF files affected by this bug I wasn't able to easily create a reduced test-case. Please note: The linked test-case included in this patch is currently not rendered correctly (that'd be the PR 6606), but it at least gives us some test-coverage here.	2017-12-27 13:50:53 +01:00
Jonas Jenwald	d4cd44fd16	Add a fallback for non-embedded LucidaSans-Demi fonts (issue 9291) The PDF file in the issue uses a number of embedded versions of Lucida fonts, but for some reason does not embed the LucidaSans-Demi font. According to https://en.wikipedia.org/wiki/Lucida#Usages that one should be bold, so we can at least improve rendering here (even though it won't look perfect). Fixes 9291.	2017-12-24 17:36:58 +01:00
Jonas Jenwald	1dc54ddb40	Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in `XRef.indexObjects` (issue 9105) This patch refactors the searching for 'endobj', to try and find the next occurance of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise. This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrary long strings. Fixes 9105.	2017-12-18 13:17:45 +01:00
Brendan Dahl	9b51cea724	Fix loca table when offsets aren't in ascending order.	2017-12-15 11:20:28 -06:00
Brendan Dahl	af1d80d45e	Merge pull request #9230 from Snuffleupagus/issue-9195 Add basic support for non-embedded Calibri fonts (issue 9195)	2017-12-08 10:15:43 -08:00
Jonas Jenwald	a5e3261b48	Merge pull request #9062 from mozilla/no_high Move char codes from high surrogate pair range into private use.	2017-12-08 12:31:22 +01:00
Brendan Dahl	306999c325	Move char codes from high surrogate pair range into private use. Fixes #2884	2017-12-07 10:35:50 -08:00
Jonas Jenwald	08de655177	Add basic support for non-embedded Calibri fonts (issue 9195) There's a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but a `IdentityH` CMap which won't help provide a proper glyph mapping. The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correct. To work around this, I'm thus proposing that we include a (incomplete) glyph map for Calibri, and fallback to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data. Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data). Fixes 9195. --- [1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.	2017-12-03 17:23:33 +01:00
Jonas Jenwald	f3c50fe2f9	Merge pull request #9192 from Snuffleupagus/issue-8229 Build a fallback `ToUnicode` map for simple fonts (issue 8229)	2017-11-30 10:27:32 +01:00
Tim van der Meij	e320243870	Merge pull request #9206 from janpe2/svg-inv-images Fix inverted 1-bit images in SVG backend	2017-11-28 22:46:43 +01:00
Jani Pehkonen	58b214eab3	Fix inverted 1-bit images in SVG backend	2017-11-28 21:24:27 +02:00
Jani Pehkonen	06d083b04b	Fix pattern-filled text	2017-11-28 19:40:22 +02:00
Jonas Jenwald	61e19bee43	Build a fallback `ToUnicode` map for simple fonts (issue 8229) In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things. Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case. Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data. According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172): > A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value. > ... Reading that paragraph literally, it doesn't seem too unreasonable to use different methods for different charcodes. Fixes 8229.	2017-11-26 14:45:15 +01:00
Tim van der Meij	0fe80df2a7	Button widget annotations: implement support for pushbuttons	2017-11-26 14:09:48 +01:00
Jonas Jenwald	83e8398ff2	For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084) In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then overwrites it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appear instead. Fixes 9084.	2017-10-31 13:26:04 +01:00
Jonas Jenwald	92fcfce685	Merge pull request #9082 from brendandahl/issue7562 Overwrite glyphs contour count if it's less than -1.	2017-10-30 20:44:01 +01:00
Brendan Dahl	17037b5e51	Overwrite glyphs contour count if it's less than -1. The test pdf has a contour count of -70, but OTS doesn't like values less than -1. Fixes issue #7562.	2017-10-30 09:16:51 -07:00
Jonas Jenwald	d71a576b30	Merge pull request #9045 from brendandahl/sani-name Sanitize name index in compile phase of CFF.	2017-10-24 11:48:03 +02:00
Brendan Dahl	6b12612a52	Sanitize name index in compile phase of CFF. Fixes #8960	2017-10-23 17:13:49 -07:00
Brendan Dahl	fcc9943d04	Use charstring as plain text when lengthIV is -1. Fixes #7769	2017-10-18 14:19:59 -07:00
Tim van der Meij	509d3728f1	Merge pull request #8922 from Snuffleupagus/paintXObject-errors Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704)	2017-10-07 15:46:26 +02:00
Tim van der Meij	f73c9b75d9	Transform Web Archive URLs to avoid downloading an HTML page instead of the PDF file Moreover, adjust one linked test case that did not conform to the standard Web Archive URL format and adjust one linked test case because the link was dead.	2017-09-30 19:50:31 +02:00
Jonas Jenwald	b1472cddbb	Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704) This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recovery as much as possible of a page even when it contains broken XObjects. Fixes 8702. Fixes 8704.	2017-09-29 17:14:21 +02:00
Brendan Dahl	18e2321845	Overwrite maxSizeOfInstructions in maxp with computed value. In issue #7507 the value is less than the actuall max size of the glyph instructions causing OTS to fail the font.	2017-09-25 17:53:26 -07:00
Jonas Jenwald	10727572a2	Merge pull request #8950 from timvandermeij/polygon-polyline-annotations Implement support for polyline and polygon annotations	2017-09-24 15:16:14 +02:00
Tim van der Meij	c69a7a83da	Merge pull request #8932 from janpe2/jbig2-sym-offset JBIG2 symbol offsets	2017-09-23 17:11:45 +02:00
Tim van der Meij	ed8c0ebfa7	Implement reference tests for polyline and polygon annotations	2017-09-23 17:01:19 +02:00
Jonas Jenwald	abc864fca9	Merge pull request #8938 from brendandahl/bug1392647 Use font's default width even when 0. (bug 1392647)	2017-09-20 22:38:39 +02:00
Brendan Dahl	10ba292b46	Use font's default width even when 0. Bug 1392647 has a PDF where the default width of the font is 0. It draws some charcodes that don't have glyphs, but we were wrongly using the 1000 default width for these charcodes causing some text to be overlapping.	2017-09-20 11:38:30 -07:00
Jani Pehkonen	5d1074c110	Fix JBIG2 symbol offsets in text regions	2017-09-19 23:43:23 +03:00
Jani Pehkonen	3d99b8d706	CCITTFaxStream problem when EndOfBlock is false	2017-09-19 22:19:40 +03:00

... 2 3 4 5 6 ...

964 Commits