pdf.js

Author	SHA1	Message	Date
Jeremy Press	1ceeb4d17b	added text enhancement regression tests	2016-08-31 09:54:52 -07:00
Jonas Jenwald	3ac23200ba	Add a reduced test-case for issue 7406 The PDF file contains an image that we're allowed to use, since it's just the PDF.js logo. The logo image was simply inverted (so that it requires a /Decode entry in the image dictionary that triggers the use of `jpg.js` instead of the browser), converted to JPEG, and finally edited by hand to change the order of the DQT/SOF{n} markers.	2016-08-31 18:42:07 +02:00
Yury Delendik	ffa99397ad	Merge pull request #7387 from Snuffleupagus/issue-5808 Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808)	2016-08-30 15:21:41 -05:00
Jonas Jenwald	544d29f5cb	Add a `recoveryMode` that suppresses errors from the `Parser`, and utilize it when searching for the main trailer in `XRef_indexObjects` (bug 1250079) Instead of having `Parser_getObj` fail unconditionally for the referenced PDF file, this patch attempts to let searching for the main trailer continue even if there are errors. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1250079.	2016-08-17 12:37:35 +02:00
Jonas Jenwald	77c6ed5389	Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808) This patch improves the performance of issue 5808, but I'm not sure if it's enough to call it fixed. On average, this patch reduces the number of textLayer div's by a factor of 3, and it also reduces the time spend in `getTextContent` by a factor of ~2. The PDF file is generated by `Scribus PDF`, which for reasons I cannot understand is placing redundant `Tf` commands before every showText command. Note how the PDF file also contains lots of (basically) identical fonts, but with slightly different names, which causes unnecessary font-switching. This causes some unnecessary breaking of textLayer div's, but this issue cannot be easily worked around.	2016-07-27 21:37:52 +02:00
Jonas Jenwald	558a22cd02	Prevent errors when parsing Annotations with missing (or invalid) /Subtype entries (issue 7446) Note that I used a separate warning message for this case, instead of utilizing the same one as in the unsupported subtype case, to more clearly indicate that the PDF file itself is to blame rather than PDF.js. Fixes 7446.	2016-07-25 13:59:26 +02:00
Brendan Dahl	5678486802	Merge pull request #7347 from Snuffleupagus/evaluator-more-Ref_toString Slightly refactor the `fontRef` handling in `PartialEvaluator_loadFont` (issue 7403 and issue 7402)	2016-07-22 17:21:47 -07:00
Brendan Dahl	50d6e4f147	Merge pull request #7447 from Snuffleupagus/buildToUnicode-notdef Ignore .notdef in the `differences` array when building a fallback `toUnicode` map in `PartialEvaluator_buildToUnicode` (issue 5256)	2016-07-22 14:33:32 -07:00
Jonas Jenwald	4fe891c5e7	Add a reduced test-case for issue 7403	2016-07-21 16:04:07 +02:00
Tim van der Meij	10f9f11ec4	Merge pull request #7490 from Snuffleupagus/issue-7426 Don't map glyphs to the Lepcha Unicode block (issue 7426)	2016-07-21 14:39:19 +02:00
Jonas Jenwald	90d19de935	Catch errors and continue parsing in `parseCMap` (issue 7492) After PR 7039, the PDF file in issue 7492 no longer renders at all, but note that text selection wasn't working correctly previously. The problem with the PDF file in issue 7492 is that the `cMap`, in the `toUnicode` entry in the font, contains an invalid name: ``` /CMapName /-usr-share-fonts-truetype-Panton-Panton Family-Fontfabric - Panton.otf,000-UTF16 def ``` When we parse that line, things obviously break because there are spaces present in the wrong places. To avoid that issue, the patch simply lets `parseCMap` continue when errors are encountered, to try and recover usable data. Note that by not aborting immediatly when an error is encountered, we are also able to fix the text selection. Obviously, it could be argued that we should just immediatly reject a corrupt `cMap`. But given that they usually are correct, it seems that trying to recover as much data as possible from corrupt one can only be a good thing for both glyph mapping and text selection. Fixes 7492.	2016-07-18 16:39:56 +02:00
Jonas Jenwald	64783c8b6e	Don't map glyphs to the Lepcha Unicode block (issue 7426) In the PDF file in the issue, some of the glyphs end up being mapped to the Lepcha Unicode block; see https://en.wikipedia.org/wiki/Lepcha_(Unicode_block). This didn't use to matter, but after HarfBuzz updates that improved support for Lepcha fonts, in particular https://bugzilla.mozilla.org/show_bug.cgi?id=1249861, some glyphs are now moved horizontally. To avoid that, this patch adds the Lepcha block to the list of Unicode ranges that we skip when building the glyph mapping. Fixes 7426.	2016-07-17 16:53:36 +02:00
Brendan Dahl	1f3f4a8dd7	Merge pull request #7441 from Snuffleupagus/issue-7439 Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439)	2016-07-06 13:02:21 -07:00
Jonas Jenwald	bdd58ab1d2	Ignore .notdef in the `differences` array when building a fallback `toUnicode` map in `PartialEvaluator_buildToUnicode` (issue 5256) Fixes 5256.	2016-06-27 16:20:23 +02:00
Jonas Jenwald	7866109af9	Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439) Fixes 7439.	2016-06-25 14:54:34 +02:00
Jonas Jenwald	6a0b047bfa	Add upper-case `I` as a possible space replacement fallback in `Font.spaceWidth` to improve text-selection (issue 7180) In fonts with only upper-case glyphs, that are also missing a space glyph, `get spaceWidth` won't be able to return anything useful. By adding upper-case `I` as a fallback, we can thus improve text-selection in some PDF files. Note that locally, the patch causes slight movement in a few existing `text` tests, but in my opinion this actually looks like slight improvements. Fixes 7180.	2016-06-07 22:55:25 +02:00
Jonas Jenwald	6260fc09a3	Attempt to recover valid `format 3` FDSelect data from broken CFF fonts (bug 1146106) According to the CFF specification, see http://partners.adobe.com/public/developer/en/font/5176.CFF.pdf#G3.46884, for `format 3` FDSelect data: "The first range must have a ‘first’ GID of 0". Since the PDF file (attached in the bug) violates that part of the specification, this patch tries to recover valid FDSelect data to prevent OTS from rejecting the font. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1146106.	2016-06-06 18:20:52 +02:00
Jonas Jenwald	b02d560ae0	Fix errors in `setGState` in `PartialEvaluator_getTextContent` that prevents text-selection from working properly Currently `setGState` is completely broken, and looking through the history of that code, it seems to me that this may never have worked correctly. This patch fixes the text-selection in `extgstate.pdf` in the test-suite, which is also added as a `text` test.	2016-06-01 22:58:49 +02:00
Jonas Jenwald	98fe094d18	Let non-viewable Popup Annotations inherit the parent's Annotation Flags if the parent is viewable Fixes http://www.pdf-archive.com/2013/09/30/file2/file2.pdf. Note how it's not possible to show the various Popup Annotations in the above document. To fix that, this patch lets the Popup inherit the flags of the parent, in the special case where the parent is `viewable` and the Popup is not. In general, I don't think that a Popup must have the same flags set as the parent. However, it seems very strange to have a `viewable` parent annotation, and then not being able to view the Popup. Annoyingly the PDF specification doesn't, as far as I can find, mention anything about how this case should be handled, but this patch seem consistent with the actual behaviour in Adobe Reader.	2016-05-25 23:00:26 +02:00
Brendan Dahl	b86610ffdb	Merge pull request #7300 from Snuffleupagus/bug-1068432 Prevent adding invalid values in `CFFDict_setByKey` (bug 1068432)	2016-05-24 12:12:38 -07:00
Jonas Jenwald	7ddb0bc718	Attempt to combine text runs positioned with `setTextMatrix`	2016-05-18 17:21:58 +02:00
Jonas Jenwald	182d33800a	Ignore 'endobj' commands inside of `ObjStm` streams (issue 5241, bug 898610, bug 1037816) According to an example in the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=56, an `ObjStm` stream should not contain 'endobj' commands. Fixes 5241. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=898610. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1037816.	2016-05-09 09:50:45 +02:00
Jonas Jenwald	c9b6de3b16	Prevent adding invalid values in `CFFDict_setByKey` (bug 1068432) In the font in question, there are a couple of `topDict` entries that have invalid values (`0xF 0xF`, i.e. just eof markers without any actual numbers). This causes the `parseFloatOperand` function, inside `CFFParser_parseDict`, to return `NaN`. Currently we pass this broken font onto the browser, which OTS unsurprisingly rejects. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1068432.	2016-05-07 21:09:58 +02:00
Jonas Jenwald	293901d7e5	Add a (linked) test-case for issue 3248	2016-04-21 16:36:46 +02:00
Jonas Jenwald	e281ef15db	Adjust incorrect first obj number of "free" xref entry in `XRef_readXRefTable` (issue 7229) Fixes 7229.	2016-04-21 16:36:32 +02:00
Jonas Jenwald	079b563e2d	Ensure that the `params` parameter of the `PredictorStream` is a dictionary (issue 7200) Fixes 7200.	2016-04-15 16:30:18 +02:00
Yury Delendik	398e6acbc5	Stops bleeding of pattern edges for mesh.	2016-04-11 18:21:44 -05:00
Yury Delendik	d76db416f4	Adds more SMask tests.	2016-04-11 08:02:06 -05:00
Yury Delendik	ff3ce973b8	Merge pull request #7106 from Snuffleupagus/issue-7101 Keep track of the character to glyph mapping in font_renderer.js, to prevent errors when different characters point to the same glyph (issue 7101)	2016-04-01 08:09:21 -05:00
Jonas Jenwald	05cf709f8e	Parse Type1 font files to determine the various `Length{n}` properties, instead of trusting the PDF file (issue 5686, issue 3928) Fixes 5686. Fixes 3928.	2016-03-31 11:08:12 +02:00
Jonas Jenwald	17aaa125df	Keep track of the character to glyph mapping in font_renderer.js, to prevent errors when different characters point to the same glyph (issue 7101) Fixes 7101.	2016-03-30 11:33:04 +02:00
Jonas Jenwald	13d7a5070e	Prevent failures in the Annotation code if the `Rect` array contains indirect objects (issue 7115) Note that in the PDF files provided by the reporter, this issue was limited to `Rect` arrays in AcroForm entries (which we currently don't support). However, since a bad PDF generator could create this problem in any kind of annotation, the reduced test-case included here uses a simple LinkAnnotation instead. Fixes 7115.	2016-03-26 20:55:16 +01:00
Yury Delendik	a505aa8e90	Disables issue6961 test.	2016-03-25 12:48:11 -05:00
Jonas Jenwald	dfe9015a43	Convert `uniXXXX` glyph names to proper ones when building the `charCodeToGlyphId` map for TrueType fonts (bug 1132849, issue 6893, issue 6894) This patch adds a `getUnicodeForGlyph` helper function, which is used to recover Unicode values for non-standard glyph names. Some PDF generators, e.g. Scribus PDF, use improper `uniXXXX` glyph names which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849. Fixes 6893. Fixes 6894.	2016-03-09 19:37:15 +01:00
Preetham Mysore	be1e12dbcb	Fix for descent calculation while reading font hhea headers	2016-03-03 08:51:41 -05:00
Jonas Jenwald	8402c79171	Merge pull request #7050 from brendandahl/issue4402 For CIDFontType2 use CID as glyph ID when missing CID to GID map.	2016-03-02 10:11:42 +01:00
Brendan Dahl	a6acf74b54	Merge pull request #7023 from brendandahl/issue6721 Only draw glyphs on canvas if they are in the font or the font file is missing.	2016-03-01 18:03:37 -08:00
Brendan Dahl	6e1d131384	For CIDFontType2 use CID as glyph ID when missing CID to GID map.	2016-03-01 17:05:33 -08:00
Brendan Dahl	ff87f3fb86	Only draw glyphs on canvas if they are in the font or the font file is missing.	2016-03-01 13:24:58 -08:00
Jonas Jenwald	505f15f221	Avoid accidentally getting the entire font file in `readNameTable` (issue 7020) In the PDF file in question, some of the 'name' table entries have `record.length === 0`. This becomes problematic in the non-unicode case, since `font.getBytes(0)` will fetch the entire stream. Given that OTS rejects 'name' entries larger than `2^16`, this thus explain the sanitizer errors. Fixes 7020.	2016-03-01 21:59:49 +01:00
Tim van der Meij	ad31e52a26	Group popup creation code and apply it to more annotation types	2016-02-25 00:35:45 +01:00
Jonas Jenwald	41efb92d3a	Merge pull request #6988 from timvandermeij/fileattachment-annotation Implement support for FileAttachment annotations	2016-02-24 12:58:06 +01:00
Tim van der Meij	10902fd882	Implement unit and reference testing for FileAttachment annotations	2016-02-23 22:49:53 +01:00
Jonas Jenwald	a494e33776	Update `JpegImage.getData` to support `forceRGBoutput` for images with `numComponents === 1` (issue 6066) A more robust solution for issue 6066. As a temporary work-around for (the upstream) [bug 1164199](https://bugzilla.mozilla.org/show_bug.cgi?id=1164199), we parsed all images in the Firefox addon during a short time. Doing so uncovered an issue with our image handling (see 6066), for JPEG images with a `DeviceGray` ColorSpace and `bpc !== 1` (bits per component). As long as we let the browser handle image decoding in this case, this isn't going to be an issue, but I do think that we should proactively fix this to avoid future issues if we change where the images are decoded (in `jpg.js` vs in browser). Also, we currently don't seem to have a test-case for that kind of image data.	2016-02-18 10:12:37 +01:00
Jonas Jenwald	62b17ad36e	Add a linked `load` test for issue 6549	2016-02-12 18:10:07 +01:00
Jonas Jenwald	07e1ad40a2	Replace `getAll` with `getKeys` in `PartialEvaluator_hasBlendModes` to speed up loading of badly generated PDF files (issue 6961) Some bad PDF generators, in particular "Scribus PDF", duplicates resources a lot at various levels of the PDF files. This can lead to `PartialEvaluator_hasBlendModes` taking an unreasonable amount of time to complete. The reason is that the current code is using `Dict_getAll`, which recursively dereferences all indirect objects, which can be really slow. This patch instead uses `Dict_getKeys`, and then manually looks up only the necessary indirect objects. I've added the PDF file as a `load` test. The most important thing here is probably to ensure that the file remains available in the repo, and the comment should help reduced the chance of regressions. (Note that locally, the `load` test times out without this patch, but we cannot really assume that that always happens.) Fixes 6961.	2016-02-10 17:21:38 +01:00
Jonas Jenwald	15ce96a6eb	Prevent failures in the "scanning for endstream" code, in `Parser_makeStream`, by handling the case where 'endstream' is split between contiguous chunks (issue 1536)	2016-01-26 09:03:51 +01:00
Yury Delendik	0aa373cdf3	Merge pull request #6891 from Snuffleupagus/issue-6889 Map missing glyphs to the `notdef` glyph for TrueType (3, 1) fonts regardless if the 'post' table is defined or not (issue 6889)	2016-01-20 13:14:47 -06:00
Tim van der Meij	ec066101d8	Merge pull request #6848 from Snuffleupagus/recover-missing-glyf-table [TrueType] Recover from a missing "glyf" table by replacing it with dummy data, utilizing the existing code in `sanitizeGlyphLocations`	2016-01-18 20:28:52 +01:00
Jonas Jenwald	4855d4cc9f	Map missing glyphs to the `notdef` glyph for TrueType (3, 1) fonts regardless if the 'post' table is defined or not (issue 6889)	2016-01-17 22:58:00 +01:00

... 2 3 4 5 6 ...

800 Commits