pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	61e19bee43	Build a fallback `ToUnicode` map for simple fonts (issue 8229) In some fonts, the included `ToUnicode` data is incomplete, causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things. Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend the original `ToUnicode` data for this text-selection edge-case. Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF files with incomplete `ToUnicode` data. According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172): > A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value. > ... Reading that paragraph literally, it doesn't seem too unreasonable to use different methods for different charcodes. Fixes 8229.	2017-11-26 14:45:15 +01:00
Jonas Jenwald	83e8398ff2	For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084) In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then overwrites it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appears instead. Fixes 9084.	2017-10-31 13:26:04 +01:00
Jonas Jenwald	92fcfce685	Merge pull request #9082 from brendandahl/issue7562 Overwrite glyphs contour count if it's less than -1.	2017-10-30 20:44:01 +01:00
Brendan Dahl	17037b5e51	Overwrite glyphs contour count if it's less than -1. The test pdf has a contour count of -70, but OTS doesn't like values less than -1. Fixes issue #7562.	2017-10-30 09:16:51 -07:00
Jonas Jenwald	d71a576b30	Merge pull request #9045 from brendandahl/sani-name Sanitize name index in compile phase of CFF.	2017-10-24 11:48:03 +02:00
Brendan Dahl	6b12612a52	Sanitize name index in compile phase of CFF. Fixes #8960	2017-10-23 17:13:49 -07:00
Brendan Dahl	fcc9943d04	Use charstring as plain text when lengthIV is -1. Fixes #7769	2017-10-18 14:19:59 -07:00
Jonas Jenwald	b1472cddbb	Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704) This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recover as much as possible of a page even when it contains broken XObjects. Fixes 8702. Fixes 8704.	2017-09-29 17:14:21 +02:00
Brendan Dahl	18e2321845	Overwrite maxSizeOfInstructions in maxp with computed value. In issue #7507 the value is less than the actual max size of the glyph instructions, causing OTS to fail the font.	2017-09-25 17:53:26 -07:00
Jonas Jenwald	10727572a2	Merge pull request #8950 from timvandermeij/polygon-polyline-annotations Implement support for polyline and polygon annotations	2017-09-24 15:16:14 +02:00
Tim van der Meij	c69a7a83da	Merge pull request #8932 from janpe2/jbig2-sym-offset JBIG2 symbol offsets	2017-09-23 17:11:45 +02:00
Tim van der Meij	ed8c0ebfa7	Implement reference tests for polyline and polygon annotations	2017-09-23 17:01:19 +02:00
Jonas Jenwald	abc864fca9	Merge pull request #8938 from brendandahl/bug1392647 Use font's default width even when 0. (bug 1392647)	2017-09-20 22:38:39 +02:00
Brendan Dahl	10ba292b46	Use font's default width even when 0. Bug 1392647 has a PDF where the default width of the font is 0. It draws some charcodes that don't have glyphs, but we were wrongly using the 1000 default width for these charcodes causing some text to be overlapping.	2017-09-20 11:38:30 -07:00
Jani Pehkonen	5d1074c110	Fix JBIG2 symbol offsets in text regions	2017-09-19 23:43:23 +03:00
Jani Pehkonen	3d99b8d706	CCITTFaxStream problem when EndOfBlock is false	2017-09-19 22:19:40 +03:00
Tim van der Meij	400e4aae0e	Implement support for stamp annotations	2017-09-16 16:37:50 +02:00
Tim van der Meij	320779e6ed	Merge pull request #8691 from timvandermeij/square-circle-annotations Implement support for square and circle annotations	2017-09-09 22:56:54 +02:00
Tim van der Meij	c04f9d6098	Implement reference tests for square and circle annotations	2017-09-09 21:36:28 +02:00
Jonas Jenwald	7115e136e4	Hide unsupported `LinkAnnotation`s (issue 3897) Rather than displaying links that do nothing when clicked, it probably makes more sense to simply not render them instead. Especially since it turns out that, at least at this point in time, this is very easy to both implement and test. Fixes 3897.	2017-09-06 12:52:56 +02:00
Jonas Jenwald	49b8cd5a6a	Attempt to improve the `EI` detection heuristics, for inline images, in streams containing `NUL` bytes (issue 8823) Since this patch will now treat (some) `NUL` bytes as "ASCII", the number of `followingBytes` checked is thus increased to (hopefully) reduce the risk of introducing new false positives. Fixes 8823.	2017-08-27 12:48:28 +02:00
Jonas Jenwald	4891b9c7e0	Replace the test-case for issue 8798 with a reduced one (PR 8800 follow-up) Re: issue 8798 and PR 8800. Big thanks to @THausherr for providing the test-case.	2017-08-24 17:43:05 +02:00
Brendan Dahl	0bef50d56d	Fix two cmap related issues. In issue #8707, there's a char code mapped to a non-existing glyph which shouldn't be drawn. However, we saw it was missing, then tried to use the post table, and ended up mapping it incorrectly. This illuminated a problem with issue #5704 and bug 893730, where glyphs disappeared after the above fix. This was caused by the cmap returning the wrong glyph id, which in turn happened because the font had multiple cmap tables of the same type and we were choosing the last one. Now, we instead default to the first one. I'm unsure if we should instead be merging the multiple cmaps, but using only the first one works.	2017-08-03 22:19:36 -07:00
Jonas Jenwald	23ec6b16ca	Add a fallback for non-embedded SegoeUISymbol font (issue 8697) The PDF file uses a non-embedded SegoeUISymbol font, which is not a standard font (and is mainly used by Microsoft, see https://en.wikipedia.org/wiki/Segoe). Fixes 8697.	2017-07-25 12:45:11 +02:00
Jonas Jenwald	794b099385	Add a reduced test-case for issue 7696 Issue 7696 was one of the issues fixed by PR 8580. The other ones were all cases of missing glyphs, however in this particular one glyphs did render but every single one was incorrect. Hence it probably cannot hurt to have a small, reduced, reference test for that PDF file as well.	2017-07-24 09:55:16 +02:00
Rob Wu	01f03fe393	Optimize PNG compression in SVG backend on Node.js Use the environment's zlib implementation if available to get reasonably-sized SVG files when an XObject image is converted to PNG. The generated PNG is not optimal because we do not use a PNG predictor. Further, when our SVG backend is run in a browser, the generated PNG images will still be unnecessarily large (though the use of blob:-URLs when available should reduce the impact on memory usage). If we want to optimize PNG images in browsers too, we can either try to use a DEFLATE library such as pako, or re-use our XObject image painting logic in src/display/canvas.js. This potential improvement is not implemented by this commit. Tested with: - Node.js 8.1.3 (uses zlib) - Node.js 0.11.12 (uses zlib) - Node.js 0.10.48 (falls back to inferior existing implementation). - Chrome 59.0.3071.86 - Firefox 54.0 Tests: Unit test on Node.js: ``` $ gulp lib $ JASMINE_CONFIG_PATH=test/unit/clitests.json node ./node_modules/.bin/jasmine --filter=SVG ``` Unit test in browser: Run `gulp server` and open http://localhost:8888/test/unit/unit_test.html?spec=SVGGraphics To verify that the patch works as desired, ``` $ node examples/node/pdf2svg.js test/pdfs/xobject-image.pdf $ du -b svgdump/xobject-image-1.svg # ^ Calculates the file size. Confirm that the size is small # (784 instead of 80664 bytes). ```	2017-07-10 18:56:57 +02:00
Jonas Jenwald	eff257b820	Merge pull request #8580 from brendandahl/missing-glyf Fix how we detect and handle missing glyph data.	2017-07-04 12:16:07 +02:00
Brendan Dahl	efbbd8533f	Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0xF0FF.	2017-07-03 13:13:46 -07:00
Brendan Dahl	6d4f748fb1	Fix how we detect and handle missing glyph data.	2017-07-03 13:06:06 -07:00
Brendan Dahl	a8a8909d2d	Fix missing notdef in expert encoding.	2017-06-29 12:12:39 -07:00
Brendan Dahl	f1f9d98519	Merge pull request #8507 from Snuffleupagus/issue-8480 Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)	2017-06-23 13:36:58 -07:00
Rob Wu	fc6448d18c	Move svg:clipPath generation from clip to endPath In the PDF from issue 8527, the clip operator (W) shows up before a path is defined. The current SVG backend however expects a path to exist before generating a `<svg:clipPath>` element. In the example, the path was defined after the clip, followed by an endPath operator (n). So this commit fixes the bug by moving the path generation logic from clip to endPath. Our canvas backend appears to use similar logic: `CanvasGraphics_endPath` calls `consumePath`, which in turn draws the clip and resets the `pendingClip` state. The canvas backend calls `consumePath` from multiple other places, so we probably need to check whether doing so is also necessary for the SVG backend. I scanned our corpus of PDF files in test/pdfs, and found that in every instance (except for one), the "W" PDF operator (clip) is immediately followed by "n" (endPath). The new test from this commit (clippath.pdf) starts with "W", followed by a path definition and then "n". # Commands used to find some of the clipping commands: grep -ra '^W$' -C7 \| less -S grep -ra '^W ' -C7 \| less -S grep -ra ' W$' -C7 \| less -S test/pdfs/issue6413.pdf is the only file where "W" (at line 55) is not followed by "n". In fact, the "W" is the last operation of a series of XObject painting operations, and removing it does not have any effect on the rendered PDF (confirmed by looking at the output of PDF.js's canvas backend, and ImageMagick's convert command).	2017-06-22 01:08:17 +02:00
Jonas Jenwald	8b4a42e5b8	Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480) As mentioned the last time that I touched this particular part of the font code, I sincerely hope that this doesn't cause any regressions! However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues. Fixes 8480.	2017-06-09 21:15:39 +02:00
Jonas Jenwald	4ce5e520fb	Add different code-paths to `{CMap, ToUnicodeMap}.charCodeOf` depending on length, since `Array.prototype.indexOf` can be extremely inefficient for very large arrays (issue 8372) Fixes 8372.	2017-05-24 19:47:04 +02:00
Jonas Jenwald	31c24ed631	Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424) This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations. According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special". Fixes 8424.	2017-05-23 16:12:45 +02:00
chris.greening	cfc2f36f5c	Adds additional parameter so background color of canvas can be set	2017-05-17 17:06:44 +01:00
Yury Delendik	c4c44c1bbe	Merge pull request #8240 from Snuffleupagus/api-stopAtErrors [api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)	2017-04-13 10:58:49 -05:00
Tim van der Meij	32e01cda96	Merge pull request #8228 from timvandermeij/line-annotations Implement support for line annotations	2017-04-13 00:18:31 +02:00
Tim van der Meij	e15a2ec523	Annotations: implement support for line annotations This patch implements support for line annotations. Other viewers only show the popup annotation when hovering over the line, which may have any orientation. To make this possible, we render an invisible line (SVG element) over the line on the canvas that acts as the trigger for the popup annotation. This invisible line has the same starting coordinates, ending coordinates and width as the line on the canvas.	2017-04-12 23:05:25 +02:00
Jonas Jenwald	a39d636eb8	[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present. Currently we just bail as soon as the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead. NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improves our handling of corrupt PDF files and allows the default viewer to handle errors slightly more gracefully. In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`. Fixes, inasmuch as it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).	2017-04-11 08:59:22 +02:00
Brendan Dahl	4969b2ad97	Normalize blend mode names.	2017-04-10 16:18:08 -07:00
Jason O. Jensen	d230784ac3	Handle cff fonts with erroneous stackSize	2017-03-06 19:28:46 -05:00
Jonas Jenwald	4a0ff5dbf7	Ensure that we don't ignore `0` values in `Page.getInheritedPageProp` (issue 8125) It appears that I accidentally broke this in PR 6065, sorry about that! The issue in this particular PDF file is that there are `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`). Fixes 8125.	2017-03-03 12:27:40 +01:00
Jonas Jenwald	1ce295541c	Always check all Kids nodes, in `Catalog.getPageDict`, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088) As discussed on IRC, we need to check all nodes at the bottom of the tree to ensure that we find the correct `Page` dict. Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps. Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`. Fixes 8088.	2017-02-24 12:09:46 +01:00
Jonas Jenwald	ce072022c1	Always choose a (3, 1) cmap table for TrueType fonts that have an encoding specified, regardless of the Symbolic font flag (bug 1337429) This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425. The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag. Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included. However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions. Given that the PDF file in bug 1337429 has been broken for almost three years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do not intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.	2017-02-15 17:38:08 +01:00
Jonas Jenwald	23c62cc321	Consume the current character when encountering illegal characters in `Lexer.getObject`, in order to prevent infinite loops during reading of streams (issue 8061) Please note: The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop. In this particular case, when errors were encountered inside of the `Lexer.getObject` method itself, we didn't advance the stream position. This thus caused an infinite loop in `parseCMap`, since the exact same character was then parsed over and over again. Fixes 8061.	2017-02-11 19:32:48 +01:00
pmysore1	af8292058f	Font ascent descent calculation fix	2017-02-11 01:25:05 -05:00
Jonas Jenwald	e963971244	Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901) Changing this particular code makes me somewhat nervous about regressions, since PR 5770 necessitated the follow-up PR 6270. However, the patch passes all tests added in those PRs (and obviously all other tests). Furthermore, I've manually checked all the issues/bugs referenced in PRs 5770 and 6270 without finding any issues. Please note: This patch fixes only the font bug, not the SVG conversion, present on pages two and three of the PDF file in issue 7901.	2016-12-20 17:03:51 +01:00
Yury Delendik	3b3a179486	Merge pull request #7879 from rossj/highlight-fix Make use of textAdvanceScale consistent during combineTextItems. Fix for #7878.	2016-12-19 09:18:13 -06:00
Tim van der Meij	0c9a06c020	Button widget annotations: implement reference testing Moreover, ensure that the read-only state is respected and improve CSS names.	2016-12-17 20:33:35 +01:00

372 Commits