pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	eff257b820	Merge pull request #8580 from brendandahl/missing-glyf Fix how we detect and handle missing glyph data.	2017-07-04 12:16:07 +02:00
Brendan Dahl	efbbd8533f	Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0xF0FF.	2017-07-03 13:13:46 -07:00
Brendan Dahl	6d4f748fb1	Fix how we detect and handle missing glyph data.	2017-07-03 13:06:06 -07:00
Brendan Dahl	a8a8909d2d	Fix missing notdef in expert encoding.	2017-06-29 12:12:39 -07:00
Brendan Dahl	f1f9d98519	Merge pull request #8507 from Snuffleupagus/issue-8480 Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)	2017-06-23 13:36:58 -07:00
Rob Wu	fc6448d18c	Move svg:clipPath generation from clip to endPath In the PDF from issue 8527, the clip operator (W) shows up before a path is defined. The current SVG backend however expects a path to exist before generating a `<svg:clipPath>` element. In the example, the path was defined after the clip, followed by a endPath operator (n). So this commit fixes the bug by moving the path generation logic from clip to endPath. Our canvas backend appears to use similar logic: `CanvasGraphics_endPath` calls `consumePath`, which in turn draws the clip and resets the `pendingClip` state. The canvas backend calls `consumePath` from multiple other places, so we probably need to check whether doing so is also necessary for the SVG backend. I scanned our corpus of PDF files in test/pdfs, and found that in every instance (except for one), the "W" PDF operator (clip) is immediately followed by "n" (endPath). The new test from this commit (clippath.pdf) starts with "W", followed by a path definition and then "n". # Commands used to find some of the clipping commands: grep -ra '^W$' -C7 \| less -S grep -ra '^W ' -C7 \| less -S grep -ra ' W$' -C7 \| less -S test/pdfs/issue6413.pdf is the only file where "W" (a tline 55) is not followed by "n". In fact, the "W" is the last operation of a series of XObject painting operations, and removing it does not have any effect on the rendered PDF (confirmed by looking at the output of PDF.js's canvas backend, and ImageMagick's convert command).	2017-06-22 01:08:17 +02:00
Jonas Jenwald	e589834f13	Ensure that `TilingPattern`s have valid (non-zero) /BBox arrays (issue 8330) Fixes 8330.	2017-06-09 21:41:48 +02:00
Jonas Jenwald	8b4a42e5b8	Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480) As mentioned the last time that I touched this particular part of the font code, I'm sincerely hope that this doesn't cause any regressions! However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues. Fixes 8480.	2017-06-09 21:15:39 +02:00
Jonas Jenwald	4ce5e520fb	Add different code-paths to `{CMap, ToUnicodeMap}.charCodeOf` depending on length, since `Array.prototype.indexOf` can be extremely inefficient for very large arrays (issue 8372) Fixes 8372.	2017-05-24 19:47:04 +02:00
Jonas Jenwald	ac942ac657	Merge pull request #8437 from yurydelendik/default-ctx Resets canvas 2d context to the default state.	2017-05-23 23:31:57 +02:00
Yury Delendik	a67198895f	Resets canvas 2d context to the default state.	2017-05-23 15:10:30 -05:00
Jonas Jenwald	31c24ed631	Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424) This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations. According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special". Fixes 8424.	2017-05-23 16:12:45 +02:00
Yury Delendik	5dc8dcdc0f	Merge pull request #8388 from Snuffleupagus/issue-8380 Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380)	2017-05-17 17:25:51 -05:00
chris.greening	cfc2f36f5c	Adds additional parameter so background color of canvas can be set	2017-05-17 17:06:44 +01:00
Jonas Jenwald	0c2ebda31c	Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380) For some reason, we're putting all kind of images except JPEG into the `imageCache` in `evaluator.js`.[1] This means that in the PDF file in issue 8380, we'll keep sending the same two small images[2] to the main-thread and decoding them over and over. This is obviously hugely inefficient! As can be seen from the discussion in the issue, the performance becomes extremely bad if the user has the addon "Adblock Plus" installed. However, even in a clean Firefox profile, the performance isn't that great. This patch not only addresses the performance implications of the "Adblock Plus" addon together with that particular PDF file, but it also improves the rendering times considerably for all users. Locally, with a clean profile, the rendering times are reduced from `~2000 ms` to `~500 ms` for my setup! Obviously, the general structure of the PDF file and its operator sequence is still hugely inefficient, however I'd say that the performance with this patch is good enough to consider the issue (as it stands) resolved.[3] Fixes 8380. --- [1] Not technically true, since inline images are cached from `parser.js`, but whatever :-) [2] The two JPEG images have dimensions 1x2, respectively 4x2. [3] To make this even more efficient, a new state would have to be added to the `QueueOptimizer`. Given that PDF files this stupid fortunately aren't too common, I'm not convinced that it's worth doing.	2017-05-07 13:07:41 +02:00
Jonas Jenwald	40feca12c1	Ignore line-breaks between operator and digit in `Lexer.getNumber` This is consistent with the behaviour in Adobe Reader (and PDFium), and it fixes the display of page 30 in https://bug1354114.bmoattachments.org/attachment.cgi?id=8855457 (taken from https://bugzilla.mozilla.org/show_bug.cgi?id=1354114). The patch also makes the `error` message for invalid numbers slightly more useful, by including the charCode as well. (Having that information available would have reduced the time spent on debugging the PDF file above.)	2017-05-02 20:59:42 +02:00
Jani Pehkonen	64deb6c700	Subtract the X/Y offsets when decoding refinement regions of JBIG2 images (issue 7145, 7308, 7401, 7850, 8270) Please refer to the JBIG2 standard, see https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.88-200002-I!!PDF-E&type=items. In particular, section "6.3.5.3 Fixed templates and adaptive templates" mentions that the offsets should be subtracted; where the offsets are defined according to "Table 6" under section "6.3.2 Input parameters". Fixes 7145. Fixes 7308. Fixes 7401. Fixes 7850. Fixes 8270.	2017-04-26 16:06:15 +02:00
Yury Delendik	c4c44c1bbe	Merge pull request #8240 from Snuffleupagus/api-stopAtErrors [api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)	2017-04-13 10:58:49 -05:00
Tim van der Meij	32e01cda96	Merge pull request #8228 from timvandermeij/line-annotations Implement support for line annotations	2017-04-13 00:18:31 +02:00
Tim van der Meij	e15a2ec523	Annotations: implement support for line annotations This patch implements support for line annotations. Other viewers only show the popup annotation when hovering over the line, which may have any orientation. To make this possible, we render an invisible line (SVG element) over the line on the canvas that acts as the trigger for the popup annotation. This invisible line has the same starting coordinates, ending coordinates and width of the line on the canvas.	2017-04-12 23:05:25 +02:00
Jonas Jenwald	a39d636eb8	[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present. Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead. NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully. In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`. Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).	2017-04-11 08:59:22 +02:00
Brendan Dahl	4969b2ad97	Normalize blend mode names.	2017-04-10 16:18:08 -07:00
Brendan Dahl	cdc79a4721	Don’t skip glyph 0 in cmap.	2017-04-05 15:17:38 -07:00
Jonas Jenwald	62eee8c782	Try harder to find the next valid JPEG marker when decoding Scan data (issue 8182, issue 8189) Tentatively fixes 8182 and fixes 8189.	2017-03-27 15:55:21 +02:00
Jonas Jenwald	e2e13df4a5	Merge pull request #8164 from Snuffleupagus/issue-7828 Don't read past the EOI marker for JPEG images with non-default restart interval (issue 7828)	2017-03-20 22:17:28 +01:00
Jonas Jenwald	d6d0f778aa	Don't read past the EOI marker for JPEG images with non-default restart interval (issue 7828) After browsing through (a version of) the JPEG specification, see https://www.w3.org/Graphics/JPEG/itu-t81.pdf, I hope that this patch makes sense. Note that while issue 7828 became a problem after PR 7661, it isn't really a regression from than PR. The explanation is rather that we're now relying on `core/jpg.js` instead of the Native Image decoder in more situations than before, which thus exposed an existing issue in our JPEG decoder. Another factor also seems to be that in many JPEG images, the DRI (Define Restart Interval) marker isn't present, in which case this bug won't manifest either. According to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=89 (at the bottom of the page): "NOTE – The final restart interval may be smaller than the size specified by the DRI marker segment, as it includes only the number of MCUs remaining in the scan." Furthermore, according to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=39 (in the middle of the page): "[...] If restart is enabled and the restart interval is defined to be Ri, each entropy-coded segment except the last one shall contain Ri MCUs. The last one shall contain whatever number of MCUs completes the scan." Based on the above, it thus seem to me that we should simply ensure that we're not attempting to continue to parse Scan data once we've found all MCUs (Minimum Coded Unit) of the image. Fixes 7828.	2017-03-20 17:16:33 +01:00
Jonas Jenwald	be1a6f294f	Try to recover when encountering JPEG markers with too short marker lengths (issue 8169) The issue with the JPEG image in question, is that the COM (Comment) marker has an incorrect length entry. Fixes 8169.	2017-03-20 17:05:51 +01:00
Jonas Jenwald	098a56270d	Normalize the `BBox` entry in Tiling Pattern dictionaries (issue 8117) According to the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3982967, the `BBox` entry should have the form `[left, bottom, right, top]`. Since some PDF generators apparently violates the specification, we normalize the `BBox` to ensure that the pattern is (correctly) rendered. Fixes 8117.	2017-03-16 21:43:11 +01:00
Jason O. Jensen	d230784ac3	Handle cff fonts with erroneous stackSize	2017-03-06 19:28:46 -05:00
Jonas Jenwald	4a0ff5dbf7	Ensure that we don't ignore `0` values in `Page.getInheritedPageProp` (issue 8125) It appears that I accidentally broke this in PR 6065, sorry about that! The issue in this particular PDF file is that there's `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`). Fixes 8125.	2017-03-03 12:27:40 +01:00
Jonas Jenwald	1ce295541c	Always check all Kids nodes, in `Catalog.getPageDict`, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088) As discussed on IRC, we need to check all nodes at the bottom of the tree to ensure that we find the correct `Page` dict. Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps. Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`. Fixes 8088.	2017-02-24 12:09:46 +01:00
Jonas Jenwald	ce072022c1	Always choose a (3, 1) cmap table for TrueType fonts that have an encoding specified, regardless of the Symbolic font flag (bug 1337429) This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425. The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag. Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included. However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions. Given that the PDF file in bug 1337429 has been broken for almost three years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do not intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.	2017-02-15 17:38:08 +01:00
Jonas Jenwald	23c62cc321	Consume the current character when encountering illegal characters in `Lexer.getObject`, in order to prevent infinite loops during reading of streams (issue 8061) Please note: The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop. In this particular case, when errors were encountered inside of the `Lexer.getObject` method itself, we didn't advance the stream position. This thus caused an inifinite loop in `parseCMap`, since the exact same character was then parsed over and over again. Fixes 8061.	2017-02-11 19:32:48 +01:00
pmysore1	af8292058f	Font ascent descent calculation fix	2017-02-11 01:25:05 -05:00
Tim van der Meij	1fda987a4c	Merge pull request #7904 from Snuffleupagus/issue-7901 Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901)	2017-01-12 21:55:57 +01:00
Yury Delendik	393740e2ae	Merge pull request #7869 from PedroPachecoInf/master Fixes issue #6071 - TIFF with 1 bit-depth	2017-01-10 12:37:26 -06:00
jazzchipc	493853031b	Fixes issue #6071 . Corrects readBlockTiff() case for 1-bit depth and 1 color TIFF images incorporated in the PDF. Adds reference test for PDF used to fix this issue.	2017-01-10 16:42:43 +00:00
Jonas Jenwald	e963971244	Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901) Changing this particular code makes me somewhat nervous about regressions, since PR 5770 necessitated the follow-up PR 6270. However, the patch passes all tests added in those PRs (and obviously all other tests). Furthermore, I've manually checked all the issues/bugs referenced in PRs 5770 and 6270 without finding any issues. Please note: This patch fixes only the font bug, not the SVG conversion, present on pages two and three of the PDF file in issue 7901.	2016-12-20 17:03:51 +01:00
Yury Delendik	3b3a179486	Merge pull request #7879 from rossj/highlight-fix Make use of textAdvanceScale consistent during combineTextItems. Fix for #7878.	2016-12-19 09:18:13 -06:00
Tim van der Meij	0c9a06c020	Button widget annotations: implement reference testing Moreover, ensure that the read-only state is respected and improve CSS names.	2016-12-17 20:33:35 +01:00
Ross Johnson	4537590033	Consitently apply textAdvanceScale during building of textContentItems for improved highlighting. Fixes #7878 .	2016-12-14 21:02:19 -06:00
Jonas Jenwald	9be3aee9c9	Add a parameter to `Page_getInheritedPageProp` to make it possible to fetch (and dereference) Arrays, and use that for the `MediaBox`/`CropBox` getters (issue 7872)	2016-12-08 22:03:42 +01:00
Jonas Jenwald	e386af7b22	Adjust one of the Page Label unit-tests to use a PDF file where the "St" entry is both present and non-default (i.e. greater than one) I just realized that none of our current unit-tests cover this particular part of the Page Label parsing code, hence this patch adjusts an existing test PDF to include a "St" entry in the Page Label dictionary.	2016-12-04 13:03:22 +01:00
Jonas Jenwald	c5b06cb40d	Ensure that `PartialEvaluator_extractWidths` is able to handle indirect objects in all kinds of "width" data (issue 7855) Fixes 7855.	2016-11-29 20:49:07 +01:00
Jonas Jenwald	451956c0b1	Merge pull request #7628 from Snuffleupagus/issue-7580 Fallback to the `StandardEncoding` for Nonsymbolic fonts without `/Encoding` entry (issue 7580)	2016-11-29 12:37:36 +01:00
Jonas Jenwald	3170a4c40a	Improve rendering of non-embedded NuptialScript font This patch fixes something that I noticed while debugging https://bugzilla.mozilla.org/show_bug.cgi?id=1308536. The PDF file contains a font called "NuptialScript", which unfortunately is not embedded. Since that is a non-standard font we will not be able to render it entirely correct. However, by adding "NuptialScript" to the `getNonStdFontMap`, we can at least improve the rendering slightly by using an italic (serif) fallback font.	2016-11-22 17:56:17 +01:00
Jonas Jenwald	d3043167de	Correctly detect more cases of non-embedded Arial Black fonts (issue 7835) This patch adds support for non-embedded Arial Black fonts, that use a `Arial-Black...` format for the font names. Also, this patch changes `canvas.js` such that we always render Arial Black fonts with the maximum weight, which actually improves a number of existing test-cases. This should thus explain the test "failures", which are clear improvements compared with e.g. Adobe Reader. Fixes 7835.	2016-11-22 13:56:21 +01:00
Jonas Jenwald	b4100ba651	Merge pull request #7698 from Snuffleupagus/bug-1308536 Ignore reserved commands when parsing operands in `CFFParser_parseDict`, instead of just rejecting the entire font (bug 1308536)	2016-11-03 23:53:14 +01:00
Jonas Jenwald	2d8d8b5e53	Use `stringToPDFString` to sanitizing bad "Prefix" entries in Page Label dictionaries It seems that certain bad PDF generators can create badly encoded "Prefix" entries for Page Labels, one example being http://ukjewishfilm.org/wp-content/uploads/2015/09/Jewish-Film-Festival-Programme-ONLINE.pdf. Unfortunately I didn't come across such a PDF file while adding the API support for Page Labels, but with them now being used in the viewer I just found this issue. With this patch, we now display the Page Labels in the same way as Adobe Reader.	2016-11-03 19:48:08 +01:00
Jonas Jenwald	9dc6463933	Ignore reserved commands when parsing operands in `CFFParser_parseDict`, instead of just rejecting the entire font (bug 1308536) According to the CFF specification, see http://partners.adobe.com/public/developer/en/font/5176.CFF.pdf#page=11, certain commands are currently reserved. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1308536.	2016-11-03 12:50:40 +01:00

1 2 3 4 5 ...

744 Commits