pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	4a0ff5dbf7	Ensure that we don't ignore `0` values in `Page.getInheritedPageProp` (issue 8125) It appears that I accidentally broke this in PR 6065, sorry about that! The issue in this particular PDF file is that there's `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`). Fixes 8125.	2017-03-03 12:27:40 +01:00
Jonas Jenwald	1ce295541c	Always check all Kids nodes, in `Catalog.getPageDict`, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088) As discussed on IRC, we need to check all nodes at the bottom of the tree to ensure that we find the correct `Page` dict. Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps. Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`. Fixes 8088.	2017-02-24 12:09:46 +01:00
Jonas Jenwald	ce072022c1	Always choose a (3, 1) cmap table for TrueType fonts that have an encoding specified, regardless of the Symbolic font flag (bug 1337429) This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425. The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag. Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included. However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions. Given that the PDF file in bug 1337429 has been broken for almost three years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do not intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.	2017-02-15 17:38:08 +01:00
Jonas Jenwald	23c62cc321	Consume the current character when encountering illegal characters in `Lexer.getObject`, in order to prevent infinite loops during reading of streams (issue 8061) Please note: The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop. In this particular case, when errors were encountered inside of the `Lexer.getObject` method itself, we didn't advance the stream position. This thus caused an infinite loop in `parseCMap`, since the exact same character was then parsed over and over again. Fixes 8061.	2017-02-11 19:32:48 +01:00
pmysore1	af8292058f	Font ascent descent calculation fix	2017-02-11 01:25:05 -05:00
Tim van der Meij	1fda987a4c	Merge pull request #7904 from Snuffleupagus/issue-7901 Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901)	2017-01-12 21:55:57 +01:00
Yury Delendik	393740e2ae	Merge pull request #7869 from PedroPachecoInf/master Fixes issue #6071 - TIFF with 1 bit-depth	2017-01-10 12:37:26 -06:00
jazzchipc	493853031b	Fixes issue #6071. Corrects the readBlockTiff() case for 1-bit depth, 1-color TIFF images incorporated in the PDF. Adds a reference test for the PDF used to fix this issue.	2017-01-10 16:42:43 +00:00
Jonas Jenwald	e963971244	Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901) Changing this particular code makes me somewhat nervous about regressions, since PR 5770 necessitated the follow-up PR 6270. However, the patch passes all tests added in those PRs (and obviously all other tests). Furthermore, I've manually checked all the issues/bugs referenced in PRs 5770 and 6270 without finding any issues. Please note: This patch fixes only the font bug, not the SVG conversion, present on pages two and three of the PDF file in issue 7901.	2016-12-20 17:03:51 +01:00
Yury Delendik	3b3a179486	Merge pull request #7879 from rossj/highlight-fix Make use of textAdvanceScale consistent during combineTextItems. Fix for #7878.	2016-12-19 09:18:13 -06:00
Tim van der Meij	0c9a06c020	Button widget annotations: implement reference testing Moreover, ensure that the read-only state is respected and improve CSS names.	2016-12-17 20:33:35 +01:00
Ross Johnson	4537590033	Consistently apply textAdvanceScale during building of textContentItems for improved highlighting. Fixes #7878.	2016-12-14 21:02:19 -06:00
Jonas Jenwald	9be3aee9c9	Add a parameter to `Page_getInheritedPageProp` to make it possible to fetch (and dereference) Arrays, and use that for the `MediaBox`/`CropBox` getters (issue 7872)	2016-12-08 22:03:42 +01:00
Jonas Jenwald	e386af7b22	Adjust one of the Page Label unit-tests to use a PDF file where the "St" entry is both present and non-default (i.e. greater than one) I just realized that none of our current unit-tests cover this particular part of the Page Label parsing code, hence this patch adjusts an existing test PDF to include a "St" entry in the Page Label dictionary.	2016-12-04 13:03:22 +01:00
Jonas Jenwald	c5b06cb40d	Ensure that `PartialEvaluator_extractWidths` is able to handle indirect objects in all kinds of "width" data (issue 7855) Fixes 7855.	2016-11-29 20:49:07 +01:00
Jonas Jenwald	451956c0b1	Merge pull request #7628 from Snuffleupagus/issue-7580 Fallback to the `StandardEncoding` for Nonsymbolic fonts without `/Encoding` entry (issue 7580)	2016-11-29 12:37:36 +01:00
Jonas Jenwald	3170a4c40a	Improve rendering of non-embedded NuptialScript font This patch fixes something that I noticed while debugging https://bugzilla.mozilla.org/show_bug.cgi?id=1308536. The PDF file contains a font called "NuptialScript", which unfortunately is not embedded. Since that is a non-standard font we will not be able to render it entirely correctly. However, by adding "NuptialScript" to the `getNonStdFontMap`, we can at least improve the rendering slightly by using an italic (serif) fallback font.	2016-11-22 17:56:17 +01:00
Jonas Jenwald	d3043167de	Correctly detect more cases of non-embedded Arial Black fonts (issue 7835) This patch adds support for non-embedded Arial Black fonts, that use an `Arial-Black...` format for the font names. Also, this patch changes `canvas.js` such that we always render Arial Black fonts with the maximum weight, which actually improves a number of existing test-cases. This should thus explain the test "failures", which are clear improvements compared with e.g. Adobe Reader. Fixes 7835.	2016-11-22 13:56:21 +01:00
Jonas Jenwald	b4100ba651	Merge pull request #7698 from Snuffleupagus/bug-1308536 Ignore reserved commands when parsing operands in `CFFParser_parseDict`, instead of just rejecting the entire font (bug 1308536)	2016-11-03 23:53:14 +01:00
Jonas Jenwald	2d8d8b5e53	Use `stringToPDFString` to sanitize bad "Prefix" entries in Page Label dictionaries It seems that certain bad PDF generators can create badly encoded "Prefix" entries for Page Labels, one example being http://ukjewishfilm.org/wp-content/uploads/2015/09/Jewish-Film-Festival-Programme-ONLINE.pdf. Unfortunately I didn't come across such a PDF file while adding the API support for Page Labels, but with them now being used in the viewer I just found this issue. With this patch, we now display the Page Labels in the same way as Adobe Reader.	2016-11-03 19:48:08 +01:00
Jonas Jenwald	9dc6463933	Ignore reserved commands when parsing operands in `CFFParser_parseDict`, instead of just rejecting the entire font (bug 1308536) According to the CFF specification, see http://partners.adobe.com/public/developer/en/font/5176.CFF.pdf#page=11, certain commands are currently reserved. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1308536.	2016-11-03 12:50:40 +01:00
Jonas Jenwald	d284cfd5eb	[api-minor] Add support for relative URLs, in both annotations and the outline, by adding a `docBaseUrl` parameter to `PDFJS.getDocument` (bug 766086) Note that in `FIREFOX/MOZCENTRAL/CHROME` builds of the standard viewer the `docBaseUrl` parameter will be set by default, since in that case it makes sense to use the current URL as a base. For the `GENERIC` viewer, or the API itself, it doesn't make sense to try and set the `docBaseUrl` by default. However, custom deployments/implementations may still find the parameter useful.	2016-10-19 22:20:24 +02:00
Yury Delendik	ea5949f1fd	Merge pull request #7668 from Snuffleupagus/issue-7665 Prevent an infinite loop in `XRef_fetchUncompressed` for encrypted PDF files with indirect objects in the /Encrypt dictionary (issue 7665)	2016-10-15 10:52:08 -05:00
Chas Emerick	85c52f1fd6	Fix getTextContent evaluation to only apply TJ horizontal offsets using numeric items/args While the array argument to TJ should only contain strings and numbers, other unfortunate items are found in PDFs in the wild, e.g.: [(Grandes) 0.0 Tc -250.0 (Client\350les,) 0.0 Tc -250.0 (Financements) 0.0 Tc -250.0 (et) 0.0 Tc -250.0 (March\351s) ] TJ getOperatorList already properly ignores any non-string, non-numeric values in TJ arrays; without this patch to getTextContent, returned text items can have NaN widths due to calculations being applied to those non-numeric values.	2016-10-13 08:08:31 -04:00
Tim van der Meij	9b3a91f365	Merge pull request #7671 from timvandermeij/interactive-forms-choice-fields Interactive forms: render choice widget annotations	2016-10-05 23:27:45 +02:00
Tim van der Meij	f85f3243b1	Choice widget annotations: unit and reference testing	2016-10-05 21:25:29 +02:00
Yury Delendik	7b2a9ee4e0	Merge pull request #7670 from Snuffleupagus/Parser_makeFilter-maybeLength Only skip parsing a stream in `Parser_makeFilter` when we know for sure that it is empty (PR 6372 follow-up)	2016-10-05 10:38:12 -05:00
Jonas Jenwald	54ee83eb12	Attempt to skip zero bytes at the end of Scan blocks when decoding JPEG images (issue 4090)	2016-09-28 16:31:02 +02:00
Jonas Jenwald	116ba19dd9	Respect the 'ColorTransform' entry in the image dictionary when decoding JPEG images (bug 956965, issue 6574) Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=956965. Fixes 6574.	2016-09-26 21:55:43 +02:00
Jonas Jenwald	a22f0ae820	Only skip parsing a stream in `Parser_makeFilter` when we know for sure that it is empty (PR 6372 follow-up) For PDF files with multiple `/Filter`s, where the `/Length` entry is zero, we fail to render the file correctly. The reason is that `maybeLength` is `null` for every filter except the first, and `!maybeLength` is thus truthy. Hence it seems that we should completely ignore the `/Length` entry and also explicitly check `maybeLength === 0`. Note that I've not (yet) come across a PDF file with this issue in the wild, but given all the stupid things PDF generators do I wouldn't be surprised if such a file actually exists. In order to prevent a possible future bug, I'm submitting this patch which includes a hand-edited PDF file that we currently cannot render correctly (but e.g. Adobe Reader can).	2016-09-25 12:40:15 +02:00
Jonas Jenwald	4d2de9b47e	Add a reduced `load` test for issue 7665	2016-09-25 00:19:42 +02:00
Jonas Jenwald	6c263c1994	Merge pull request #7649 from timvandermeij/interactive-forms-tx-comb Text widget annotations: implement comb support	2016-09-22 11:36:30 +02:00
Tim van der Meij	6100ab4b18	Text widget annotations: implement comb support	2016-09-20 22:31:10 +02:00
Brendan Dahl	15e1ae4e3f	Merge pull request #7639 from Snuffleupagus/bug-1252420 Replace empty CharStrings with '.notdef' in `Type1Font_wrap` to prevent OTS from rejecting the font (bug 1252420)	2016-09-20 11:56:47 -07:00
Jonas Jenwald	170871ab3d	Prevent rendering `TextWidgetAnnotation`s in both the `core`/`display` layer (issue 7643)	2016-09-18 15:42:22 +02:00
Tim van der Meij	f062695d62	Merge pull request #7633 from timvandermeij/interactive-forms-tx-flags Text widget annotations: support read-only/multiline fields and improve testing	2016-09-17 17:19:47 +02:00
Tim van der Meij	adf0972ca5	Text widget annotations: improve unit and reference tests This patch improves the unit tests by testing the support for read-only and multiline fields. Moreover, we add a reference test to ensure that the text widgets are not only rendered, but also that their contents are styled properly. Finally, we perform minor improvements in `src/core/annotation.js`, for example adding missing comments.	2016-09-17 15:24:48 +02:00
Jonas Jenwald	aadcbe98c8	Replace empty CharStrings with '.notdef' in `Type1Font_wrap` to prevent OTS from rejecting the font (bug 1252420) Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1252420.	2016-09-17 14:39:10 +02:00
Jonas Jenwald	356b321f6d	Fallback to the `StandardEncoding` for Nonsymbolic fonts without `/Encoding` entry (issue 7580) Even though this patch passes all tests (unit/font/reference) locally, including the new ones that I added in PR 7621, I'm still a bit nervous about modifying the code that chooses the fallback encoding for fonts without an `/Encoding` entry. Note that over the years this code has been changed on a number of occasions, see a possibly incomplete [list here], to deal with various cases of incorrect font data. According to the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904184, it seems that we should fall back to the `StandardEncoding` for Nonsymbolic fonts. There's obviously a risk that fixing this particular issue could break other PDF files for which we don't have tests. However I've tried to change the logic as little as possible in this patch, to hopefully reduce possible breakage. Based on debugging numerous font issues, it seems that a lot of fonts actually set the Symbolic flag, even when they are in fact not Symbolic. Fonts actually marked as Nonsymbolic seem to be somewhat less common, which I hope should reduce the risk of the patch somewhat. Fixes 7580.	2016-09-13 14:07:16 +02:00
Jonas Jenwald	325f7afcca	For embedded Type1 fonts without included `ToUnicode`/`Encoding` data, attempt to improve text selection by using the `builtInEncoding` to amend the `toUnicode` map (issue 6901, issue 7182, issue 7217, bug 917796, bug 1242142) Note that in order to prevent any possible issues, this patch does not try to amend the `toUnicode` data for Type1 fonts that contain either `ToUnicode` or `Encoding` entries in the font dictionary. Fixes, or at least improves, issues/bugs such as 6658, 6901, 7182, 7217, bug 917796, bug 1242142.	2016-09-11 20:54:10 +02:00
Jonas Jenwald	ae2cc9119b	Add a couple more, mostly `text`, reference tests for non-embedded symbolic fonts without included encoding information I've started to look into how we can fix issue 7580, but quickly became worried that fixing it could easily mean that we'd trade one fixed PDF file for a multitude of broken ones. Hence I started going through the history of the code that chooses the fallback encoding, and noticed that it has been changed a number of times over the years to deal with various cases of weirdness/errors in non-embedded fonts. To my relief it turned out that almost all the PRs, please see a possibly incomplete [list here], that changed this code actually included `eq` test-cases. However, in one case it appears that a PR neglected to add a test-case. Furthermore since the fallback encoding may also be the only source for creating a `toUnicode` map, changing the encoding could possibly regress only the text-selection despite a PDF file still rendering correctly. Therefore, this PR adds one new `eq` test, and also a number of additional `text` tests for PDF files already present in the test-suite. Note that it's obviously possible that there's a certain overlap between the added tests, but I'd be a whole lot more concerned with causing regressions.	2016-09-11 16:38:39 +02:00
Jonas Jenwald	0b75f63c03	Don't duplicate the first entry in the `charCodeToGlyphId` map for CIDFontType2 fonts with a `CIDToGIDMap` that already mapped the first entry to a non-zero `glyphId` (issue 7544) Fixes 7544.	2016-09-09 22:33:41 +02:00
Jonas Jenwald	44b75c01a1	Check that Type1C fonts do not actually contain OpenType font files (issue 7598) This patch is yet another instalment in the (never ending) series of patches for PDF files that specify a completely incorrect Type/Subtype for their fonts. In this case Type1/Type1C, when in fact OpenType would have been correct. Fixes 7598.	2016-09-06 10:13:11 +02:00
Jonas Jenwald	3ac23200ba	Add a reduced test-case for issue 7406 The PDF file contains an image that we're allowed to use, since it's just the PDF.js logo. The logo image was simply inverted (so that it requires a /Decode entry in the image dictionary that triggers the use of `jpg.js` instead of the browser), converted to JPEG, and finally edited by hand to change the order of the DQT/SOF{n} markers.	2016-08-31 18:42:07 +02:00
Yury Delendik	ffa99397ad	Merge pull request #7387 from Snuffleupagus/issue-5808 Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808)	2016-08-30 15:21:41 -05:00
Jonas Jenwald	544d29f5cb	Add a `recoveryMode` that suppresses errors from the `Parser`, and utilize it when searching for the main trailer in `XRef_indexObjects` (bug 1250079) Instead of having `Parser_getObj` fail unconditionally for the referenced PDF file, this patch attempts to let searching for the main trailer continue even if there are errors. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1250079.	2016-08-17 12:37:35 +02:00
Jonas Jenwald	77c6ed5389	Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808) This patch improves the performance of issue 5808, but I'm not sure if it's enough to call it fixed. On average, this patch reduces the number of textLayer div's by a factor of 3, and it also reduces the time spent in `getTextContent` by a factor of ~2. The PDF file is generated by `Scribus PDF`, which for reasons I cannot understand is placing redundant `Tf` commands before every showText command. Note how the PDF file also contains lots of (basically) identical fonts, but with slightly different names, which causes unnecessary font-switching. This causes some unnecessary breaking of textLayer div's, but this issue cannot be easily worked around.	2016-07-27 21:37:52 +02:00
Jonas Jenwald	558a22cd02	Prevent errors when parsing Annotations with missing (or invalid) /Subtype entries (issue 7446) Note that I used a separate warning message for this case, instead of utilizing the same one as in the unsupported subtype case, to more clearly indicate that the PDF file itself is to blame rather than PDF.js. Fixes 7446.	2016-07-25 13:59:26 +02:00
Brendan Dahl	5678486802	Merge pull request #7347 from Snuffleupagus/evaluator-more-Ref_toString Slightly refactor the `fontRef` handling in `PartialEvaluator_loadFont` (issue 7403 and issue 7402)	2016-07-22 17:21:47 -07:00
Brendan Dahl	50d6e4f147	Merge pull request #7447 from Snuffleupagus/buildToUnicode-notdef Ignore .notdef in the `differences` array when building a fallback `toUnicode` map in `PartialEvaluator_buildToUnicode` (issue 5256)	2016-07-22 14:33:32 -07:00
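
Two of the commits listed above (4a0ff5dbf7 and a22f0ae820) describe the same kind of mistake: a truthiness check that cannot tell a legitimate `0` apart from a missing value. The sketch below illustrates the pitfall with plain objects. It is not the PDF.js implementation; the node objects, `getInheritedBuggy`, `getInheritedFixed`, and `isKnownEmpty` are hypothetical names made up for this illustration.

```javascript
// Hypothetical stand-ins for a /Pages tree: plain objects with a `parent` link.
// The /Page node says Rotate is 0, its /Pages parent says 180.
const pagesNode = { Rotate: 180, parent: null };
const pageNode = { Rotate: 0, parent: pagesNode };

// Buggy lookup: the truthiness test treats `0` like a missing entry,
// so the search continues upwards and returns the parent's value.
function getInheritedBuggy(node, key) {
  for (let n = node; n; n = n.parent) {
    if (n[key]) {
      return n[key];
    }
  }
  return undefined;
}

// Fixed lookup: only `undefined` counts as "not present on this node".
function getInheritedFixed(node, key) {
  for (let n = node; n; n = n.parent) {
    if (n[key] !== undefined) {
      return n[key];
    }
  }
  return undefined;
}

// Same idea for stream lengths: `null` means "unknown", `0` means
// "known to be empty". `!maybeLength` is true for both, so compare explicitly.
function isKnownEmpty(maybeLength) {
  return maybeLength === 0;
}

console.log(getInheritedBuggy(pageNode, "Rotate")); // 180
console.log(getInheritedFixed(pageNode, "Rotate")); // 0
console.log(!null, isKnownEmpty(null)); // true false
```

In both cases the fix is to compare against the exact "missing" or "empty" sentinel rather than rely on JavaScript's falsy coercion.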

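Commit 85c52f1fd6 above notes that non-numeric items in a `TJ` array can produce `NaN` text widths. The following is a minimal sketch of that effect under simplifying assumptions: the parsed array shape, the `{ cmd: "Tc" }` placeholder object, and both function names are hypothetical, and the advance is reduced to the per-thousand offset only.

```javascript
// Hypothetical parsed TJ array: strings, numeric offsets, and one stray
// non-string, non-numeric item of the kind found in malformed PDF files.
const tjItems = ["Grandes", 0.0, { cmd: "Tc" }, -250.0, "et"];

// Naive version: every non-string item is treated as a number.
// Negating the stray object yields NaN, which poisons the running total.
function advanceNaive(items) {
  let advance = 0;
  for (const item of items) {
    if (typeof item === "string") {
      continue; // glyph widths are left out of this sketch
    }
    advance += -item / 1000;
  }
  return advance;
}

// Guarded version: apply offsets only for items that really are numbers.
function advanceGuarded(items) {
  let advance = 0;
  for (const item of items) {
    if (typeof item === "number") {
      advance += -item / 1000;
    }
  }
  return advance;
}

console.log(advanceNaive(tjItems)); // NaN
console.log(advanceGuarded(tjItems)); // 0.25
```

Skipping unknown items keeps the result finite, which matches the behaviour the commit message attributes to `getOperatorList`.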

915 Commits