pdf.js

Author	SHA1	Message	Date
Jani Pehkonen	a22c0eab48	The first glyph in CFF CIDFonts must be named 0 instead of ".notdef" Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.	2020-03-24 15:56:50 +02:00
Jonas Jenwald	15e8692eff	Don't accidentally accept invalid glyphNames which appear to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697) The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a possible Cdd{d}/cdd{d} glyphName by the existing heuristics. Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point. To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds additional validation of the charCode found by the heuristics.	2020-03-13 23:35:47 +01:00
Jonas Jenwald	65e6ea2cb2	Prevent lookup errors in `PartialEvaluator.hasBlendModes` from breaking all parsing/rendering of a page (issue 11678) The PDF document in question is corrupt, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.	2020-03-09 12:00:12 +01:00
Tim van der Meij	1a97c142b3	Merge pull request #11523 from Snuffleupagus/issue-10880 Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)	2020-03-06 23:03:09 +01:00
Tim van der Meij	c95b9b1e17	Merge pull request #11653 from Snuffleupagus/ensureStateFont Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)	2020-03-03 23:33:13 +01:00
Jani Pehkonen	71e7686950	Fix Type1 font parsing when .notdef is not at index zero Fixes #11477 The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected. Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.	2020-03-03 21:55:51 +02:00
Jonas Jenwald	65e514e063	Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651) The PDF document in question is corrupt, since it contains multiple instances of incorrect operators. We obviously don't want to slow down parsing of all documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.	2020-03-03 10:05:18 +01:00
Takashi Tamura	d8c9f119b0	Fix the vertical writing mode with horizontal scaling. #11555 . It is not valid to multiply textHScale when the writing mode is vertical. See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762	2020-02-29 07:48:29 +09:00
Jonas Jenwald	c3c3b8cd81	Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880) This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else. To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data and the "actual" `scanLines` value is at least one order of magnitude smaller than expected.	2020-02-22 14:16:07 +01:00
Takashi Tamura	512dbe3060	Fix text spacing with vertical fonts. #7687 and #11526 . When the writing mode is vertical, we have to reverse the sign of spacing since we are subtracting it from current.y. We have to add it to current.y. See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762	2020-02-11 08:49:23 +09:00
Tim van der Meij	dced0a3821	Merge pull request #11579 from Snuffleupagus/issue-11578 Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)	2020-02-09 17:33:09 +01:00
Tim van der Meij	61056a9238	Merge pull request #11551 from Snuffleupagus/issue-11549 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)	2020-02-09 17:32:35 +01:00
Jonas Jenwald	7937165537	Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)	2020-02-08 19:59:04 +01:00
Brendan Dahl	09a6e17d22	Merge pull request #11528 from janpe2/type1-nonemb-notdef Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths	2020-02-06 13:30:07 -08:00
Jonas Jenwald	4c54395ff6	Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549) This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.	2020-01-30 13:19:05 +01:00
Tim van der Meij	474fe1757e	Merge pull request #11508 from Snuffleupagus/jpg-default-marker Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js`	2020-01-26 21:32:13 +01:00
Jonas Jenwald	62b2b984cc	Render Popup annotations last, once all other annotations have been rendered (issue 11362) In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored). Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.	2020-01-26 15:49:55 +01:00
Jonas Jenwald	13930e5202	Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js` - Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument). - Tweak a condition, to make it easier to see that the end of the data has been reached. - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.	2020-01-25 22:52:24 +01:00
Jani Pehkonen	809b96b40c	Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths Fixes #11403 The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this. In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`. The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.nodef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.	2020-01-21 21:35:25 +02:00
Jonas Jenwald	5c0336872e	Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385) In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.	2019-12-05 15:53:18 +01:00
Jonas Jenwald	9199b02a42	Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330) For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`). Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.	2019-11-16 09:29:10 +01:00
Tim van der Meij	6972bbea74	Include a reduced test case for annotations without a `Border`/`BS` entry (PR 6180 follow-up)	2019-11-10 14:37:42 +01:00
Jonas Jenwald	835d8c2be5	Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287) This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.	2019-11-01 09:01:24 +01:00
Jonas Jenwald	5c266f0e8c	Support Blend Modes which are specified in an Array of Names (issue 11279) According to the specification, the first supported Blend Mode should be choosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607	2019-10-26 14:24:31 +02:00
Tim van der Meij	11f3851a97	Merge pull request #11243 from Snuffleupagus/issue-11242 Add a fallback for non-embedded composite Verdana fonts (issue 11242)	2019-10-18 23:56:46 +02:00
Tim van der Meij	c54bb222ca	Merge pull request #11231 from Snuffleupagus/indexObjects-entries-gen Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)	2019-10-17 23:56:26 +02:00
Jonas Jenwald	2fcb5afc7b	Add a fallback for non-embedded composite Verdana fonts (issue 11242) Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.	2019-10-17 17:00:55 +02:00
Jonas Jenwald	17a3af3fc0	Replace a couple of `skipPages` annotations with `firstPage` in `test/test_manifest.json` Originally only `skipPages` existed, but given that `firstPage`/`lastPage` has existed for a long time now using them whenever possible looks simpler overall.	2019-10-17 13:20:56 +02:00
Jonas Jenwald	f3c5a690fc	Remove unnecessary `skipPages` annotations from `test/test_manifest.json` - In the `ibwa-bad` case the sixteenth page contains corrupt/incomplete commands, but given that we're suppressing `Error`s by default now skipping hardly seems warranted any more. - In the `geothermal.pdf` case the first page contains an unsupported ColourSpace, but again we're suppressing `Error`s by default now and skipping hardly seems warranted any more.	2019-10-17 13:20:49 +02:00
Jonas Jenwald	ffc847eaa5	Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303) This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1] Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break. The only reason that I'm even submitting this patch, is because of the number of open issues that it would address. Generally speaking though, the best course of action would probably be if `XRef.indexObjects` was re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them. --- [1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.	2019-10-14 22:10:04 +02:00
Jonas Jenwald	259551d144	Convert a number of reference tests, for documents with corrupt XRef tables, from `load` to `eq` As part of attempting to fix a number issues containing PDF documents with corrupt XRef tables, I'd like to improve the reference test-coverage slightly first. Obviously this will increase the runtime of the tests a bit, however I'd rather "waste" resources on the bots instead of developer time fixing regressions which could have been avoided.	2019-10-12 18:49:59 +02:00
Jonas Jenwald	f5be2d62a3	Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655) Please note: I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code. However, it occured to me that we're essentially trying to add a heuristic on top of another heuristic, and that it shouldn't matter how efficient the code is as long as it works. In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.	2019-10-06 10:47:29 +02:00
Tim van der Meij	3da680cdfc	Merge pull request #11158 from janpe2/gradient-stops Avoid floating point inaccuracy in gradient color stops	2019-09-19 13:15:11 +02:00
Jonas Jenwald	af22dc9b0c	For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150) Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.	2019-09-18 10:15:09 +02:00
Jani Pehkonen	911df237f3	Avoid floating point inaccuracy in gradient color stops	2019-09-17 21:01:17 +03:00
Tim van der Meij	fbe8c6127c	Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries	2019-08-09 22:39:01 +02:00
Jonas Jenwald	d637b25e36	Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it. Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly". The patch makes the following notable changes: - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.) - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer. - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`. - Add an optional parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty. - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange. --- [1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.	2019-08-09 10:18:13 +02:00
Jonas Jenwald	0f78fdb229	Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052) Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.	2019-08-08 10:37:41 +02:00
Jonas Jenwald	5ac9c7c384	Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045) The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.	2019-08-06 14:33:05 +02:00
Jonas Jenwald	9ad50521b1	Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016) This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare. Furthermore, this patch purposely does not add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.	2019-07-30 17:06:58 +02:00
Tim van der Meij	ed3954fc7a	Merge pull request #10851 from brendandahl/shading-bbox Apply bounding box before using shading patterns.	2019-07-12 22:52:07 +02:00
Tim van der Meij	87f36e3520	Merge pull request #10850 from brendandahl/scale-line-width Scale stroking line width when using a tiling pattern.	2019-07-12 22:50:32 +02:00
Brendan Dahl	6fab0a0dac	Apply bounding box before using shading patterns. Fixes #8092	2019-07-08 14:05:48 -07:00
Brendan Dahl	446efab707	Scale stroking line width when using a tiling pattern.	2019-07-08 13:47:54 -07:00
Jonas Jenwald	876c962235	Ignore Annotations with too large border `width`s, to prevent the `annotationLayer` from rendering it over the surrounding document (bug 1552113) The border `width` will instead fallback to the default value of `1`, rather than ignoring it altoghether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113	2019-06-01 15:51:22 +02:00
Jani Pehkonen	05c527f035	Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream	2019-05-07 18:44:37 +03:00
Jonas Jenwald	5335285cda	Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542) First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to). Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit very small) to the parsing of path operators in PDF documents just for a handful of corrupt ones.	2019-04-30 23:35:33 +02:00
Tim van der Meij	55d9b35d37	Merge pull request #10727 from Snuffleupagus/type3-image-resources Support (rare) Type3 fonts which contains image resources (issue 10717)	2019-04-18 23:07:26 +02:00
Tim van der Meij	ae2a4dc3dd	Implement free text annotations	2019-04-13 18:45:22 +02:00
Jonas Jenwald	be604bd195	Support (rare) Type3 fonts which contains image resources (issue 10717) The Type3 font type is not commonly used in PDF documents, as can be seen from telemetry data such as: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-04-09&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F68&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F57&processType=&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2019-03-18&table=0&trim=1&use_submission_date=0 (see also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types). Type3 fonts containing image resources are very* rare in practice, usually they only contain path rendering operators, but as the issue shows they unfortunately do exist. Currently these Type3-related image resources are not handled in any special way, and given that fonts are document rather than page specific rendering breaks since the image resources are thus not available to the entire document. Fortunately fixing this isn't too difficult, but it does require adding a couple of Type3-specific code-paths to the `PartialEvaluator`. In order to keep the implementation simple, particularily on the main-thread, these Type3 image resources are completely decoded on the worker-thread to avoid adding too many special cases. This should not cause any issues, only marginally less efficient code, but given how rare this kind of Type3 font is adding premature optimizations didn't seem at all warranted at this point.	2019-04-13 18:27:50 +02:00

1 2 3 4 5 ...

869 Commits