Commit Graph

500 Commits

Author SHA1 Message Date
Jonas Jenwald
91efde5246 Add a heuristic to scale even single-char text, when the horizontal/vertical scaling differs significantly (issue 11713)
At this point in time, compared to when the "ignore single-char" code was added, we *should* generally be doing a much better job of combining text into as few chunks as possible.
However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm *not* proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise.

Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.
2020-04-07 00:36:23 +02:00
Jonas Jenwald
938d519192 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.
2020-04-06 11:21:02 +02:00
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
2020-03-24 15:56:50 +02:00
Jonas Jenwald
15e8692eff Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in PartialEvaluator._buildSimpleFontToUnicode (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.

To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
2020-03-13 23:35:47 +01:00
Tim van der Meij
c95b9b1e17
Merge pull request #11653 from Snuffleupagus/ensureStateFont
Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
2020-03-03 23:33:13 +01:00
Jani Pehkonen
71e7686950 Fix Type1 font parsing when .notdef is not at index zero
Fixes #11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.

Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
2020-03-03 21:55:51 +02:00
Jonas Jenwald
65e514e063 Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
2020-03-03 10:05:18 +01:00
Takashi Tamura
d8c9f119b0 Fix the vertical writing mode with horizontal scaling. #11555.
It is not valid to multiply textHScale when the writing mode is vertical.

See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-29 07:48:29 +09:00
Jonas Jenwald
3c7b7be100 Prevent circular references in the /Pages tree 2020-02-19 01:49:39 +01:00
Tim van der Meij
dced0a3821
Merge pull request #11579 from Snuffleupagus/issue-11578
Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)
2020-02-09 17:33:09 +01:00
Tim van der Meij
61056a9238
Merge pull request #11551 from Snuffleupagus/issue-11549
Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
2020-02-09 17:32:35 +01:00
Jonas Jenwald
7937165537 Ignore spaces when normalizing the font name in Font.fallbackToSystemFont (issue 11578) 2020-02-08 19:59:04 +01:00
Jonas Jenwald
88c35d872f Ensure that the PDF header contains an actual number (PR 11463 follow-up)
While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change.
However, it does seem like a good idea to at least *validate* the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.
2020-02-07 12:25:07 +01:00
Brendan Dahl
09a6e17d22
Merge pull request #11528 from janpe2/type1-nonemb-notdef
Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
2020-02-06 13:30:07 -08:00
Jonas Jenwald
4c54395ff6 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.
2020-01-30 13:19:05 +01:00
Jonas Jenwald
62b2b984cc Render Popup annotations last, once all other annotations have been rendered (issue 11362)
In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored).
Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.
2020-01-26 15:49:55 +01:00
Jani Pehkonen
809b96b40c Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
Fixes #11403
The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this.

In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`.

The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.nodef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.
2020-01-21 21:35:25 +02:00
Tim van der Meij
dfe42a5ca4
Include a unit test for OpenAction dictionaries without Type entries (PR 11443 follow-up)
The original issue did not contain a (reduced) test case that we could
include and linked test cases are not ideal for unit tests, so the
original PR could only be verified manually.

I found this a bit unfortunate considering that the print data is
exposed through the API, so I thought about how we could have an
automated test and managed to create a reduced test case with the
OpenAction dictionary from the file in the original issue.

Therefore, this commit includes a unit test for parsing OpenAction
dictionaries without `Type` entries. I verified that this PDF file
behaves the same as the original one, i.e., no print dialog is shown for
older viewers and the print dialog is shown for the most recent viewer.
2019-12-27 00:05:51 +01:00
Tim van der Meij
6972bbea74
Include a reduced test case for annotations without a Border/BS entry (PR 6180 follow-up) 2019-11-10 14:37:42 +01:00
Jonas Jenwald
5c266f0e8c Support Blend Modes which are specified in an Array of Names (issue 11279)
According to the specification, the first *supported* Blend Mode should be choosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607
2019-10-26 14:24:31 +02:00
Jonas Jenwald
2fcb5afc7b Add a fallback for non-embedded *composite* Verdana fonts (issue 11242)
Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.
2019-10-17 17:00:55 +02:00
Jonas Jenwald
259551d144 Convert a number of reference tests, for documents with corrupt XRef tables, from load to eq
As part of attempting to fix a number issues containing PDF documents with corrupt XRef tables, I'd like to improve the reference test-coverage slightly *first*.
Obviously this will increase the runtime of the tests a bit, however I'd rather "waste" resources on the bots instead of developer time fixing regressions which could have been avoided.
2019-10-12 18:49:59 +02:00
Jonas Jenwald
f5be2d62a3 Improve the heuristics, in PartialEvaluator._buildSimpleFontToUnicode, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
*Please note:* I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code.
However, it occured to me that we're essentially trying to add a heuristic *on top* of another heuristic, and that it shouldn't matter how efficient the code is as long as it works.

In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.
2019-10-06 10:47:29 +02:00
Tim van der Meij
3da680cdfc
Merge pull request #11158 from janpe2/gradient-stops
Avoid floating point inaccuracy in gradient color stops
2019-09-19 13:15:11 +02:00
Jonas Jenwald
af22dc9b0c For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)
Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.
2019-09-18 10:15:09 +02:00
Jani Pehkonen
911df237f3 Avoid floating point inaccuracy in gradient color stops 2019-09-17 21:01:17 +03:00
Tim van der Meij
09df1ee0ce
Include a reduced, non-linked PDF file for the attachments API unit test 2019-08-25 15:14:57 +02:00
Jonas Jenwald
d637b25e36 Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".

The patch makes the following notable changes:
 - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
 - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
 - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
 - Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
 - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.

---

[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-09 10:18:13 +02:00
Jonas Jenwald
5ac9c7c384 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.
2019-08-06 14:33:05 +02:00
Jonas Jenwald
9ad50521b1 Add a work-around, in glyphlist.js, for bad PDF generators which use a non-standard /f_f string in the Encoding dictionary when referring to the ff ligature (issue 11016)
This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare.

Furthermore, this patch purposely does *not* add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.
2019-07-30 17:06:58 +02:00
Tim van der Meij
ed3954fc7a
Merge pull request #10851 from brendandahl/shading-bbox
Apply bounding box before using shading patterns.
2019-07-12 22:52:07 +02:00
Tim van der Meij
87f36e3520
Merge pull request #10850 from brendandahl/scale-line-width
Scale stroking line width when using a tiling pattern.
2019-07-12 22:50:32 +02:00
Brendan Dahl
6fab0a0dac Apply bounding box before using shading patterns.
Fixes #8092
2019-07-08 14:05:48 -07:00
Brendan Dahl
446efab707 Scale stroking line width when using a tiling pattern. 2019-07-08 13:47:54 -07:00
Jonas Jenwald
876c962235 Ignore Annotations with too large border widths, to prevent the annotationLayer from rendering it over the surrounding document (bug 1552113)
The border `width` will instead fallback to the default value of `1`, rather than ignoring it altoghether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113
2019-06-01 15:51:22 +02:00
Jani Pehkonen
05c527f035 Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream 2019-05-07 18:44:37 +03:00
Jonas Jenwald
5335285cda Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)
First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to).
Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit *very* small) to the parsing of path operators in PDF documents just for a handful of *corrupt* ones.
2019-04-30 23:35:33 +02:00
Tim van der Meij
ae2a4dc3dd
Implement free text annotations 2019-04-13 18:45:22 +02:00
Tim van der Meij
b4c3b94592
Merge pull request #6606 from Rob--W/pattern-scaling
Improve performance and correctness of Tiling Patterns
2019-03-29 00:01:38 +01:00
Tim van der Meij
f9c58115fc
Merge pull request #10683 from janpe2/type0-noncid-cmap
Use CMap in Type0 fonts when CFF is not a CID font
2019-03-28 00:07:08 +01:00
Rob Wu
d3dc8f16b5 TilingPattern: Reverse transform after painting
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.

The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
2019-03-27 17:50:35 +01:00
Rob Wu
a72a8e921f Avoid extreme sizing / scaling in tiling pattern
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
  surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp

Fixes #6496
Fixes #5698
Fixes #1434
Fixes #2825
2019-03-27 17:44:04 +01:00
Jonas Jenwald
9077abc263 Take the FirstChar/LastChar properties into account when computing the hash in PartialEvaluator.preEvaluateFont (issue 10665)
Without this some fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.
2019-03-27 16:27:10 +01:00
Jani Pehkonen
49c6233fbc Use CMap in Type0 fonts when CFF is not a CID font 2019-03-26 19:38:44 +02:00
Jonas Jenwald
88f9e633dd Try to improve text-selection for Type3 fonts that utilize a non-default /FontMatrix (bug 1513120)
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simple extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.

The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
2019-03-12 10:32:08 +01:00
Jonas Jenwald
fb774a65b0 Avoid truncating/breaking some Type3 glyphs in compileType3Glyph (bug 1245391, issue 10568)
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*

With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
2019-02-21 23:29:43 +01:00
Tsukasa OI
96ba6afd47 Fix copying on supplementary plane characters
pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero).  This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.

Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
2019-02-10 18:14:53 +09:00
Jonas Jenwald
6f94a05a29 Do the final text scaling correctly in flushTextContentItem (issue 8276)
It's necessary to take into account whether or not the text is vertical, to avoid either the textContent `width` or `height` becoming incorrect.
2019-01-29 15:24:04 +01:00
Jani Pehkonen
26121177ab Implement Decode entry in Indexed images 2019-01-22 22:51:04 +02:00
Jonas Jenwald
b531fc4106 Avoid truncating inline images, where the data and the "EI" marker is glued together (issue 10388) (#10436)
Thanks to the *excellent* debugging done by @janpe2, this was easy to fix!
2019-01-12 20:31:23 +01:00
Jonas Jenwald
d4a3858ed5 Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up) 2019-01-10 17:55:28 +01:00
Brendan Dahl
32eace043b Fix reading number of HTMX metrics.
The length of the HHEA table can be incorrect, so it is better to
read the number of metrics offset from beginning of table instead.
2019-01-04 15:13:13 -08:00
Jani Pehkonen
9e990f6f3e Repair CFF fonts if stem hints are in wrong order 2018-11-20 18:50:37 +02:00
Simon Leblanc
b5806735d8 Add support of Ink annotation 2018-10-03 00:28:49 +02:00
Tim van der Meij
66422eb83e
Merge pull request #9340 from brendandahl/private-use
Map all glyphs to the private use area and duplicate the first glyph.
2018-09-08 17:51:04 +02:00
Brendan Dahl
b76cf665ec Map all glyphs to the private use area and duplicate the first glyph.
There have been lots of problems with trying to map glyphs to their unicode
values. It's more reliable to just use the private use areas so the browser's
font renderer doesn't mess with the glyphs.

Using the private use area for all glyphs did highlight other issues that this
patch also had to fix:

  * small private use area - Previously, only the BMP private use area was used
    which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be
    used.

  * glyph zero not shown - Browsers will not use the glyph from a font if it is
    glyph id = 0. This issue was less prevalent when we mapped to unicode values
    since the fallback font would be used. However, when using the private use
    area, the glyph would not be drawn at all. This is illustrated in one of the
    current test cases (issue #8234) where there's an "ä" glyph at position
    zero. The PDF looked like it rendered correctly, but it was actually not
    using the glyph from the font. To properly show the first glyph it is always
    duplicated and appended to the glyphs and the maps are adjusted.

  * supplementary characters - The private use area PUP 16 is 4 bytes, so
    String.fromCodePoint must be used where we previously used
    String.fromCharCode. This is actually an issue that should have been fixed
    regardless of this patch.

  * charset - Freetype fails to load fonts when the charset size doesn't match
    number of glyphs in the font. We now write out a fake charset with the
    correct length. This also brought up the issue that glyphs with seac/endchar
    should only ever write a standard charset, but we now write a custom one.
    To get around this the seac analysis is permanently enabled so those glyphs
    are instead always drawn as two glyphs.
2018-09-05 14:04:54 -07:00
Jonas Jenwald
e5a6d892b4
Revert "Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)" 2018-09-05 18:01:33 +02:00
Tim van der Meij
959ed3705b
Implement a permissions API 2018-09-02 21:23:09 +02:00
Jonas Jenwald
497b765ede Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)
Please note that while this *improves* issue 9984 slightly (and likely others too), it's not a complete solution.
The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.
2018-08-18 13:45:32 +02:00
Brendan Dahl
5f67a6a237 Always fallback to system font on font failure.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
2018-08-03 16:49:22 -07:00
Brian
2a665ebad4 Removed Extraneous Matrix Check in CalRGB Conversion 2018-08-02 10:16:42 -07:00
Jonas Jenwald
690bcc8c8a Add a reduced, eq, test-case for issue 9915 2018-07-29 23:06:15 +02:00
Jonas Jenwald
2b25deb84c Prevent errors in sanitizeTTProgram, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
*I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug.*

The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a *negative* length when parsing the CALL functions.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.
2018-07-10 09:45:55 +02:00
Jonas Jenwald
56e3648b65 Add basic validation of the 'trailer' dictionary candidates in XRef.indexObjects (issue 9418)
This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
6bbcafcd26 Let Lexer.getNumber treat a single decimal point as zero (issue 9252)
This is consistent with the behaviour in Adobe Reader.
2018-06-20 13:41:21 +02:00
Jani Pehkonen
8ea505545a Use FDSelect and FDArray when converting CFF CID font to paths 2018-04-10 16:44:42 +03:00
Jonas Jenwald
a18c65ae9f Use the correct stream position when reading maxSizeOfInstructions from the maxp table (issue 9458)
Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html.

Fixes 9458.
2018-02-07 21:57:43 +01:00
Jonas Jenwald
d6c028b946 Add support for TrueType Collection fonts (issue 9262)
The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading.

Fixes 9262.
2018-01-08 22:31:08 +01:00
Jonas Jenwald
c5700211d6 Adjust decodeACSuccessive in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images
I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images.

As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question.
Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).
2017-12-30 15:24:09 +01:00
Jonas Jenwald
d4cd44fd16 Add a fallback for non-embedded LucidaSans-Demi fonts (issue 9291)
The PDF file in the issue uses a number of *embedded* versions of Lucida fonts, but for some reason does *not* embed the LucidaSans-Demi font. According to https://en.wikipedia.org/wiki/Lucida#Usages that one should be bold, so we can at least improve rendering here (even though it won't look perfect).

Fixes 9291.
2017-12-24 17:36:58 +01:00
Jonas Jenwald
1dc54ddb40 Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in XRef.indexObjects (issue 9105)
This patch refactors the searching for 'endobj', to try and find the next occurance of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise.
This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrary long strings.

Fixes 9105.
2017-12-18 13:17:45 +01:00
Brendan Dahl
9b51cea724 Fix loca table when offsets aren't in ascending order. 2017-12-15 11:20:28 -06:00
Jonas Jenwald
a5e3261b48
Merge pull request #9062 from mozilla/no_high
Move char codes from high surrogate pair range into private use.
2017-12-08 12:31:22 +01:00
Brendan Dahl
306999c325 Move char codes from high surrogate pair range into private use.
Fixes #2884
2017-12-07 10:35:50 -08:00
Jonas Jenwald
f3c50fe2f9
Merge pull request #9192 from Snuffleupagus/issue-8229
Build a fallback `ToUnicode` map for simple fonts (issue 8229)
2017-11-30 10:27:32 +01:00
Tim van der Meij
e320243870
Merge pull request #9206 from janpe2/svg-inv-images
Fix inverted 1-bit images in SVG backend
2017-11-28 22:46:43 +01:00
Jani Pehkonen
58b214eab3 Fix inverted 1-bit images in SVG backend 2017-11-28 21:24:27 +02:00
Jani Pehkonen
06d083b04b Fix pattern-filled text 2017-11-28 19:40:22 +02:00
Jonas Jenwald
61e19bee43 Build a fallback ToUnicode map for simple fonts (issue 8229)
In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things.

Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case.
Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data.

According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172):

> A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value.
> ...

Reading that paragraph literally, it doesn't seem too unreasonable to use *different* methods for different charcodes.

Fixes 8229.
2017-11-26 14:45:15 +01:00
Jonas Jenwald
83e8398ff2 For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084)
In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then *overwrites* it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appear instead.

Fixes 9084.
2017-10-31 13:26:04 +01:00
Jonas Jenwald
92fcfce685
Merge pull request #9082 from brendandahl/issue7562
Overwrite glyphs contour count if it's less than -1.
2017-10-30 20:44:01 +01:00
Brendan Dahl
17037b5e51 Overwrite glyphs contour count if it's less than -1.
The test pdf has a contour count of -70, but OTS doesn't
like values less than -1.

Fixes issue #7562.
2017-10-30 09:16:51 -07:00
Jonas Jenwald
d71a576b30 Merge pull request #9045 from brendandahl/sani-name
Sanitize name index in compile phase of CFF.
2017-10-24 11:48:03 +02:00
Brendan Dahl
6b12612a52 Sanitize name index in compile phase of CFF.
Fixes #8960
2017-10-23 17:13:49 -07:00
Brendan Dahl
fcc9943d04 Use charstring as plain text when lengthIV is -1.
Fixes #7769
2017-10-18 14:19:59 -07:00
Jonas Jenwald
b1472cddbb Allow getOperatorList/getTextContent to skip errors when parsing broken XObjects (issue 8702, issue 8704)
This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recovery as much as possible of a page even when it contains broken XObjects.

Fixes 8702.
Fixes 8704.
2017-09-29 17:14:21 +02:00
Brendan Dahl
18e2321845 Overwrite maxSizeOfInstructions in maxp with computed value.
In issue #7507 the value is less than the actuall max size
of the glyph instructions causing OTS to fail the font.
2017-09-25 17:53:26 -07:00
Jonas Jenwald
10727572a2 Merge pull request #8950 from timvandermeij/polygon-polyline-annotations
Implement support for polyline and polygon annotations
2017-09-24 15:16:14 +02:00
Tim van der Meij
c69a7a83da Merge pull request #8932 from janpe2/jbig2-sym-offset
JBIG2 symbol offsets
2017-09-23 17:11:45 +02:00
Tim van der Meij
ed8c0ebfa7
Implement reference tests for polyline and polygon annotations 2017-09-23 17:01:19 +02:00
Jonas Jenwald
abc864fca9 Merge pull request #8938 from brendandahl/bug1392647
Use font's default width even when 0. (bug 1392647)
2017-09-20 22:38:39 +02:00
Brendan Dahl
10ba292b46 Use font's default width even when 0.
Bug 1392647 has a PDF where the default width of the font
is 0. It draws some charcodes that don't have glyphs, but
we were wrongly using the 1000 default width for these
charcodes causing some text to be overlapping.
2017-09-20 11:38:30 -07:00
Jani Pehkonen
5d1074c110 Fix JBIG2 symbol offsets in text regions 2017-09-19 23:43:23 +03:00
Jani Pehkonen
3d99b8d706 CCITTFaxStream problem when EndOfBlock is false 2017-09-19 22:19:40 +03:00
Tim van der Meij
400e4aae0e
Implement support for stamp annotations 2017-09-16 16:37:50 +02:00
Tim van der Meij
320779e6ed Merge pull request #8691 from timvandermeij/square-circle-annotations
Implement support for square and circle annotations
2017-09-09 22:56:54 +02:00
Tim van der Meij
c04f9d6098
Implement reference tests for square and circle annotations 2017-09-09 21:36:28 +02:00
Jonas Jenwald
7115e136e4 Hide unsupported LinkAnnotations (issue 3897)
Rather than displaying links that does *nothing* when clicked, it probably makes more sense to simply not render them instead. Especially since it turns out that, at least at this point in time, this is *very* easy to both implement and test.

Fixes 3897.
2017-09-06 12:52:56 +02:00
Jonas Jenwald
49b8cd5a6a Attempt to improve the EI detection heuristics, for inline images, in streams containing NUL bytes (issue 8823)
Since this patch will now treat (some) `NUL` bytes as "ASCII", the number of `followingBytes` checked are thus increased to (hopefully) reduce the risk of introducing new false positives.

Fixes 8823.
2017-08-27 12:48:28 +02:00
Jonas Jenwald
4891b9c7e0 Replace the test-case for issue 8798 with a reduced one (PR 8800 follow-up)
*Re: issue 8798 and PR 8800.*

Big thanks to @THausherr for providing the test-case.
2017-08-24 17:43:05 +02:00