Commit Graph

415 Commits

Author SHA1 Message Date
Jani Pehkonen
05c527f035 Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream 2019-05-07 18:44:37 +03:00
Jonas Jenwald
5335285cda Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)
First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to).
Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit *very* small) to the parsing of path operators in PDF documents just for a handful of *corrupt* ones.
2019-04-30 23:35:33 +02:00
Tim van der Meij
ae2a4dc3dd
Implement free text annotations 2019-04-13 18:45:22 +02:00
Tim van der Meij
b4c3b94592
Merge pull request #6606 from Rob--W/pattern-scaling
Improve performance and correctness of Tiling Patterns
2019-03-29 00:01:38 +01:00
Tim van der Meij
f9c58115fc
Merge pull request #10683 from janpe2/type0-noncid-cmap
Use CMap in Type0 fonts when CFF is not a CID font
2019-03-28 00:07:08 +01:00
Rob Wu
d3dc8f16b5 TilingPattern: Reverse transform after painting
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.

The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
2019-03-27 17:50:35 +01:00
Rob Wu
a72a8e921f Avoid extreme sizing / scaling in tiling pattern
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
  surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp

Fixes #6496
Fixes #5698
Fixes #1434
Fixes #2825
2019-03-27 17:44:04 +01:00
Jonas Jenwald
9077abc263 Take the FirstChar/LastChar properties into account when computing the hash in PartialEvaluator.preEvaluateFont (issue 10665)
Without this some fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.
2019-03-27 16:27:10 +01:00
Jani Pehkonen
49c6233fbc Use CMap in Type0 fonts when CFF is not a CID font 2019-03-26 19:38:44 +02:00
Jonas Jenwald
88f9e633dd Try to improve text-selection for Type3 fonts that utilize a non-default /FontMatrix (bug 1513120)
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simple extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.

The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
2019-03-12 10:32:08 +01:00
Jonas Jenwald
fb774a65b0 Avoid truncating/breaking some Type3 glyphs in compileType3Glyph (bug 1245391, issue 10568)
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*

With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
2019-02-21 23:29:43 +01:00
Tsukasa OI
96ba6afd47 Fix copying on supplementary plane characters
pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero).  This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.

Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
2019-02-10 18:14:53 +09:00
Jonas Jenwald
6f94a05a29 Do the final text scaling correctly in flushTextContentItem (issue 8276)
It's necessary to take into account whether or not the text is vertical, to avoid either the textContent `width` or `height` becoming incorrect.
2019-01-29 15:24:04 +01:00
Jani Pehkonen
26121177ab Implement Decode entry in Indexed images 2019-01-22 22:51:04 +02:00
Jonas Jenwald
b531fc4106 Avoid truncating inline images, where the data and the "EI" marker is glued together (issue 10388) (#10436)
Thanks to the *excellent* debugging done by @janpe2, this was easy to fix!
2019-01-12 20:31:23 +01:00
Jonas Jenwald
d4a3858ed5 Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up) 2019-01-10 17:55:28 +01:00
Brendan Dahl
32eace043b Fix reading number of HTMX metrics.
The length of the HHEA table can be incorrect, so it is better to
read the number of metrics offset from beginning of table instead.
2019-01-04 15:13:13 -08:00
Jani Pehkonen
9e990f6f3e Repair CFF fonts if stem hints are in wrong order 2018-11-20 18:50:37 +02:00
Simon Leblanc
b5806735d8 Add support of Ink annotation 2018-10-03 00:28:49 +02:00
Tim van der Meij
66422eb83e
Merge pull request #9340 from brendandahl/private-use
Map all glyphs to the private use area and duplicate the first glyph.
2018-09-08 17:51:04 +02:00
Brendan Dahl
b76cf665ec Map all glyphs to the private use area and duplicate the first glyph.
There have been lots of problems with trying to map glyphs to their unicode
values. It's more reliable to just use the private use areas so the browser's
font renderer doesn't mess with the glyphs.

Using the private use area for all glyphs did highlight other issues that this
patch also had to fix:

  * small private use area - Previously, only the BMP private use area was used
    which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be
    used.

  * glyph zero not shown - Browsers will not use the glyph from a font if it is
    glyph id = 0. This issue was less prevalent when we mapped to unicode values
    since the fallback font would be used. However, when using the private use
    area, the glyph would not be drawn at all. This is illustrated in one of the
    current test cases (issue #8234) where there's an "ä" glyph at position
    zero. The PDF looked like it rendered correctly, but it was actually not
    using the glyph from the font. To properly show the first glyph it is always
    duplicated and appended to the glyphs and the maps are adjusted.

  * supplementary characters - The private use area PUP 16 is 4 bytes, so
    String.fromCodePoint must be used where we previously used
    String.fromCharCode. This is actually an issue that should have been fixed
    regardless of this patch.

  * charset - Freetype fails to load fonts when the charset size doesn't match
    number of glyphs in the font. We now write out a fake charset with the
    correct length. This also brought up the issue that glyphs with seac/endchar
    should only ever write a standard charset, but we now write a custom one.
    To get around this the seac analysis is permanently enabled so those glyphs
    are instead always drawn as two glyphs.
2018-09-05 14:04:54 -07:00
Jonas Jenwald
e5a6d892b4
Revert "Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)" 2018-09-05 18:01:33 +02:00
Tim van der Meij
959ed3705b
Implement a permissions API 2018-09-02 21:23:09 +02:00
Jonas Jenwald
497b765ede Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)
Please note that while this *improves* issue 9984 slightly (and likely others too), it's not a complete solution.
The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.
2018-08-18 13:45:32 +02:00
Brendan Dahl
5f67a6a237 Always fallback to system font on font failure.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
2018-08-03 16:49:22 -07:00
Brian
2a665ebad4 Removed Extraneous Matrix Check in CalRGB Conversion 2018-08-02 10:16:42 -07:00
Jonas Jenwald
690bcc8c8a Add a reduced, eq, test-case for issue 9915 2018-07-29 23:06:15 +02:00
Jonas Jenwald
2b25deb84c Prevent errors in sanitizeTTProgram, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
*I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug.*

The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a *negative* length when parsing the CALL functions.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.
2018-07-10 09:45:55 +02:00
Jonas Jenwald
56e3648b65 Add basic validation of the 'trailer' dictionary candidates in XRef.indexObjects (issue 9418)
This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
6bbcafcd26 Let Lexer.getNumber treat a single decimal point as zero (issue 9252)
This is consistent with the behaviour in Adobe Reader.
2018-06-20 13:41:21 +02:00
Jani Pehkonen
8ea505545a Use FDSelect and FDArray when converting CFF CID font to paths 2018-04-10 16:44:42 +03:00
Jonas Jenwald
a18c65ae9f Use the correct stream position when reading maxSizeOfInstructions from the maxp table (issue 9458)
Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html.

Fixes 9458.
2018-02-07 21:57:43 +01:00
Jonas Jenwald
d6c028b946 Add support for TrueType Collection fonts (issue 9262)
The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading.

Fixes 9262.
2018-01-08 22:31:08 +01:00
Jonas Jenwald
c5700211d6 Adjust decodeACSuccessive in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images
I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images.

As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question.
Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).
2017-12-30 15:24:09 +01:00
Jonas Jenwald
d4cd44fd16 Add a fallback for non-embedded LucidaSans-Demi fonts (issue 9291)
The PDF file in the issue uses a number of *embedded* versions of Lucida fonts, but for some reason does *not* embed the LucidaSans-Demi font. According to https://en.wikipedia.org/wiki/Lucida#Usages that one should be bold, so we can at least improve rendering here (even though it won't look perfect).

Fixes 9291.
2017-12-24 17:36:58 +01:00
Jonas Jenwald
1dc54ddb40 Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in XRef.indexObjects (issue 9105)
This patch refactors the searching for 'endobj', to try and find the next occurance of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise.
This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrary long strings.

Fixes 9105.
2017-12-18 13:17:45 +01:00
Brendan Dahl
9b51cea724 Fix loca table when offsets aren't in ascending order. 2017-12-15 11:20:28 -06:00
Jonas Jenwald
a5e3261b48
Merge pull request #9062 from mozilla/no_high
Move char codes from high surrogate pair range into private use.
2017-12-08 12:31:22 +01:00
Brendan Dahl
306999c325 Move char codes from high surrogate pair range into private use.
Fixes #2884
2017-12-07 10:35:50 -08:00
Jonas Jenwald
f3c50fe2f9
Merge pull request #9192 from Snuffleupagus/issue-8229
Build a fallback `ToUnicode` map for simple fonts (issue 8229)
2017-11-30 10:27:32 +01:00
Tim van der Meij
e320243870
Merge pull request #9206 from janpe2/svg-inv-images
Fix inverted 1-bit images in SVG backend
2017-11-28 22:46:43 +01:00
Jani Pehkonen
58b214eab3 Fix inverted 1-bit images in SVG backend 2017-11-28 21:24:27 +02:00
Jani Pehkonen
06d083b04b Fix pattern-filled text 2017-11-28 19:40:22 +02:00
Jonas Jenwald
61e19bee43 Build a fallback ToUnicode map for simple fonts (issue 8229)
In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things.

Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case.
Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data.

According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172):

> A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value.
> ...

Reading that paragraph literally, it doesn't seem too unreasonable to use *different* methods for different charcodes.

Fixes 8229.
2017-11-26 14:45:15 +01:00
Jonas Jenwald
83e8398ff2 For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084)
In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then *overwrites* it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appear instead.

Fixes 9084.
2017-10-31 13:26:04 +01:00
Jonas Jenwald
92fcfce685
Merge pull request #9082 from brendandahl/issue7562
Overwrite glyphs contour count if it's less than -1.
2017-10-30 20:44:01 +01:00
Brendan Dahl
17037b5e51 Overwrite glyphs contour count if it's less than -1.
The test pdf has a contour count of -70, but OTS doesn't
like values less than -1.

Fixes issue #7562.
2017-10-30 09:16:51 -07:00
Jonas Jenwald
d71a576b30 Merge pull request #9045 from brendandahl/sani-name
Sanitize name index in compile phase of CFF.
2017-10-24 11:48:03 +02:00
Brendan Dahl
6b12612a52 Sanitize name index in compile phase of CFF.
Fixes #8960
2017-10-23 17:13:49 -07:00
Brendan Dahl
fcc9943d04 Use charstring as plain text when lengthIV is -1.
Fixes #7769
2017-10-18 14:19:59 -07:00