Note that despite the new file having a different hash than the the current one, it does render *identically* and most importantly it uses *the same* JBIG2 functionality.
For reference, please see issue 3666 and PR 3738.
*This patch follows a similar idea as PR 5756.*
The patch is based on the nice debugging done by Brendan in the referenced issue 6782.
A better way to handle this, and similar issues, would probably be to completely ignore what the PDF file claims about font type/subtype, and just check the actual data. But until that kind of rewrite happens, this patch should help.
Fixes 6782.
Currently we're not applying Patterns for text, but only for graphics.
This patch is unfortunately not a complete solution, but rather a step on the way, since there are still some PDF files where the Patterns look more like a solid colour, rather than the intended gradient.
I've been unable to fix these issues completely, and I've not managed to determine if the remaining issues are caused either by the pattern code, the canvas code, or perhaps both.
However, given that even this simple patch improves the current situation quite a bit, I figured that it couldn't hurt to submit it as-is.
- Fixes 5804.
- Fixes 6130.
- Improves 3988 a lot, since the text is now visible. However, it looks like the text is *one* solid colour, instead of the correct gradient.
- Improves 5432, since the text is no longer gray. (This file also suffers from the same problem as the previous one.)
Most code for Popup annotations is already present for Text annotations.
This patch extracts the popup creation logic from the Text annotation
code so it can be reused for Popup annotations.
Not only does this add support for Popup annotations, the Text
annotation code is also considerably easier. If a `Popup` entry is
available for a Text annotation, it will not be more than an image. The
popup will be handled by the Popup annotation. However, it is also
possible for Text annotations to not have a separate Popup annotation,
in which case the Text annotation handles the popup creation itself.
In `Font_checkAndRepair` we can decide that a font isn't TrueType, and instead parse it as CFF. In that case it's quite possible that the `fontMatrix` will be changed, and without calling `adjustWidths` we're failing to update the glyph widths correctly.
Fixes 5027.
Fixes 5084.
Fixes 6556.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1204903.
After PR 6590, `font.spaceWidth` is now called in more cases than before (in `PartialEvaluator_getTextContent`), which exposed an underlying issue with `IdentityToUnicodeMap_charCodeOf` throwing an error.
This breaks text-selection in some PDF files found in the wild, hence this patch replaces the `error` with an actual function instead (modelled after `IdentityCMap_charCodeOf`).
*This is a regression from PR 3424.*
The PDF file in the referenced issue is using `Type3` fonts. In one of those, the `/CharProcs` dictionary contains an entry with the name `/#`. Before the changes to `Lexer_getName` in PR 3424, we were allowing certain invalid `Name` patterns containing the NUMBER SIGN (#).
It's unfortunate that this has been broken for close to two and a half years before the bug surfaced, but it should at least indicate that this is not a widespread issue.
Fixes 6692.
This patch goes a bit further than issue 6612 requires, and replaces all kinds of whitespace with standard spaces.
When testing this locally, it actually seemed to slightly improve two existing test-cases (`tracemonkey-text` and `taro-text`).
Fixes 6612.
When I submitted PR 3576, I included a linked test-case. The reason was that I didn't know enough about the PDF format, in order to successfully create a reduced test-case.
Considering that the link points to a Dropbox, there's no guarantee that the PDF file will remain available, hence it seems worthwhile to replace the test-case.
*Note:* Since this is a `load` test, `makeref` won't be necessary.
The file (`lshort.pdf`) has changed a couple of times since the test was added, hence there's no guarantee that the current version accurately reflects the issues the test was added to check.
In this patch, I'm updating the link location to point to the *intended* file version (hosted on the "Internet Archive").
According to the PDF spec 5.3.2, a positive value means in horizontal,
that the next glyph is further to the left (so narrower), and in
vertical that it is further down (so wider).
This change fixes the way PDF.js has interpreted the value.
For (1, 0) cmaps, we have two different codepaths depending on whether the font has/hasn't got an encoding. But with (3, 1) cmaps we don't have a good fallback when the encoding is missing, hence this patch changes `readCmapTable` to only choose a (3, 1) cmap table if the font is non-symbolic *and* an encoding exists. Without this, we'll not be able to successfully create a working glyph map for some TrueType fonts with (3, 1) cmap tables.
Fixes 6410.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1200096.
The problematic font has a `format 2` cmap, which we've never supported properly. Prior to PR 2606, we were able to fallback to a working state, despite not having proper support for that cmap format.
Obviously the best/correct solution would be to implement actual support for more cmap formats[1]. However, I'm hoping that a simple patch will be OK for now, given that:
- `format 2` cmaps seem to be quite rare in practice, since this has been broken for 2.5 years before anyone noticed.
- Having a simple patch will make potential uplifts a lot easier.
[1] See the specification at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
Re: PR 4731.
Since the URL points to the Internet Archive, I think that adding a linked test-case should be OK. (Also, it's difficult to create reduced, or even `unit`, tests that accurately captures the brokenness of real-world PDF files.)
*Please note:* Since this is a `load` test, `makeref`ing won't be needed.
This patch improves the detection of `xref` in files where it is followed by an arbitrary whitespace character (not just a line-breaking char).
It also adds a check for missing whitespace, e.g. `1 0 obj<<`, to speed up `readToken` for the PDF file in the referenced issue.
Finally, the patch also replaces a bunch of magic numbers with suitably named constants.
Fixes 5752.
Also improves 6243, but there are still issues.
The problem with the PDF files in the issue, besides the obviously broken XRef tables which we're able to recover from, is that many/most of the streams have Dictionaries where the `Length` entry is set to `0`. This causes us to return `NullStream`, instead of the appropriate one in `Parser_makeFilter`.
Fixes 6360.
Short story: somebody got lost in two different indices. pi is an index in the stream and is explained on page 198 of the 32000-spec (however 1-based there), and ps is an index to something in PDF.js. I used the code from flag 0 (which works) to understand which is which. It is also important to understand that for flags 1,2 and 3, the stream is always assigned to the same coordinates and colors. What changes is which "old" coordinates and colors are assigned to what is "missing" in the stream. This is why for these flags, the code is identical except for the assignments in the first "row".