Some bad PDF generators, in particular "Scribus PDF", duplicates resources *a lot* at various levels of the PDF files. This can lead to `PartialEvaluator_hasBlendModes` taking an unreasonable amount of time to complete.
The reason is that the current code is using `Dict_getAll`, which recursively dereferences *all* indirect objects, which can be really slow. This patch instead uses `Dict_getKeys`, and then manually looks up only the necessary indirect objects.
I've added the PDF file as a `load` test. The most important thing here is probably to ensure that the file remains available in the repo, and the comment should help reduced the chance of regressions. (Note that locally, the `load` test times out without this patch, but we cannot really assume that that always happens.)
Fixes 6961.
As part of the link cleanup in issue 6854, obtaining this file through the Internet Archive didn't work.
However, given that the file was added in order to test an issue with `CropBox/MediaBox`, a reduced test-case should do just fine instead.
Please refer to issue 1155, and PR 1212.
It seems to be fairly common for OCR software to include incomplete TrueType fonts, notable missing the "glyf" table, in PDF files. Since we currently reject such fonts, the result is that text-selection/copying is broken.
This patch contains a suggested approach to try and use these kind of broken fonts, by using existing code in `sanitizeGlyphLocations` to replace a missing "glyf" table with dummy data.
Fixes 4684.
Fixes 6007.
Fixes 6829.
The test case was changed in 1faca19021 because the original file was not available anymore. However, its hash was also changed, meaning that we do not test the intended version anymore.
This patch makes sure that we test the intented version by reverting to
the original hash and using a link, also pointing to the Internet
Archive, with the original file.
This test was disabled in PR 4732, because the file was no longer available. The motivation being that there were two other files which should be good replacements. However, one of those has since been replaced with a reduced test-case (which doesn't exercise the same code-path), and in the other one the error does not appear to be entirely identical.
Hence it seems reasonable to re-add the 'aboutstacks.pdf' test, since it was possible to find it on the Internet Archive (by searching using a different URL, compared to the current one).
Note that despite the new file having a different hash than the the current one, it does render *identically* and most importantly it uses *the same* JBIG2 functionality.
For reference, please see issue 3666 and PR 3738.
*This patch follows a similar idea as PR 5756.*
The patch is based on the nice debugging done by Brendan in the referenced issue 6782.
A better way to handle this, and similar issues, would probably be to completely ignore what the PDF file claims about font type/subtype, and just check the actual data. But until that kind of rewrite happens, this patch should help.
Fixes 6782.
When generating new references locally on Windows, after PR 6724, I get the following output:
```
WARNING: Unable to open file for reading "Error: ENOENT, open 'c:\Users\Jonas\Git\pdfjs\test\pdfs\issue_3694_reduced.pdf'".
Unable to verify the checksum for the files that are used for testing.
Please re-download the files, or adjust the MD5 checksum in the manifest for the files listed above.
```
Compared to the name of the file (`issue3694_reduced.pdf`), you see that the manifest entry has a superfluous underscore in the "file" entry.
Currently we're not applying Patterns for text, but only for graphics.
This patch is unfortunately not a complete solution, but rather a step on the way, since there are still some PDF files where the Patterns look more like a solid colour, rather than the intended gradient.
I've been unable to fix these issues completely, and I've not managed to determine if the remaining issues are caused either by the pattern code, the canvas code, or perhaps both.
However, given that even this simple patch improves the current situation quite a bit, I figured that it couldn't hurt to submit it as-is.
- Fixes 5804.
- Fixes 6130.
- Improves 3988 a lot, since the text is now visible. However, it looks like the text is *one* solid colour, instead of the correct gradient.
- Improves 5432, since the text is no longer gray. (This file also suffers from the same problem as the previous one.)
Most code for Popup annotations is already present for Text annotations.
This patch extracts the popup creation logic from the Text annotation
code so it can be reused for Popup annotations.
Not only does this add support for Popup annotations, the Text
annotation code is also considerably easier. If a `Popup` entry is
available for a Text annotation, it will not be more than an image. The
popup will be handled by the Popup annotation. However, it is also
possible for Text annotations to not have a separate Popup annotation,
in which case the Text annotation handles the popup creation itself.
This PDF file (see issue 4914) originally regressed in PR 4318, and was subsequently fixed in PR 4915.
I added the PDF file as a (linked) test-case in PR 6481, in an effort to prevent regressions. Since we at that time didn't have the necessary framework in place, in order to correctly test annotations, this almost regressed *again* in PR https://github.com/mozilla/pdf.js/pull/6672#issuecomment-158689392.
In that PDF file, some of the annotations are both printable and hidden, and should definitely *not* be visible on normal display. Hence this patch, which adds the `annotations` flag to the manifest in order to ensure that those annotations won't be rendered when `intent === 'display'`.
In `Font_checkAndRepair` we can decide that a font isn't TrueType, and instead parse it as CFF. In that case it's quite possible that the `fontMatrix` will be changed, and without calling `adjustWidths` we're failing to update the glyph widths correctly.
Fixes 5027.
Fixes 5084.
Fixes 6556.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1204903.
After PR 6590, `font.spaceWidth` is now called in more cases than before (in `PartialEvaluator_getTextContent`), which exposed an underlying issue with `IdentityToUnicodeMap_charCodeOf` throwing an error.
This breaks text-selection in some PDF files found in the wild, hence this patch replaces the `error` with an actual function instead (modelled after `IdentityCMap_charCodeOf`).
*This is a regression from PR 3424.*
The PDF file in the referenced issue is using `Type3` fonts. In one of those, the `/CharProcs` dictionary contains an entry with the name `/#`. Before the changes to `Lexer_getName` in PR 3424, we were allowing certain invalid `Name` patterns containing the NUMBER SIGN (#).
It's unfortunate that this has been broken for close to two and a half years before the bug surfaced, but it should at least indicate that this is not a widespread issue.
Fixes 6692.
This patch goes a bit further than issue 6612 requires, and replaces all kinds of whitespace with standard spaces.
When testing this locally, it actually seemed to slightly improve two existing test-cases (`tracemonkey-text` and `taro-text`).
Fixes 6612.
When I submitted PR 3576, I included a linked test-case. The reason was that I didn't know enough about the PDF format, in order to successfully create a reduced test-case.
Considering that the link points to a Dropbox, there's no guarantee that the PDF file will remain available, hence it seems worthwhile to replace the test-case.
*Note:* Since this is a `load` test, `makeref` won't be necessary.
The file (`lshort.pdf`) has changed a couple of times since the test was added, hence there's no guarantee that the current version accurately reflects the issues the test was added to check.
In this patch, I'm updating the link location to point to the *intended* file version (hosted on the "Internet Archive").
According to the PDF spec 5.3.2, a positive value means in horizontal,
that the next glyph is further to the left (so narrower), and in
vertical that it is further down (so wider).
This change fixes the way PDF.js has interpreted the value.
For (1, 0) cmaps, we have two different codepaths depending on whether the font has/hasn't got an encoding. But with (3, 1) cmaps we don't have a good fallback when the encoding is missing, hence this patch changes `readCmapTable` to only choose a (3, 1) cmap table if the font is non-symbolic *and* an encoding exists. Without this, we'll not be able to successfully create a working glyph map for some TrueType fonts with (3, 1) cmap tables.
Fixes 6410.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1200096.
The problematic font has a `format 2` cmap, which we've never supported properly. Prior to PR 2606, we were able to fallback to a working state, despite not having proper support for that cmap format.
Obviously the best/correct solution would be to implement actual support for more cmap formats[1]. However, I'm hoping that a simple patch will be OK for now, given that:
- `format 2` cmaps seem to be quite rare in practice, since this has been broken for 2.5 years before anyone noticed.
- Having a simple patch will make potential uplifts a lot easier.
[1] See the specification at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
Re: PR 4731.
Since the URL points to the Internet Archive, I think that adding a linked test-case should be OK. (Also, it's difficult to create reduced, or even `unit`, tests that accurately captures the brokenness of real-world PDF files.)
*Please note:* Since this is a `load` test, `makeref`ing won't be needed.