*Follow-up to PR 6299.*
This patch reduces unnecessary code duplication for the `canvas` to `image` conversion. It also does a bit of re-ordering (and adds new lines) in `_getPageDrawContext`, since that function currently is a bit hard to read.
Since some browsers render `null` characters, and others don't, this patch adds a way to remove them to prevent display issues in the viewer UI.
Given that documents may contain very long outlines, I've added a utility function to avoid creating a lot of unnecessary `RegExp` objects.
To avoid any future issues, this utility function is used for both the outline and the attachments.
Fixes 6416.
This patch adjusts `get fingerprint` to also check that the `/ID` entry contains (non-empty) strings, to prevent more possible failures when loading corrupt PDF files (follow-up to PR 5602).
Note that I've not actually encountered such a PDF file in the wild. However given that `stringToBytes` will assert that the input is a string, and that we'll thus fail to load a document unless `get fingerprint` succeeds, making this more robust seems like a good idea to me.
For (1, 0) cmaps, we have two different codepaths depending on whether the font has/hasn't got an encoding. But with (3, 1) cmaps we don't have a good fallback when the encoding is missing, hence this patch changes `readCmapTable` to only choose a (3, 1) cmap table if the font is non-symbolic *and* an encoding exists. Without this, we'll not be able to successfully create a working glyph map for some TrueType fonts with (3, 1) cmap tables.
Fixes 6410.
Prior to PR 6242, the width of all outline items was set such that their right (or left, in RTL locales) edges lined up vertically. In my opinion that looked more consistent, therefore this patch adjusts the CSS to make sure that this will be the case again.
The patch also makes the `border-radius` values of outline items a bit more consistent.
This issue is actually, in a sense, "caused" by the fact that the API/viewer supports partial loading/rendering. Previously when the *entire* document was always fetched before rendering begun, we knew all page sizes in advance and this issue didn't exist.
Now we use the size of *one* page in order to set the initial size of every page, until we've fetched the pages and thus know their correct sizes.
This means that during loading the size of the pages can change, which may cause the initial position to become scrolled out of view.
The most naive solution to this problem would perhaps be to delay setting the initial position on load for all documents, until all pages are fetched. However I think that would be a *really* bad idea, since doing so would make the initial rendering slower and make it feel sluggish for most documents.
Since there is generally no way of knowing if a document has different sized pages prior to loading it, we can only check once the pages are available.
Hence this patch, which treats documents with different sized pages as a special case, by re-applying the initial position when all pages have become available.
This code was added in PR 1214, but was made obsolete by PRs 1488/1493. Prior to the latter ones, `Dict_get` retured the raw objects. However, afterwards (and currently) `Dict_get` now resolves indirect objects, which makes `Parser_fetchIfRef` redundant.
*Potential risks with this patch:*
This patch passes all tests locally, but there's a *small* possibility that it could break some weird PDF files.
In the current code, wrapping `Dict_get` inside `Parser_fetchIfRef` will potentially mean two back-to-back call of `XRef_fetch`, if a reference points directly to another reference. I'm not sure if this can actually happen in practice, and I'd think that if that were the case we'd already have run into it elsewhere in the code-base, given that `Parser` is the only place where we try to "double" resolve references.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1200096.
The problematic font has a `format 2` cmap, which we've never supported properly. Prior to PR 2606, we were able to fallback to a working state, despite not having proper support for that cmap format.
Obviously the best/correct solution would be to implement actual support for more cmap formats[1]. However, I'm hoping that a simple patch will be OK for now, given that:
- `format 2` cmaps seem to be quite rare in practice, since this has been broken for 2.5 years before anyone noticed.
- Having a simple patch will make potential uplifts a lot easier.
[1] See the specification at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
Re: PR 4731.
Since the URL points to the Internet Archive, I think that adding a linked test-case should be OK. (Also, it's difficult to create reduced, or even `unit`, tests that accurately captures the brokenness of real-world PDF files.)
*Please note:* Since this is a `load` test, `makeref`ing won't be needed.
Our current test coverage for vertical text is somewhat lacking, as evident from e.g. issue 6387. That regression could easily have been avoided if the `taro` test-case would have been an `eq` test, as well as an `text` test.