[api-minor] Add a parameter to `PDFPageProxy_getTextContent` that controls whether `PartialEvaluator_getTextContent` will attempt to combine same line text items
Note that I used a separate warning message for this case, instead of utilizing the same one as in the unsupported subtype case, to more clearly indicate that the PDF file itself is to blame rather than PDF.js.
Fixes 7446.
This patch attempts to cleanup a couple of things:
- Remove the `previousPageNumber` paramater. Prior to PR 7289, when the events were dispatched even when the active page didn't change, it made sense to be able to detect that in an event listener. However, now that's no longer the case, and furthermore other similar events (e.g. `scalechanging`/`scalechange`) don't include information about the previous state.
- Don't dispatch the events when the value passed to `set currentPageNumber` is out of bounds. Given that the active page doesn't change in this case, again similar to PR 7289, I don't think that the events should actually be dispatched in this case.
- Ensure that the value passed to `set currentPageNumber` is actually an integer, to avoid any issues (note how e.g. `set currentScale` has similar validation code).
Given that these changes could possibly affect the PDF.js `mochitest` integration tests in mozilla-central, in particular https://dxr.mozilla.org/mozilla-central/source/browser/extensions/pdfjs/test/browser_pdfjs_navigation.js, I ran the tests locally with this patch applied to ensure that they still pass.
Fonts that are not referenced by `Ref`s are very uncommon in practice, but it can unfortunately happen. In this case, we're currently not caching them in the usual way, i.e. by `Ref`, which leads to failures when a page is rendered after `cleanup` has run.
The simplest solution would have been to remove the `font.translated` workaround, but since this would have meant loading these kind of fonts over and over, the patch attempts to be a bit clever about this situation.
Note that if we instead loaded fonts per *page*, instead of per document, this issue wouldn't have existed.
Originally, I was just going to change this code to use `Ref_toString` in a couple more places. When I started reading the code, I figured that it wouldn't hurt to clean up a couple of comments. While doing this, I noticed that the logic for the (rare) `isDict(fontRef)` case could do with a few improvements.
There should be no functional changes with this patch, but given the added reference checks, we will now avoid bogus `Ref`s when resolving font aliases. In practice, as issue 7403 shows, the current code can break certain PDF files even if it's very rare.
Note that the only thing that this patch will change, is the `font.loadedName` in the case where a `fontRef` is a reference *and* the font doesn't have a descriptor. Previously for `fontRef = Ref(4, 0)` we'd get `font.loadedName = 'g_d0_f4_0'`, and with this patch `font.loadedName = g_d0_f4R`, which is actually one character shorted in most cases. (Given that `Ref_toString` contains an optimization for the `gen === 0` case, which is by far the most common `gen` value.)
In the already existing fallback case, where the `fontName` is used to when creating the `font.loadedName`, we allow any alphanumeric character. Hence I don't see how (as mentioned above) e.g. `font.loadedName = g_d0_f4R` would be an issue here.
From the discussion in issue 7445, it seems that there may be cases where an API consumer would want to get the text content as is, without combined text items.
After PR 7039, the PDF file in issue 7492 no longer renders at all, but note that text selection wasn't working correctly previously.
The problem with the PDF file in issue 7492 is that the `cMap`, in the `toUnicode` entry in the font, contains an invalid name:
```
/CMapName /-usr-share-fonts-truetype-Panton-Panton Family-Fontfabric - Panton.otf,000-UTF16 def
```
When we parse that line, things obviously break because there are spaces present in the wrong places.
To avoid that issue, the patch simply lets `parseCMap` continue when errors are encountered, to try and recover usable data. Note that by not aborting immediatly when an error is encountered, we are also able to fix the text selection.
Obviously, it could be argued that we should just immediatly reject a corrupt `cMap`. But given that they usually are correct, it seems that trying to recover as much data as possible from corrupt one can only be a good thing for both glyph mapping and text selection.
Fixes 7492.
In the PDF file in the issue, some of the glyphs end up being mapped to the Lepcha Unicode block; see https://en.wikipedia.org/wiki/Lepcha_(Unicode_block).
This didn't use to matter, but after HarfBuzz updates that improved support for Lepcha fonts, in particular https://bugzilla.mozilla.org/show_bug.cgi?id=1249861, some glyphs are now moved horizontally.
To avoid that, this patch adds the Lepcha block to the list of Unicode ranges that we skip when building the glyph mapping.
Fixes 7426.
After PR 7289, we'll now reset the current page view in cases where I don't think we should. To avoid this, this patch ensures that we'll not modify the position when the page number is out-of-bounds.
**STR:**
1. Open http://mozilla.github.io/pdf.js/web/viewer.html#page=1&zoom=auto,-98,696
2. Enter an invalid number, e.g. `1000`, in the `pageNumber` input.
**ER:**
The current position in the document shouldn't change, since the page number wasn't valid.
**AR:**
The document resets to the top of the page `1`.
After PR 7441, where `recoverGlyphName` is used a lot more than before, many PDF files will generate a lot of warnings the console. For normal usage, compared to debugging/development, this is probably more annoying than helpful.
Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439)
The method signature was improved in PR 6546, which was included in the `1.2.109` release from last November.
Hence I think that we should now be able to remove the fallback code for the old method signature. Note that this patch now throws an `Error` in `GENERIC` builds, to clearly indicate that the `open` callsite must be modified.
In PR 7273, we simplified the `MOZCENTRAL` specific check for fullscreen support. Unfortunately I've just, by coincidence, found out about https://bugzilla.mozilla.org/show_bug.cgi?id=1268749, which disabled the unprefixed fullscreen support in release versions of Firefox. If we do nothing, this will lead to Presentation Mode being unavailable in future Firefox releases.
This patch should fix things for now, and I'm afraid that we'll need to uplift it to Firefox 49 as well.
With the changes in PR 7289, we no longer dispatch a 'pagechanging' event on load. Since most PDF documents open on the first page, this means that the `previous` and `firstPage` buttons are no longer correctly disabled.
To avoid this, this patch moves the code that updates various UI toolbar state into one method, which is then called on document initialization and from the various existing event handling functions.