With the viewer code now being split into various components/files, having an obsolete comment in `PDFViewer` that references thumbnails despite there being no other mentions of them in the entire file seems strange.
*Note:* This comment is simply a left-over from older versions of PDF.js, where the *entire* default viewer code was placed in just one file (and where we unconditionally created thumbnails, regardless whether they were visible or not).
There are PDF generators which create destinations with e.g. too large top values, which cause the wrong page to be scrolled into view because the offset becomes negative.
By ignoring negative offsets, we can prevent this issue, and get a similar behaviour as in Adobe Reader.
However, since we're also using `PDFViewer_scrollPageIntoView` in more cases than just when links (in the document/outline) are clicked, the patch adds a way to allow the caller to opt-out of this behaviour.
In e.g. the following situations, I think that we still want to be able to allow negative offsets: when restoring a position from the `ViewHistory`, when the `viewBookmark` button is used to obtain a link to the current position, or when maintaining the current position on zooming.
Rather than adding another parameter to `PDFViewer_scrollPageIntoView`, I've changed the signature to take an parameter object instead. To maintain backwards compatibility, I've added fallback code enclosed in a `GENERIC` preprocessor tag.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=874482.
With PR 7502 we no longer dispatch an event when the `val` is out of bounds, so to better communicate why nothing happens this patch logs an error in that case (similar to the logging of errors when trying to set an invalid scale).
The way that the default viewer is currently implemented, means that e.g. keyboard short-cuts could trigger the new error. Hence this patch also adds the necessary validation code, both to `app.js` and `pdf_link_service.js` to prevent unnecessary errors.
The AMO extension is no longer supported as PDF.js is integrated in
Firefox. It was already removed from the bot's preview messages around a
year ago and has not been used or updated since.
This patch improves the performance of issue 5808, but I'm not sure if it's enough to call it fixed. On average, this patch reduces the number of textLayer div's by a factor of 3, and it also reduces the time spend in `getTextContent` by a factor of ~2.
The PDF file is generated by `Scribus PDF`, which for reasons I cannot understand is placing redundant `Tf` commands before *every* showText command.
Note how the PDF file also contains lots of (basically) identical fonts, but with slightly different names, which causes unnecessary font-switching. This causes some unnecessary breaking of textLayer div's, but this issue cannot be easily worked around.
[api-minor] Add a parameter to `PDFPageProxy_getTextContent` that controls whether `PartialEvaluator_getTextContent` will attempt to combine same line text items
Note that I used a separate warning message for this case, instead of utilizing the same one as in the unsupported subtype case, to more clearly indicate that the PDF file itself is to blame rather than PDF.js.
Fixes 7446.
This patch attempts to cleanup a couple of things:
- Remove the `previousPageNumber` paramater. Prior to PR 7289, when the events were dispatched even when the active page didn't change, it made sense to be able to detect that in an event listener. However, now that's no longer the case, and furthermore other similar events (e.g. `scalechanging`/`scalechange`) don't include information about the previous state.
- Don't dispatch the events when the value passed to `set currentPageNumber` is out of bounds. Given that the active page doesn't change in this case, again similar to PR 7289, I don't think that the events should actually be dispatched in this case.
- Ensure that the value passed to `set currentPageNumber` is actually an integer, to avoid any issues (note how e.g. `set currentScale` has similar validation code).
Given that these changes could possibly affect the PDF.js `mochitest` integration tests in mozilla-central, in particular https://dxr.mozilla.org/mozilla-central/source/browser/extensions/pdfjs/test/browser_pdfjs_navigation.js, I ran the tests locally with this patch applied to ensure that they still pass.
Fonts that are not referenced by `Ref`s are very uncommon in practice, but it can unfortunately happen. In this case, we're currently not caching them in the usual way, i.e. by `Ref`, which leads to failures when a page is rendered after `cleanup` has run.
The simplest solution would have been to remove the `font.translated` workaround, but since this would have meant loading these kind of fonts over and over, the patch attempts to be a bit clever about this situation.
Note that if we instead loaded fonts per *page*, instead of per document, this issue wouldn't have existed.
Originally, I was just going to change this code to use `Ref_toString` in a couple more places. When I started reading the code, I figured that it wouldn't hurt to clean up a couple of comments. While doing this, I noticed that the logic for the (rare) `isDict(fontRef)` case could do with a few improvements.
There should be no functional changes with this patch, but given the added reference checks, we will now avoid bogus `Ref`s when resolving font aliases. In practice, as issue 7403 shows, the current code can break certain PDF files even if it's very rare.
Note that the only thing that this patch will change, is the `font.loadedName` in the case where a `fontRef` is a reference *and* the font doesn't have a descriptor. Previously for `fontRef = Ref(4, 0)` we'd get `font.loadedName = 'g_d0_f4_0'`, and with this patch `font.loadedName = g_d0_f4R`, which is actually one character shorted in most cases. (Given that `Ref_toString` contains an optimization for the `gen === 0` case, which is by far the most common `gen` value.)
In the already existing fallback case, where the `fontName` is used to when creating the `font.loadedName`, we allow any alphanumeric character. Hence I don't see how (as mentioned above) e.g. `font.loadedName = g_d0_f4R` would be an issue here.
From the discussion in issue 7445, it seems that there may be cases where an API consumer would want to get the text content as is, without combined text items.
After PR 7039, the PDF file in issue 7492 no longer renders at all, but note that text selection wasn't working correctly previously.
The problem with the PDF file in issue 7492 is that the `cMap`, in the `toUnicode` entry in the font, contains an invalid name:
```
/CMapName /-usr-share-fonts-truetype-Panton-Panton Family-Fontfabric - Panton.otf,000-UTF16 def
```
When we parse that line, things obviously break because there are spaces present in the wrong places.
To avoid that issue, the patch simply lets `parseCMap` continue when errors are encountered, to try and recover usable data. Note that by not aborting immediatly when an error is encountered, we are also able to fix the text selection.
Obviously, it could be argued that we should just immediatly reject a corrupt `cMap`. But given that they usually are correct, it seems that trying to recover as much data as possible from corrupt one can only be a good thing for both glyph mapping and text selection.
Fixes 7492.
In the PDF file in the issue, some of the glyphs end up being mapped to the Lepcha Unicode block; see https://en.wikipedia.org/wiki/Lepcha_(Unicode_block).
This didn't use to matter, but after HarfBuzz updates that improved support for Lepcha fonts, in particular https://bugzilla.mozilla.org/show_bug.cgi?id=1249861, some glyphs are now moved horizontally.
To avoid that, this patch adds the Lepcha block to the list of Unicode ranges that we skip when building the glyph mapping.
Fixes 7426.
After PR 7289, we'll now reset the current page view in cases where I don't think we should. To avoid this, this patch ensures that we'll not modify the position when the page number is out-of-bounds.
**STR:**
1. Open http://mozilla.github.io/pdf.js/web/viewer.html#page=1&zoom=auto,-98,696
2. Enter an invalid number, e.g. `1000`, in the `pageNumber` input.
**ER:**
The current position in the document shouldn't change, since the page number wasn't valid.
**AR:**
The document resets to the top of the page `1`.