The viewer doesn't currently support executing of any JavaScript, as found in some PDF documents, for security reasons. (Although there's a small hack to at least provide basic support for automatic printing on document load, without running scripts.)
However, in the event that the browser doesn't support printing we're not run *any* of this code. In particular that means that we're also not displaying the "Warning: JavaScript is not supported" message, which seems strange since JavaScript found in a PDF document can really contain *anything* (and not only printing instructions).
It thus seem reasonable, as far as I'm concerned, to always display the JavaScript-warning *even* if printing happens to be unsupported.
Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.
To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*
Once all the code has been fixed, we'll be able to eventually enable the ESLint `no-shadow` rule; see https://eslint.org/docs/rules/no-shadow
Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`.
Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.
Rather than duplicating the lookup and caching in multiple files, it seems easier to simply move all of this functionality into `src/shared/util.js` instead.
This will also help avoid a bunch of ESLint errors once the `no-shadow` rule is eventually enabled.
While Firefox originally used transition effects for browser UI toolbar buttons, that was removed years ago in https://bugzilla.mozilla.org/show_bug.cgi?id=1393057
Since the PDF.js viewer toolbar transitions were likely based on the Firefox ones, it seems reasonable that these transition effects are removed from PDF.js as well. Besides removing a bunch of CSS, this also makes the toolbar feel ever so slightly more "snappy" without these delays on mouse interaction.
(In order to make it more feasible to modernize/improve the viewer UI, trying to clean-up/simplify existing rules iteratively seems like the most reasonble way to make any progress here w.r.t. being able to test/review things.)
When reftest analyzer shows magnified pixels, there is a seemingly random offset between the mouse position and the magnified position. The reason for this is that reftest analyzer assumes all images have 800 * 1000 pixels but actually the test images have varying sizes.
*This patch fixes something that's annoyed me every now and then over the years, when debugging/fixing corrupt PDF documents.*
For corrupt PDF documents where `Lexer.getHexString` encounters invalid characters, there's very rarely just a handful of them. In practice it's not uncommon for there to be many hundreds, or even many thousands, invalid hex characters found.
Not only is the resulting console warning spam utterly useless in these cases, there's often enough of it that performance may even suffer; hence this patch which limits the amount of messages that any one `Lexer.getHexString` invocation may print.
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
For reasons that I don't even pretend to understand, the `textLayerFactory` property is determined for *every single* page in the PDF document.
Given that the `TextLayerMode` should be consistent for *all* pages in a document, we obviously could/should define `textLayerFactory` just once instead.
There's a couple of issues with this functionality:
- The respective `PromiseCapability` instances are not being reset, in `BaseViewer._resetView`, when the document is closed which is inconsistent with all other state.
- While the default viewer depends on these promises, and they thus ought to be considered part of e.g. the `PDFViewer` API-surface, they're not really defined in a particularily user-visible way (being that they're attached to the `BaseViewer` instance *inline* in `BaseViewer.setDocument`).
- There's some internal `BaseViewer` state, e.g. `BaseViewer._pageViewsReady`, which is tracked manually and could instead be tracked indirectly via the relevant `PromiseCapability`, thus reducing the need to track state *twice* since that's always best to avoid.
*Please note:* In the existing implementation, these promises are not defined *until* the `BaseViewer.setDocument` method has been called.
While it would've been simple to lift that restriction in this patch, I'm purposely choosing *not* to do so since this ensures that any Promise handlers added inside of `BaseViewer.setDocument` are always invoked *before* any external ones (and keeping that behaviour seems generally reasonable).
The only reason for the `return undefined;` lines was to appease the ESLint `consistent-return` rule, but that's not actually necessary if you make use of the fact that the method is `async` and that we can thus await the Promise rather than returning it.
Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch utilizes an *inline check* and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`).
For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` is called *a lot* during parsing in the worker.
This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
{ "id": "issue2618",
"file": "../web/pdfs/issue2618.pdf",
"md5": "",
"rounds": 250,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall | 250 | 2838 | 2820 | -18 | -0.65 | faster
Firefox | Page Request | 250 | 1 | 2 | 0 | 11.92 | slower
Firefox | Rendering | 250 | 2837 | 2818 | -19 | -0.65 | faster
```
This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.)
By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.
It occured to me that similar to the `getGlobalEventBus` function, it's probably a good idea to *also* notify users of the fact that `eventBusDispatchToDOM` is now deprecated.
Rather than depending of the re-dispatching of internal events to the DOM, the default viewer can instead be used in e.g. the following way:
```javascript
document.addEventListener("webviewerloaded", function() {
PDFViewerApplication.initializedPromise.then(function() {
// The viewer has now been initialized, and its properties can be accessed.
PDFViewerApplication.eventBus.on("pagerendered", function(event) {
console.log("Has rendered page number: " + event.pageNumber);
});
});
});
```
I overlooked this in PR 11631; sorry about that!
Also, ensure that `EventBus` instances *always* track "external" events using a boolean regardless of the actual option value.
This changes the dropdown icon from being set using the `background` CSS property, to being set with `::after` which is *similar* to all the other toolbar button icons (which use `::before`).
Also tweaks the dropdown `background-color` on `:hover` slightly, since the other changes made it too light.
Fixes#11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.
Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
It's no longer necessary to special-case this getter in the `GenericFontLoader` case, since the GENERIC build hasn't been using `mozPrintCallback` for years now (furthermore Firefox 63 is really old as well).