- the parser is base on a class extending XMLParserBase
- it handle xml namespaces:
* each namespace is assocated with a builder
* builder builds nodes belonging to the namespace
* when a node is inserted in the parent namespace compatibility is checked (if required)
- to avoid name collision between xml names and object properties, use Symbol.
Given that IE 11/Edge is now unsupported in PDF.js, and that Microsoft Edge is now a Chromium-browser, we can avoid (some) unnecessary bloat in the built CSS files.
Given that `FontFaceObject` is not exposed in the public API, but only accessed internally, there's no need to assume that a `FontFaceObject`-instance is ever initialized without `onUnsupportedFeature` being provided. This is also consistent with the `BaseFontLoader` implementation.
With PR 10539, we'll now always attempt to fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). Generally speaking, these errors are the result of insufficient validation in the PDF.js font code, however in almost all cases we've seen thus far our built-in font renderer manages just fine.
However, we still trigger the `onUnsupportedFeature` reporting, which in Firefox causes the fallback bar to be displayed. Given that, in a majority of cases[1], things look fine it seems unfortunate to bother the user with the fallback bar here.
Note that even though we no longer show the fallback bar in this case, we still report telemetry as before.
---
[1] The only *known* case where things aren't fine with the built-in font renderer is issue 10232, however that document is sufficiently broken that there's a couple of other things that will trigger the fallback bar.
Given that we'll only cache `/XObject`s of the `Image`-type globally, we can utilize that in `PartialEvaluator.getTextContent` as well. This way, in cases such as e.g. issue 12098, we can avoid having to fetch/parse `/XObject`s that we already know to be `Image`s. This is helpful, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.
Also, skip a redundant `RefSetCache.has` check in the `GlobalImageCache.getData` method.
The widths property should be an object to match what metrics returns.
In ZapfDingbats.pdf I was getting a data clone error with pdfBug enabled.
In buildCharCodeToWidth() there was an encoding with the name "at" which
is also the name of a method on an array. buildCharCodeToWidth assumes an
object is passed in, so when it checked for the "at" property, it found the
method and copied it over.
This only seemed to affect Firefox.
When implementing the `GlobalImageCache` functionality I was mostly worried about the effect of *very large* images, hence the maximum number of cached images were purposely kept quite low[1].
However, there's one fairly obvious problem with that approach: In documents with hundreds, or even thousands, of *small* images the `GlobalImageCache` as implemented becomes essentially pointless.
Hence this patch, where the `GlobalImageCache`-implementation is changed in the following ways:
- We're still guaranteed to be able to cache a *minimum* number of images, set to `10` (similar as before).
- If the *total* size of all the cached image data is below a threshold[2], we're allowed to cache additional images.
This patch thus *improve*, but doesn't completely fix, issue 12098. Note that that document is created by a *very poor* PDF generator, since every single page contains the *entire* document (with all of its /Resources) and to create the individual pages clipping is used.[3]
---
[1] Currently set to `10` images; imagine what would happen to overall memory usage if we encountered e.g. 50 images each 10 MB in size.
[2] This value was chosen, somewhat randomly, to be `40` megabytes; basically five times the [maximum individual image size per page](6249ef517d/src/display/api.js (L2483-L2484)).
[3] This surely has to be some kind of record w.r.t. how badly PDF generators can mess things up...
Given how our existing code-base is written, we simply can't (and probably don't want to) enable *all* of this ESLint rule. The reason is that we have a lot of code, especially related to functions, which is fine at runtime but might to a static analyzer *look* like a problem.
However, even in its most *basic* version this ESLint rule can help catch some outright bugs; see PR 12891.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-use-before-define
Note how, in the `if (this.stateManager.stateStack.length !== 0) {` branch, we're attempting to access the not yet defined variable[1] `args`. If this code-path is ever hit, an Error will be thrown and parsing will thus be aborted immediately (likely leading to e.g. rendering bugs).
Note that I found this purely by accident, since I happened to glance at the LGTM report. However, I've since found that the error is also present during the unit-test[2] and with this patch we're actually testing the *intended* thing here.
As part of fixing this, and to avoid re-introducing a similar bug in the future, we'll now instead always reset `args.length` *before* attempting to read the next operator.
Also, we can use the existing `EvaluatorPreprocessor.savedStatesDepth` getter to simplify the save/restore detection a tiny bit.
---
[1] The ESLint rule `no-use-before-define` would have helped catch this problem, but unfortunately we cannot enable that without quite a bit of refactoring all over the code-base.
[2] The unit-test was updated such that it would fail in the `master`-branch.
- For wrapped scrolling, we unfortunately need to do a fair bit of parsing of the *current* page layout. Compared to e.g. the spread-modes, where we can easily tell how the pages are laid out, with wrapped scrolling we cannot tell without actually checking. In particular documents with varying page sizes require some care, since we need to check all pages on the "row" of the current page are visible and that there aren't any "holes" present. Otherwise, in the general case, there's a risk that we'd skip over pages if we'd simply always advance to the previous/next "row" in wrapped scrolling.
- For horizontal scrolling, this patch simply maintains the current behaviour of advancing *one* page at a time. The reason for this is to prevent inconsistent behaviour for the next and previous cases, since those cannot be handled identically. For the next-case, it'd obviously be simple to advance to the first not completely visible page. However for the previous-case, we'd only be able to go back *one* page since it's not possible to (easily) determine the page layout of non-visible pages (documents with varying page sizes being a particular issue).
- For vertical scrolling, this patch maintains the current behaviour by default. When spread-modes are being used, we'll now attempt to advance to the next *spread*, rather than just the next page, whenever possible. To prevent skipping over a page, this two-page advance will only apply when both pages of the current spread are visible (to avoid breaking documents with varying page sizes) and when the second page in the current spread is fully visible *horizontally* (to handle larger zoom values).
In order to reduce the performance impact of these changes, note that the previous/next-functionality will only call `getVisibleElements` for the scroll/spread-modes where that's necessary and that "normal" vertical scrolling is thus unaffected by these changes.
To support these changes, the `getVisibleElements` helper function will now also include the `widthPercent` in addition to the existing `percent` property.
The `PDFViewer._updateHelper` method is changed slightly w.r.t. updating the `currentPageNumber` for the non-vertical/spread modes, i.e. won't affect "normal" vertical scrolling, since that helped simplify the overall calculation of the page advance.
Finally, these new `BaseViewer` methods also allow (some) simplification of previous/next-page functionality in various viewer components.
*Please note:* There's one thing that this patch does not attempt to change, namely disabling of the previous/next toolbarButtons respectively the firstPage/lastPage secondaryToolbarButtons. The reason for this is that doing so would add quite a bit of complexity in general, and if for some reason `BaseViewer._getPageAdvance` would get things wrong we could end up incorrectly disabling the buttons. Hence it seemed overall safer to *not* touch this, and accept that the buttons won't be `disabled` despite in some edge-cases no further scrolling being possible.