So that Firefox doesn't switch to pixel mode for compat with other
browsers.
This should fix https://github.com/mozilla/pdf.js/issues/14476, in terms
of restoring the previous behavior.
We probably want to change the pixel-based scrolling code to not scroll
so much (the deltaMode stuff normalizes to +/-1 tick for each wheel
event, perhaps the pixel-based value should do the same).
- it aims to fix issue #14307;
- this event has been added recently in Firefox and we can now use it;
- fix few bugs in aform.js or in annotation_layer.js;
- add some integration tests to test keystroke events (see `AFSpecial_Keystroke`);
- make dispatchEvent in the quickjs sandbox async.
*Please note:* This is a tentative patch, since I don't know if this is deemed important enough to fix.
The new event could be seen as a *supplement* to the existing "documentinit" and "documentloaded" events, but for the case when a PDF document fails to load.
To make the "documenterror" event generally useful, it'll include both the localized error message as well as the original reason for the error (when that exists).
This helper function has never been used in e.g. the worker-thread, hence its placement in `src/shared/util.js` led to a *small* amount of unnecessary duplication.
After the previous patches this helper function is now *only* used in the viewer, hence it no longer seems necessary to expose it through the official API.
*Please note:* It seems somewhat unlikely that third-party users were relying *directly* on this helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)
As part of the changes/improvement in PR 14092, we're no longer using the `addLinkAttributes` directly in e.g. the AnnotationLayer-code.
Given that the helper function is now *only* used in the viewer, hence it no longer seems necessary to expose it through the official API.
*Please note:* It seems somewhat unlikely that third-party users were relying *directly* on the helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)
This structure contains *almost* exclusively references to DOM elements (and a couple of simple strings), rather than complete classes/functions. Hence the `eventBus`-option sticks out a fair bit, and I'd guess that it's *mostly* unused in e.g. third-party implementations.
Given that we, in multiple places, mention that the default viewer shouldn't be used as-is I really don't think that we need to keep this special `eventBus`-option around. Furthermore, nowadays it's also a lot easier to (safely) access the existing `EventBus`-instance in the viewer; see https://github.com/mozilla/pdf.js/wiki/Third-party-viewer-usage#initialization-promise which shows how to listen for the default viewer being initialized (and its `eventBus` thus being available).
Besides converting `Catalog.getPageDict` to an `async` method, thus simplifying the code, this patch also allows us to pro-actively fix a existing issue.
Note how we're looking up References in such a way that `MissingDataException`s won't cause trouble, however it's *technically possible* that the entries (i.e. /Count, /Kids, and /Type) in a /Pages Dictionary could actually be indirect objects as well. In the existing code this could lead to *some*, or even all, pages failing to load/render as intended.
In practice that doesn't *appear* to happen in real-world PDF documents, but given all the weird things that PDF software do I'd prefer to fix this pro-actively (rather than waiting for a bug report).
With `Catalog.getPageDict` being `async` this is now really simple to address, however I didn't want to introduce a bunch more *unconditional* asynchronicity in this method if it could be avoided (since that could slow things down). Hence we'll *synchronously* lookup the *raw* data in a /Pages Dictionary, and only fallback to asynchronous data lookup when a Reference was encountered.
In addition to the above, this patch also makes the following notable changes:
- Let `Catalog.getPageDict` *consistently* reject with the actual error, regardless of what data we're fetching. Previously we'd "swallow" the actual errors except when looking up Dictionary entries, which is inconsistent and thus seem unfortunate. As can be seen from the updated unit-tests this change is API-observable, hence why the patch is tagged `[api-minor]`.
- Improve the consistency of the Dictionary /Type-checks in both the `Catalog.getPageDict` and `Catalog.getAllPageDicts` methods.
In `Catalog.getPageDict` there's a fallback code-path where we're *incorrectly* checking the /Page Dictionary for a /Contents-entry, which is wrong since a /Page Dictionary doesn't need to have a /Contents-entry in order to be valid.
For consistency the `Catalog.getAllPageDicts` method is also updated to handle errors in the /Type-lookup correctly.
- Reduce the `PagesCountLimit.PAUSE_EAGER_PAGE_INIT` viewer constant, to further improve loading/rendering performance of the *second* page during initialization of very long documents; PR 14359 follow-up.
*This addresses the following case missing from the previous patch:*
The viewer is loaded in an *active* window/tab, and enough time is allowed to pass in order to allow rendering to start. However, if the user then switches to another tab (or another program) *before* rendering has finished, the "load" event also needs to be unblocked.
Given that `requestAnimationFrame` is being used, see the `src/diplay/api.js` file, an inactive window/tab means that rendering will not run and we'll thus not fetch all pages. The latter is a requirement for the "load" event to be unblocked, in the MOZCENTRAL-version of, the default viewer.
This patch is a *partial* solution, since it only addresses the following situations:
- A *background* tab (containing the viewer) is reloaded, e.g. via the tab-bar context menu.
- The viewer is loaded in a active tab, but the user switches away from it (or switches to another program window) *before* rendering has started.
This provides a work-around for badly generated PDF documents that contain *negative* /FitH parameters (in the referenced issue the value `-32768` is used).
This patch, first of all, removes circular dependencies in the TypeScript definitions. Secondly, it also moves `RenderingStates` into `web/ui_utils.js` to break another type-dependency and directly use the `XfaLayerBuilder` during XFA-printing.
Finally, note that this patch *slightly* reduces the size of the default viewer (e.g. in the `MOZCENTRAL` build) by not having to bundle code which is completely unused.
This patch circumvents the issues seen when trying to update TypeScript to version `4.5`, by "simply" fixing the broken/missing JSDocs and `typedef`s such that `gulp typestest` now passes.
As always, given that I don't really know anything about TypeScript, I cannot tell if this is a "correct" and/or proper way of doing things; we'll need TypeScript users to help out with testing!
*Please note:* I'm sorry about the size of this patch, but given how intertwined all of this unfortunately is it just didn't seem easy to split this into smaller parts.
However, one good thing about this TypeScript update is that it helped uncover a number of pre-existing bugs in our JSDocs comments.
The size of the `web/ui_utils.js` file has increased over time, as more code has been added to (or moved into) that file. To reduce its size slightly, this patch moves the event-related functionality into a separate file.
In PR 14114 this was only added to the default viewer, which means that in the viewer components the user would need to *manually* implement /Lang handling. This was (obviously) a bad choice, since the viewer components already support e.g. structTrees by default; sorry about overlooking this!
To avoid having to make *two* `getMetadata` API-calls[1] very early during initialization, in the default viewer, the API will now cache its result. This will also come in handy elsewhere in the default viewer, e.g. by reducing parsing when opening the "document properties" dialog.
---
[1] This not only includes a round-trip to the worker-thread, but also having to re-parse the /Metadata-entry when it exists.
By making this API-call *unconditionally*, we introduce a (slight) delay in the initialization of *all* documents.
That seems quite unfortunate, since `pdfjs.enablePermissions` is off by default, and it thus seem better only do the API-call when actually needed; sorry about this!
For encrypted PDF documents without the required permissions set, this patch adds support for disabling of form editing. However, please note that it also requires that the `pdfjs.enablePermissions` preference is set to `true`[1] (since PDF document permissions could be seen as user hostile).
Based on https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G6.1942134, this condition hopefully makes sense.
---
[1] Either manually with `about:config`, or using e.g. a [Group Policy](https://github.com/mozilla/policy-templates).
This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.
For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents having all pages loaded/initialized early provides overall better performance/UX in the viewer, however there's cases where it can instead *hurt* performance.
For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands of `getPage`-calls and leaving very little time for other parsing (such as e.g. of operatorLists).
To address this situation, this patch thus proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a change to handle other requests.[2]
Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3]
---
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep.
[2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that'd slowed things down for all documents which didn't seem appropriate.
[3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls.
Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having `Catalog.getPageDict` be `async` would simplify things a great deal.
These changes improves the consistency ever so slightly in the `PDFOutlineViewer._dispatchEvent` method, by making sure that we can tell the following two cases apart:
- The "pagesloaded" event has *not yet* been fired.
- The "pagesloaded" event has been fired, but no pages were available.
*This patch can be tested e.g. with the `poppler-85140-0.pdf` document from the test-suite.*
For some sufficiently corrupt documents the `getDocument` call will succeed, but fetching even the very first page fails. Currently we only print error messages (in the console) from the `{BaseViewer, PDFThumbnailViewer}.setDocument` methods, but don't actually provide these errors to allow the viewer to handle them properly.
In practice this means that the GENERIC viewer won't display the `errorWrapper`, and in the MOZCENTRAL viewer the *browser* loading indicator is never hidden (since we never unblock the "load" event).
This patch is essentially a continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.
Note that browsers, in general, don't handle a huge amount of DOM-elements very well, with really poor (e.g. sluggish scrolling) performance once the number gets "large". Furthermore, at least in Firefox, it seems that DOM-elements towards the bottom of a HTML-page can effectively be ignored; for the PDF.js viewer that means that pages at the end of the document can become impossible to access.
Hence, in order to improve things for these *very* large/long documents, this patch will now enforce usage of the (recently added) PAGE-scrolling mode for these documents. As implemented, this will only happen once the number of pages *exceed* 15000 (which is hopefully rare in practice).
While this might feel a bit jarring to users being *forced* to use PAGE-scrolling, it seems all things considered like a better idea to ensure that the entire document actually remains accessible and with (hopefully) more acceptable performance.
Fixes [bug 1588435](https://bugzilla.mozilla.org/show_bug.cgi?id=1588435), to the extent that doing so is possible since the document contains 25560 pages (and is 197 MB large).
This was added on the assumption that the viewer would (eventually) start using the `PDFSinglePageViewer` for e.g. PAGE-scrolling mode and PresentationMode. However, having both a `PDFViewer` and a `PDFSinglePageViewer` side-by-side in the viewer would've been tricky to implement well, which is why PR 14112 implemented PAGE-scrolling for the general `BaseViewer` instead.
Given that the default viewer is no longer (potentially) going to use `PDFSinglePageViewer`, there's code in the `SecondaryToolbar` (and related CSS rules) which is now unnecessary.
*This patch can be tested e.g. with the `sizes.pdf` document in the test-suite.*
While this patch isn't necessarily the best solution, e.g. it might be possible to solve this with *only* CSS, it's what I was able to come up with to address an old issue.
The solution here re-uses the `spread`-class in PresentationMode, since that one already takes care of centering pages *vertically*, together with a dummy-page that takes up the entire height of the window.
Finally, some PresentationMode-related CSS-rules are also simplified slightly, since the changes in PR 14112 (using Page-scrolling) allows some clean-up here.
The issue that this patch fixes has existed ever since the viewer was first re-factored into components, however it only really affects the `disableAutoFetch = true` mode.
By default we're fetching all pages in `BaseViewer.setDocument`, and as part of the parsing/initialization we're also populating the `PDFLinkService`-pagesRefCache. The purpose of that cache is to make navigating to any internal destinations faster, by not having to (asynchronously) lookup the pageNumber via the API when handling the destination.
In comparison, when the `disableAutoFetch = true` mode is being used we're instead *lazily* initializing the pages in the `BaseViewer.#ensurePdfPageLoaded`-method. For some reason, that I can only assume is a simple oversight, we're not attempting to update the `PDFLinkService`-pagesRefCache in that case.
In the `BaseViewer` this cache is mostly relevant in the `disableAutoFetch = true` mode, since the pages are being initialized *lazily* in that case.
In the `PDFThumbnailViewer` this cache is mostly used for thumbnails that are actually being rendered, as opposed to those created directly from the "regular" pages.
Please note that I'm not suggesting that we remove these caches because they're only used in some situations, but rather because they're for all intents and purposes actually *redundant*. In the API itself, we're already caching both the page-promises and the actual pages themselves on the `WorkerTransport`-instance.
Hence these viewer-caches aren't really necessary in practice, and adds what to me mostly seems like an unnecessary level of indirection.[1]
Given that the viewer now relies on caching in the API itself, this patch also adds a new unit-test to ensure that page-caching works (and keep working) as expected.
---
[1] In the `WorkerTransport.getPage`-method the parameter is being validated on every call, but that's hardly enough code to warrant keeping the "duplicate" caches in the viewer in my opinion.
*Please note:* These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents.
The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for *every rendered* page.
This patch proposes replacing that method with a *synchronous* `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and *only send* them to the main-thread *the first time* that a type is encountered.
Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1]
This re-factoring will obviously benefit longer document the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes.
Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return *the same identical* object.
This is something that we can easily take advantage of in the default viewer, by now *only* reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents).
---
[1] Furthermore, the maximium number of `StreamType`/`FontType` are `10` respectively `12`, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see 41ac3f0c07/src/shared/util.js (L206-L232)
[2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only result in a total of seven "DocStats" messages being sent from the worker-thread.
[3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549
It shouldn't be necessary to iterate through *all* pages when using a non-default `spreadMode`, since we already know which page(s) should become visible.
This code is a left-over from the initial (local) implementation that resulted in PR 14112, however I forgot to clean-up some things such as e.g. this loop.
Also fixes an outdated comment, see PR 14204 which removed the mentioned data-structure.
This patch preserves the old behaviour of appending a `loadingIcon`-div to all pages that are not yet loaded/rendered. However, the actual `loadingIcon`-spinner (i.e. the `loading-icon.gif` image) will only be displayed on *visible* pages to improve performance.
To avoid having to iterate through all pages in the document, which doesn't seem like a good idea for a PDF document with thousands of pages, we use a combination of the currently visible *and* cached pages to toggle the `loadingIcon`-spinner.
This code is the last piece[1] of the viewer that's not using standard `class`es, and by converting this code we get rid of some now unneeded boilerplate code (slightly reducing the size of the *built* `web/viewer.js` file).
Note that while this code was originally imported from a separate repository, it was last sync-ed with upstream *five years* ago which is why this re-factoring should be OK as far as I'm concerned (and we've done some other clean-up since then as well).
---
[1] Technically the `web/debugger.js` file is left as well, however that code is first of all not bundled in the *built* `web/viewer.js` file and secondly it's not even loaded by default either.
- First step to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1737260;
- several interactive pdfs use the possibility to hide/show buttons to show different icons;
- render pushbuttons on their own canvas and then insert it the annotation_layer;
- update test/driver.js in order to convert canvases for pushbuttons into images.