*This is a follow-up to PRs 13867 and 13899.*
This patch is tagged `api-minor` for the following reasons:
- It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour.
- For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method.
- It's now also possible to disable *all* annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282.
---
[1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of *all* annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.
The `loadAndEnablePDFBug` helper function, in `web/app.js`, can be simplified a little bit by making it `async`. Furthermore, given how `PDFBug` is being used, we can also (slightly) re-factor `PDFBug.init` such that the `PDFBug.enable`-call is done internally rather than having to handle that manually at the call-site.
(Finally, utilize `await` more in the `loadFakeWorker` helper function.)
This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the *build* `pdf.js` file.
Given that `PDFWorker` is exposed through the *public* API, this complicates things somewhat since there's a couple of worker-related properties that really should stay *private*. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure, however I've managed to come up with what's hopefully deemed an acceptable work-around here.
Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method).
Finally, as part of this re-factoring a number of missing JSDoc-comments were added which *together* with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class.
*Please note:* This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.
Even though the code as-is *should* be safe, given that we're using an Object with a `null` prototype, it cannot hurt to change this to a Map to prevent any issues (since we're parsing unknown and potentially unsafe data).
Overall I also think that these changes improve the `parseQueryString` call-sites, since we now have a proper way of checking for the existence of a particular key (and don't have to use `in` which stringifies the keys in the Object).
This patch also changes the default, when no `value` exists, from `null` to an empty string since the use of `decodeURIComponent` currently can modify the value in a somewhat surprising way (at least to me).
Note how `decodeURIComponent(null) === "null"` which is unlikely to be what you actually want, whereas `decodeURIComponent("") === ""` which seems much more helpful.
Prior to PR 13042, when scripting wasn't really possible to use outside of the full viewer, the `enableScripting` option made sense.
However, at this point in time having to both pass in a `PDFScriptingManager`-instance *and* set the `enableScripting`-boolean when creating a `BaseViewer`-instance feels redundant and (mostly) annoying. Hence this patch, which removes the *separate* boolean and always enables scripting when `scriptingManager` is provided.
The relevant "viewer component" examples are also updated (with a comment), but in such a way that scripting support won't just break when used with the current PDF.js releases.
Given that we've over time been reducing the number of `compatibilityParams` in use, there's now few enough left that I think it makes sense to simply inline them directly in the `web/app_options.js` file.
Note that we recently inlined/removed the separate `src/display/api_compatibility.js` file, see PR 13525, and that it (in my opinion) thus makes sense to do the same in the `web/`-folder. This patch will also slightly reduce the size of *built* `web/viewer.js` file, which cannot hurt.
The PDF.js API has only ever supported accessing the original file ID, however the second one that (should) exist in *modified* documents have thus far been completely inaccessible through the API.
That seems like a simple oversight, caused e.g. by the viewer not needing it, since it really shouldn't hurt to provide API-users with the ability to check if a PDF document has been modified since its creation.[1]
Please refer to https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G13.2261661 for additional information.
For an example of how to update existing code to use the new API, please see the changes in the `web/app.js` file included in this patch.
*Please note:* While I'm not sure if we'll ever be able to remove the old `PDFDocumentProxy.fingerprint` getter, given that it's existed since "forever", that probably isn't a big deal given that it's now limited to only `GENERIC`-builds.
---
[1] Although this obviously depends on the PDF software following the specification, by updating the second file ID as intended.
Given that this property is only used with password protected documents, and is consequently document-specific rather than viewer-specific, ensure that `IPDFLinkService.externalLinkEnabled` is actually being reset by `PDFViewerApplication.close`.
To make things less confusing/inconsistent, remove the *undocumented* `externalLinkEnabled` property from the `PDFLinkService` constructor and force it to always be manually set when needed.
Reasons for the removal include:
- This functionality was always somewhat experimental and has never been enabled by default, partly because of worries about rendering bugs caused by e.g. bad/outdated graphics drivers.
- After the initial implementation, in PR 4286 (back in 2014), no additional functionality has been added to the WebGL implementation.
- The vast majority of all documents do not benefit from WebGL rendering, since only a couple of *specific* features are supported (e.g. some Soft Masks and Patterns).
- There is, and has always been, *zero* test-coverage for the WebGL implementation.
- Overall performance, in the PDF.js library, has improved since the experimental WebGL implementation was added.
Rather than shipping unused *and* untested code, it seems reasonable to simply remove the WebGL implementation for now; thanks to version control it's always possible to bring back the code should the need ever arise.
According to a decision by UX and PM, please see https://bugzilla.mozilla.org/show_bug.cgi?id=1705060#c2 (and implemented in https://bugzilla.mozilla.org/show_bug.cgi?id=1705327), we no longer show the notification-bar in Firefox; hence the special `PDFViewerApplication._delayedFallback` functionality should no longer be necessary.
Furthermore, note that at this point in time *most* of the features which used the `PDFViewerApplication._delayedFallback` functionality is now enabled by default; hence that provides even less reason to keep this code around and existing calls are thus converted to "regular" `PDFViewerApplication.fallback` calls.
According to a decision by UX and PM, please see https://bugzilla.mozilla.org/show_bug.cgi?id=1705060#c2, in Firefox we should first of all *not* display the notification-bar for signatures. Secondly, as can also be seen there, we shouldn't display the notification-bar *at all* and it's thus disabled in https://bugzilla.mozilla.org/show_bug.cgi?id=1705327.
If we purposely don't display a notification, for documents with signatures, in the *built in* Firefox PDF Viewer then it cannot be necessary in the GENERIC viewer either.
To simplify the overall implementation, given that it only applies to the GENERIC-viewer, this patch purposely re-uses the existing `errorWrapper`-functionality to display the message.
While that one is mostly intended for actual *errors*, by re-using it here we considerably reduce the amount of code/complexity necessary for supporting this new warning. It's obviously possible to re-factor/improve this later on, but the patch should do just fine here since it'll indeed inform users (of the GENERIC-viewer) about unverified signatures.
Finally this patch also tweaks the background-color of the `errorWrapper`, making it 20 percent lighter respectively darker (depending on the theme) to make it "stand out" a little bit *less*.[1] While it may perhaps be useful to re-style/re-factor the `errorWrapper`, this patch probably isn't the right place for doing that.
---
[1] Note how in the MOZCENTRAL-viewer, which instead uses the browser notification-bar, we're purposely using a neutral colour to not draw too much attention to the notification-bar.
- but don't validate them for now;
- Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315)
- almost all (all ?) pdf readers display signatures;
- validation is done in edge but for now it's behind a pref.
*This patch fixes some technical debt in the viewer.*
Given that most API methods are (purposely) asynchronous, there's always a risk that the viewer could have been `close`d before the requested data arrives.
Lately we've started to check this case before using the data, to prevent errors and/or inconsistent state, however the outline/attachments/layers fetching and rendering is old enough that it pre-dates those checks.
Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use *multiple* ones simultaneously doesn't really make sense (e.g. in the viewer).
Instead we lazily initialize, and cache, just *one* instance via `PDFDocumentProxy.annotationStorage` which should thus be available internally in the API itself without having to be manually passed to various methods.
To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it.
This patch implements the following simplifications:
- Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally.
Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does *not* make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with *a lot* more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method.
- Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point).
- Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing).
- By removing the need to *manually* provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.
As discussed in the issue, this is a small/simple patch that should help to prevent *outright* data loss in forms when a new document is opened in the GENERIC viewer.
While the implementation is perhaps a bit "simplistic", it does seem to work and should be fine given that this is an edge-case only relevant for the GENERIC viewer.
In the next patch we'll need to be able to actually wait for saving to complete, hence it's necessary to slightly re-factor the `save`-method.
As part of these changes, we can reduce some duplication in the `save`-method and slightly improve the overall code. For consistency, the `download`-method is updated similarily to improve the code (this functionality is *very* old, even pre-dating the introduction of Promises in the code-base).
As mentioned in the JSDoc comment, this should not be used unless you know what you're doing, since it will lead to increased memory usage. However, in some situations (e.g. SVG-rendering), we still want to be able to run general clean-up on both the main/worker-thread while keeping loaded fonts attached to the DOM.[1]
As part of these changes, `WorkerTransport.startCleanup` is converted to an async method and we'll also skip clean-up when destruction has started (since it's redundant).
---
[1] The SVG-rendering mode is obviously not officially supported, since it's both rather incomplete and inherently slower. However with recent changes, whereby we cache repeated images on the document rather than the page level, memory usage can be *a lot* worse than before if we never attempt to release e.g. cached image-data when the viewer is in SVG-rendering mode.
Given that *all* data has been loaded on the main-thread, and then transferred to the worker-thread, ever since PR 8617 (almost four years ago) it should no longer be necessary to keep this special-case around.
Given that the `webViewerOpenFileViaURL` helper function is being defined in *all* builds anyway, the current pre-processor usage doesn't really improve readability in my opinion.
The rotation handling that's currently living in `PDFViewerApplication` is *very* old, and pre-dates the introduction of the viewer components by years.
As can be seen in the `BaseViewer.pagesRotation` setter, we're not actually normalizing the rotation as intended and instead rely on the caller to handle that correctly. This is first of all inconsistent, given how other setters are implemented, and secondly it could also lead to the rotation being set to a value outside of the `[0, 360)`-range.
Finally, for improved consistency the rotation handling in `PageViewport` is updated similarly. Please note that this case, it's *not* changing the pre-existing logic.
Given that the `enableXfa` parameter must to be passed to the API/Worker, and thus included in the `getDocument` call, it's not necessary to include it when initializing the `PDFViewer`-instance used in the default viewer. (Also, in `AppOptions`, the parameter is clearly marked with `OptionKind.API`.)
Furthermore, we probably don't want to display the fallback bar (in Firefox) for XFA documents when `enableXfa = true` is set.
- add an option to enable XFA rendering if any;
- for now, let the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground);
- ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one:
- it makes easier to translate template properties to html ones;
- it makes faster the creation of the html element in the main thread.
It seems reasonable to place this alongside the *similar* `getFilenameFromUrl` helper function. This way, with the changes in the next patch, we also avoid having to expose the `isDataScheme` function in the API itself and we instead expose `getPdfFilenameFromUrl` in the API (which feels overall more appropriate).
The *main* purpose of this patch is to allow scripting to be used together with the viewer components, note the updated "simpleviewer"/"singlepageviewer" examples, in addition to the full default viewer.
Given how the scripting functionality is currently implemented in the default viewer, trying to re-use this with the standalone viewer components would be *very* hard and ideally you'd want it to work out-of-the-box.
For an initial implementation, in the default viewer, of the scripting functionality it probably made sense to simply dump all of the code in the `app.js` file, however that cannot be used with the viewer components.
To address this, the functionality is moved into a new `PDFScriptingManager` class which can thus be handled in the same way as all other viewer components (and e.g. be passed to the `BaseViewer`-implementations).
Obviously the scripting functionality needs quite a lot of data, during its initialization, and for the default viewer we want to maintain the current way of doing the lookups since that helps avoid a number of redundant API-calls.
To that end, the `PDFScriptingManager` implementation accepts (optional) factories/functions such that we can maintain the current behaviour for the default viewer. For the viewer components specifically, fallback code-paths are provided to ensure that scripting will "just work"[1].
Besides moving the viewer handling of the scripting code to its own file/class, this patch also takes the opportunity to re-factor the functionality into a number of helper methods to improve overall readability[2].
Note that it's definitely possible that the `PDFScriptingManager` class could be improved even further (e.g. for general re-use), since it's still heavily tailored to the default viewer use-case, however I believe that this patch is still a good step forward overall.
---
[1] Obviously *all* the relevant document properties might not be available in the viewer components use-case (e.g. the various URLs), but most things should work just fine.
[2] The old `PDFViewerApplication._initializeJavaScript` method, where everything was simply inlined, have over time (in my opinion) become quite large and somewhat difficult to *easily* reason about.
These changes will be necessary for the next patch, since we don't want to accidentally pull in the entire default viewer in the standalone viewer components.
Rather than having to spell out the English fallback strings at *every* single `IL10n.get` call-site throughout the viewer, we can simplify things by collecting them in *one* central spot.
This provides a much better overview of the fallback l10n strings used, which makes future changes easier and ensures that fallback strings occuring in multiple places cannot accidentally get out of sync.
Furthermore, by making the `fallback` parameter of the `IL10n.get` method *optional*[1] many of the call-sites (and their surrounding code) become a lot less verbose.
---
[1] It's obviously still possible to pass in a fallback string, it's just not required.
The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers.
Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons).
Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the *built* `pdf.js`/`pdf.worker.js` files.
Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation.
---
[1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.
*This is somewhat similar to PR 12931.*
For PDF documents where fonts are completely missing in the /Resources dictionaries, there's basically no "correct" way of rendering the document.
Hence it's very unlikely that another PDF viewer will do a better job than PDF.js in these cases, and consequently it seems highly questionable if the fallback bar really helps here.
Given that these event listeners should essentially never be needed, but are included simply to avoid breakage in edge-cases, it can't hurt to make this code slightly less verbose.
Given that these HTML elements are not being used at all in `MOZCENTRAL`-builds, note the preprocessor check in `PDFViewerApplication._otherError`, we obviously don't need the HTML code either.
Some of the localization strings (e.g. "loading_error") are repeated multiple times throughout the `web/app.js` file, which means that we need to duplicate the fallback strings as well. Furthermore, the signature of the `IL10n.get` method makes the call-sites quite verbose.
By adding a new helper method, in `PDFViewerApplication`, we're able to gather the localization fallback strings in one central spot in `web/app.js` and also make the lookup of the error/warning messages more compact.
This code is *very* old and it even predates the existence of arrow functions. Hence we can now reduce the overall verbosity by not having to explicitly spell out `PDFViewerApplication` everywhere.
This feature was Firefox-specific, and it's now been removed from the HTML specification and it's disabled by default starting with Firefox 85. Hence it seems completely unnecessary to keep this code in the default viewer.
Please refer to https://groups.google.com/g/mozilla.dev.platform/c/tc11BCenm2c and the resources that it links to.