In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
- With the `master`-branch it takes >600 ms to render.
- With this patch that goes down to ~50 ms, which is one order of magnitude faster.
(Note that all other pages are, as expected, completely unaffected by these changes.)
This new main-thread copying is limited to "large" global images, since:
- Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
- With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
- This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
The doorhanger for highlighting has a basic color picker composed of 5 predefined colors
to set the default color to use.
These colors can be changed thanks to a preference for now but it's something which could
be changed in the Firefox settings in the future.
Each highlight has in its own toolbar a color picker to just change its color.
The different color pickers are so similar (modulo few differences in their styles) that
this patch introduces a new class ColorPicker which provides a color picker component
which could be reused in future editors.
All in all, a large part of this patch is dedicated to color picker itself and its style
and the rest is almost a matter of wiring the component.
- Extend the `fetchData` helper function to also support fetching of "blob" data.
- Use the `fetchData` helper function more in the code-base, when fetching non-PDF data. Given that the Fetch API isn't supported for all protocols, this should improve compatibility for the PDF.js library.
The goal is to be able to get these outlines to fill the shape corresponding
to a text selection in order to highlight some text contents.
The outlines will be used either to show selected/hovered highlights.
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]
- Expose the `fetchData` helper function in the API, such that the viewer is able to access it.
- Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.
---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
Hopefully this is enough to address the problem of initializing the Worker in Chromium-based browsers.
Locally I've tried to *force* use of `createCDNWrapper` in development mode, by commenting out the `isSameOrigin` checks, and worker-loading fails against `master` and works with this patch.
There are environments that include *incomplete* polyfills for the `navigator`-object, which may thus cause the PDF.js library to break.
Despite that clearly not being our fault, it may still result in bug reports filed against the PDF.js project; see e.g. 15728.
Currently this even seem to affect *the latest* version of Node.js; see e.g. [here].
*Please note:* Thanks to the pre-processor none of these changes affect the Firefox PDF Viewer, however it does add "overhead" when working with and reviewing the affected code (which is why I'm not crazy about this).
Currently the `WidgetAnnotationElement._getKeyModifier` method will always be falsy on Linux, which seems like a simple oversight. Looking at all the other `FeatureTest.platform` accesses we only handle the `isMac`-case specially, and it seems reasonable to do the same thing here.
The reason that this hasn't led to any bug reports is most likely that the `modifier`-property seems completely unused in the scripting-implementation.
Finally, with these changes we can (slightly) simplify the `FeatureTest.platform` implementation.
This patch changes almost all viewer-components[1] to use "data-l10n-id"/"data-l10n-args" for localization, which means that in many cases we no longer need to pass around the `L10n`-instance any more.
One part of the code-base where the `L10n`-instance is still being used "directly" is the AnnotationEditors, however while it might be possible to convert (most of) that code as well that's not attempted in this patch.
---
[1] The one exception is the `PDFDocumentProperties` dialog, since the way it's currently implemented makes that less straightforward to fix without a lot of code changes.
*Please note:* These changes only affect the GENERIC build, since `NullL10n` is only a stub elsewhere (see PR 17135).
After the changes in PR 17115, which modernized and improved l10n-handling, the `NullL10n`-implementation is no longer a good fallback for the "proper" `L10n`-classes.
To improve this situation, especially for the *standalone* viewer-components, this patch makes the following changes:
- Let the `NullL10n`-implementation extend an actual `L10n`-class, which is constant and lazily initialized, to ensure that it works *exactly* like the "proper" ones.
- Automatically bundle the "en-US" l10n-strings in the build, via the pre-processor, such that we don't need to remember to manually update them.
- Ensure that the *standalone* viewer-components register their DOM-elements for translation, similar to the default viewer, since this will allow future code improvements by using "data-l10n-id"/"data-l10n-args" in most (if not all) parts of the viewer.
- Remove the `NullL10n` from the `AnnotationLayer`, to avoid affecting bundle size too much.
For third-party users that access the `AnnotationLayer`, as exposed in the main PDF.js library, they'll now need to *manually* register it for translation. (However, the *standalone* viewer-components still works given the point above.)
Use existing helper to calculate the Box
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Ensure that there are non-zero
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Add a reference test for #17147
- For the generic viewer we use @fluent/dom and @fluent/bundle
- For the builtin pdf viewer in Firefox, we set a localization url
and then we rely on document.l10n which is a DOMLocalization object.
Given that we only use standard `import`/`export` statements now, after recent PRs, the "exports" global is unused.
Instead we add "__non_webpack_import__" to the `globals` to avoid having to sprinkle disable statements throughout the code.
Finally, the way that `globals` are defined has changed in ESLint and we should thus explicitly specify them as "readonly"; please find additional details at https://eslint.org/docs/latest/use/configure/language-options#specifying-globals
This *finally* allows us to mark the entire PDF.js library as a "module", which should thus conclude the (multi-year) effort to re-factor and improve how we import files/resources in the code-base.
This also means that the `gulp ci-test` target, which is what's run in GitHub Actions, now uses JavaScript modules since that's supported in modern Node.js versions.
For large/complex images it's possible that the image-data arrives in the API *after* the page has been scrolled out-of-view and thus been cleaned-up. In this case we obviously shouldn't cache such page-level data, since it'll first of all be unused and secondly can increase memory usage *a lot*.
Also, ensure that we *immediately* release any `ImageBitmap` data in this case to help reclaim memory faster.
When pdfBug is true, the substitution font is used in the text layer in order
to be able to know what is the font really used thanks to the devtools.
And to be sure that fonts are loaded, the font cache isn't cleaned up when
the debugger is active.
Comparing the currently supported browsers/environments, see [the FAQ](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) and the [MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility), the `structuredClone` polyfill is *only* needed in Google Chrome versions < 98. Because of some limitations in the core-js polyfill we're currently forced to special-case the `transfer` handling to prevent bugs, and it'd be nice to avoid that.
Note that `structuredClone`, with transfers, is only used in two spots:
- The `LoopbackPort` class, which is only used with fake workers. Given that fake workers should *never* be used in browsers, breaking that edge-case in older Google Chrome versions seem fine.
- The `AnnotationStorage` class, when Stamp-annotations have been added to the document. Given that Google Chrome isn't the main focus of development, breaking *part* of the editing-functionality in older Google Chrome versions should hopefully be acceptable.
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
The user should *always* provide a correct `GlobalWorkerOptions.workerSrc` value when using the PDF.js library in browser environments. Note that the fallback:
- Has been deprecated ever since PR 11418, first released in version `2.4.456` over three years ago.
- Was always a best-effort solution, with no guarantees that it'd actually work correctly.
- With upcoming changes, w.r.t. outputting JavaScript modules, it'd now be more diffiult to determine the correct value.
When an editing button is disabled, focused and the user press Enter (or space), an
editor is automatically added at the center of the current page.
Next creations can be done in using the same keys within the focused page.