This should *hopefully* fix 17228, by tweaking the build scripts to give the GENERIC viewer something to await to avoid breaking third-party users of the standalone viewer components.
Currently we *synchronously* fetch a number of browser preferences/options, from the platform code, during the viewer respectively PDF document initialization paths.
This seems unnecessary, and we can re-factor the code to instead include the relevant data when fetching the regular viewer preferences.
*Please note:* While following the steps in the README still works with this patch, in the sense that the example runs and successfully renders a PDF document, I unfortunately cannot tell if it illustrates Webpack best practices.
*Please note:* These changes only affect the GENERIC build, since `NullL10n` is only a stub elsewhere (see PR 17135).
After the changes in PR 17115, which modernized and improved l10n-handling, the `NullL10n`-implementation is no longer a good fallback for the "proper" `L10n`-classes.
To improve this situation, especially for the *standalone* viewer-components, this patch makes the following changes:
- Let the `NullL10n`-implementation extend an actual `L10n`-class, which is constant and lazily initialized, to ensure that it works *exactly* like the "proper" ones.
- Automatically bundle the "en-US" l10n-strings in the build, via the pre-processor, such that we don't need to remember to manually update them.
- Ensure that the *standalone* viewer-components register their DOM-elements for translation, similar to the default viewer, since this will allow future code improvements by using "data-l10n-id"/"data-l10n-args" in most (if not all) parts of the viewer.
- Remove the `NullL10n` from the `AnnotationLayer`, to avoid affecting bundle size too much.
For third-party users that access the `AnnotationLayer`, as exposed in the main PDF.js library, they'll now need to *manually* register it for translation. (However, the *standalone* viewer-components still works given the point above.)
- For the generic viewer we use @fluent/dom and @fluent/bundle
- For the builtin pdf viewer in Firefox, we set a localization url
and then we rely on document.l10n which is a DOMLocalization object.
Those files only contain old debugging code that is not used/imported
anywhere anymore, which is generating code scanning alerts. Moreover,
they rely on globals/platform-specific code and don't import/export
logic properly.
This *finally* allows us to mark the entire PDF.js library as a "module", which should thus conclude the (multi-year) effort to re-factor and improve how we import files/resources in the code-base.
This also means that the `gulp ci-test` target, which is what's run in GitHub Actions, now uses JavaScript modules since that's supported in modern Node.js versions.
It's been loaded as a JavaScript module for a long time, and given that the file is bundled as-is (without building) it seems reasonable to just change the file extension now.
Comparing the currently supported browsers/environments, see [the FAQ](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) and the [MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility), the `structuredClone` polyfill is *only* needed in Google Chrome versions < 98. Because of some limitations in the core-js polyfill we're currently forced to special-case the `transfer` handling to prevent bugs, and it'd be nice to avoid that.
Note that `structuredClone`, with transfers, is only used in two spots:
- The `LoopbackPort` class, which is only used with fake workers. Given that fake workers should *never* be used in browsers, breaking that edge-case in older Google Chrome versions seem fine.
- The `AnnotationStorage` class, when Stamp-annotations have been added to the document. Given that Google Chrome isn't the main focus of development, breaking *part* of the editing-functionality in older Google Chrome versions should hopefully be acceptable.
To avoid problems with `export` statements in the QuickJS Javascript Engine, we can work-around that by *explicitly* exposing `pdfjsScripting` globally instead.
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
The minified default viewer has never been distributed in either official releases or through pdfjs-dist, which means that it's most likely unused, and it's has never been tested nor actively maintained.
This has been deprecated since version `2.15.349`, which is a year ago.
Removing this will also simplify some upcoming changes, specifically outputting of JavaScript modules in the builds.
The old pre-processor used for CSS, and HTML, files leaves comments intact which unnecessarily contributes to the overall size of the *built* CSS files (note that the built JavaScript files don't include comments).
Rather than trying to "hack" comment removal into the pre-processor it seems easier to use a PostCSS plugin instead. The one potential issue is that it also affects *some* whitespaces, and it's not clear to me if this'll work with the various CSS-related tests that run in mozilla-central.
Please refer to https://www.npmjs.com/package/postcss-discard-comments for additional information.
Given the limitations of the old pre-processor that's used for CSS/HTML files, this unfortunately isn't as "easy" to implement as it is for JavaScript code.
Since this is the first case where we've wanted to do conditional CSS imports, rather than trying to completely re-write the pre-processor, this patch settles for handling it explicitly in the `expandCssImports` function.
The typescript compiler is now configured to know about the import map
to be able to resolve those imports and find the associated types.
As tsc outputs declaration files using the original module identifiers
and not the resolved ones, tsc-alias is used to post-process the
declaration files by resolving those paths.
Some configurations settings like `paths` cannot be provided through CLI
arguments but only in a configuration file. And when using a
configuration file, only a few options (like `--outDir` can still be
provided) through the CLI.
- The `src/core/unicode.js` exclude ought to have become unnecessary already with PR 16200, which significantly shortened and simplified that file.
- The `src/core/glyphlist.js` exclude no longer seems necessary in practice either, possibly because of improvements in Babel.
There are 2 rotation we've to deal with: the viewer one and the editor one.
The previous implementation was a bit complex and having to deal with these
rotation would have potentially increase it.
So this patch aims to simplify the implementation and deal with all the possible
cases.
The main idea is to transform the mouse deltas according to the rotations and then
apply the resizing in the page coordinates system.
We obviously don't want to re-introduce any `require` usage in e.g. the viewer, since we should strive to only use native `import` statements wherever possible.[1]
Hopefully exposing e.g. the library globally in more cases won't break anything, however it's somewhat difficult for me to imagine all the ways in which third-party users may be accessing the PDF.js library. (Given the lack of a runnable test-case in the issue, I also cannot guarantee that this is enough to fully address the problem.)
---
[1] Ideally we should probably not rely on e.g. `pdfjsLib` being globally available in the *built* viewer, and rather always `import` the library instead.
Unfortunately this would require larger (possibly breaking) changes in the builds that we provide, however note that Firefox only recently got support for `import` in workers and that Webpack still only have *experimental* support for outputting "proper" modules.
In Gulp 4, which we use for years now, the `gulp.src()` function
supports the `removeBOM` option to disable the default BOM stripping,
so this commit uses that to get rid of our `vinyl-fs` dependency.
Note that this actually makes disabling BOM stripping work again. It's
currently broken because in `vinyl-fs` 3, that we already use since 2018
in commit 95de23e, the `stripBOM` option was renamed to `removeBOM`, so
the current code doesn't actually disable BOM stripping which we now
confirmed and sadly broke for years without anyone noticing. Most likely
this is because the BOM is not required for UTF-8 documents, but while
not necessary it also can't hurt to have it for tools that use it to
determine if a document is UTF-8.
Trying to update Stylelint to version `15.10.1`, and beyond, broke linting. Looking at the changes the issue appears to be that the `bin/stylelint.js` file was replaced with `bin/stylelint.mjs` instead, which our `gulp lint` runner wasn't able to automatically find; see https://github.com/stylelint/stylelint/compare/15.10.0...15.10.1
By leveraging import maps we can get rid of *most* of the remaining `require`-calls in the `src/display/`-folder, since we should strive to use modern `import`-statements wherever possible.
The only remaining cases are Node.js-specific dependencies, since those seem very difficult to convert unless we start producing a bundle *specifically* for Node.js environments.
In the last couple of years we've been quicker to remove support for older browsers/environments, which means that at this point in time we don't bundle that many polyfills. (The polyfills are also generally simpler nowadays, ever since we removed support for e.g. Internet Explorer.)
Rather than having to *manually* handle the polyfills, we can actually let Babel take care of bundling the necessary polyfills for us; please refer to https://babeljs.io/docs/babel-preset-env
The only exception here is the Node.js-specific compatibility-code, which is moved into the `src/display/node_utils.js` file. This ought to be fine since workers are not available/used in Node.js-environments.
*Please note:* For the `legacy`-builds this will increase the size of the *built* files, however that seems like a very small price to pay in order to simplify maintenance of the general PDF.js library.