It occurred to me that we can actually run this unit-test in Node.js environments by making use of the preprocessor to stub out the browser globals there.
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement of FoxitSans.
- For now we just try to substitue standard fonts, the strategy is the following:
* we try to find a font locally from a hardcoded list;
* if it fails then we use Liberation as fallback (only for Helvetica for the moment);
* else we just fallback on the system serif/sansserif/monospace font.
Now that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 has landed in Firefox, we're able to use worker-modules during development :-)
This removes the final piece of SystemJS usage from the PDF.js library, thus allowing a fair bit of clean-up, and we now use *only* native `import`/`export` statements everywhere in development mode.
The `pdfjs-dist/lib/` directory contains a README file that explicitly advises against using those files, however based on a fairly large number of issues filed over the years users seem to be (mostly) overlooking that warning.
In particular it unfortunately seems to be somewhat common for users to attempt to "combine" proper builds from `pdfjs-dist/build/` together with individual components from the `pdfjs-dist/lib/web/` directory, which more often than not leads to subtle bugs and general problems.
When we receive bug reports about this it's often not immediately obvious what the problem is, given that many issues lack enough details (such as runnable test-cases), but after some back-and-forth it usually turns out that usage of `pdfjs-dist/lib/` is the culprit.
Considering that keeping the general PDF.js library working is challenging and time-consuming enough nowadays, this patch thus proposes that we stop including the "lib"-build in the `pdfjs-dist` repository to both reduce user confusion and the support burden.
By being less specific about which *exact* JavaScript features are required for the default vs `legacy` build, we don't need to worry about keeping multiple README files up-to-date.
These README files will now refer back to the FAQ for current browser/environment support information.
Currently we simply use the Babel `preset-env` in the `legacy`-builds of the PDF.js library. This has the side-effect of transpiling the code for *very old* browsers/environments, including ones that we (since many years) no longer support which unnecessarily bloats the size of the `legacy`-builds.
For the CSS files we're only targeting *the supported browsers*, and it's thus possible to extend that to also apply to Babel.
One of the most significant changes, with this patch, is that we'll no longer polyfill `async`/`await` in the `legacy`-builds. However, this shouldn't be an issue given the browsers that we currently support in PDF.js; please refer to:
- https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function#browser_compatibility
Note that these cases, which are all in older code, were found using the [`unicorn/no-for-loop`](https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-for-loop.md) ESLint plugin rule.
However, note that I've opted not to enable this rule by default since there's still *some* cases where I do think that it makes sense to allow "regular" for-loops.
There are obviously cases where using `concat` makes perfect sense, since that method doesn't change any of the existing Arrays; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/concat
However, in a few cases throughout the code-base that's not an issue and using `concat` only leads to unnecessary intermediate allocations. With modern JavaScript we can thus replace those with a combination of `push` and spread-syntax, which wasn't originally possible when the code was written.
Apparently the ESLint rule added in PR 15031 wasn't able to catch all cases that can be converted, which is probably not all that surprising given how some of these call-sites look.
- Use `Element.prepend()` to insert nodes before all other ones in the element, rather than using `firstChild` with `insertBefore`-calls; see https://developer.mozilla.org/en-US/docs/Web/API/Element/prepend
- Fix one *incorrect* `insertBefore` call, in the AnnotationLayer-code.
Initially the patch simply changed that to an `Element.before()`-call, however that broke one of the integration-tests. It turns out that the `index` may try to access a non-existent select-child, which triggers undefined behaviour; note the warning in https://developer.mozilla.org/en-US/docs/Web/API/Node/insertBefore#parameters
When pre-processor blocks are being removed, since they don't apply to the current build target, we may currently end up with consecutive blank lines.
While this is obviously not a big issue, it's nonetheless undesirable and we can adjust the `writeLine` function to prevent that.
*This is a follow-up to PR 14886, which "broke" this.*
In addition to fixing the issue, using an Array and `join`-ing it at the end may also be a tiny bit more efficient than using a growing string.
An old shortcoming of the `preprocessCSS`-function is its complete lack of support for our "normal" defines, which makes it very difficult to have build-specific CSS rules. Recently we've started using specially crafted comments to remove CSS rules from the MOZCENTRAL build, but (ab)using the `preprocessCSS`-function in this way really doesn't feel great.
However, it turns out to be surprisingly simple to instead use the "regular" `preprocess`-function for the CSS files as well. The only special-handling that's still necessary is the helper-function for dealing with CSS-imports, but apart from that everything seems to just work.
One reason, as far as I can tell, for having a separate `preprocessCSS`-function was likely that we originally used *lots* of vendor-prefixed CSS rules in our CSS files. With improvements over the years, especially thanks to Autoprefixer and PostCSS, we've been able to remove *almost* all non-standard CSS rules and the need for special-casing the CSS parsing has mostly vanished.
*Please note:* As part of testing this patch I've diffed the output of `gulp generic`, `gulp mozcentral`, and `gulp chromium` against the `master`-branch to check that there was no obvious breakage.
Rather than *manually* specifying a "mode", we can simply use the regular `defines` directly instead. To improve consistency, in the `external/builder/builder.js` file, a couple of parameters are also re-named.
According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility, all browsers that we support have native `ReadableStream` implementations (since quite some time too).
Hence only Node.js is now lagging behind w.r.t. `ReadableStream` support, and its experimental implementation doesn't really help us given the life-span of the LTS releases (see https://en.wikipedia.org/wiki/Node.js#Releases).
It seems quite unfortunate to bundle a `ReadableStream` polyfill in the `legacy` builds when it's unnecessary in browsers, given its overall size, but fortunately we can avoid that by simply listing `web-streams-polyfill` as a dependency for the `pdfjs-dist` library.
With ESLint 8 we should now finally be able to start using modern `class` features, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Public_class_fields and https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Private_class_fields
However, while both ESLint and Acorn now support this, it unfortunately turns out that Escodegen (which we use during building) still lack the necessary support. Looking at https://github.com/estools/escodegen there's not been any updates since last year, and there's also open PRs adding support for these new `class` features.
To avoid blocking usage of these `class` features in the PDF.js code-base, in particular *private* fields/methods, this patch thus proposes that we (hopefully temporarily) switch to an `escodegen` fork that has the necessary support; please see https://www.npmjs.com/package/@javascript-obfuscator/escodegen
While I have no reason to doubt the security of the `escodegen` fork, this patch nonetheless pins the version number. Furthermore, I've also diffed the output of the two `.js`-files in this forked package against the original files without finding anything that looks immediately "dangerous".
The Viewer API definitions do not compile because of missing imports and
anonymous objects are typed as `Object`. These issues were not caught
during CI because the test project was not compiling anything from the
Viewer API.
As an example of the first problem:
```
/**
* @implements MyInterface
*/
export class MyClass {
...
}
```
will generate a broken definition that doesn’t import MyInterface:
```
/**
* @implements MyInterface
*/
export class MyClass implements MyInterface {
...
}
```
This can be fixed by adding a typedef jsdoc to specify the import:
```
/** @typedef {import("./otherFile").MyInterface} MyInterface */
```
See https://github.com/jsdoc/jsdoc/issues/1537 and
https://github.com/microsoft/TypeScript/issues/22160 for more details.
As an example of the second problem:
```
/**
* Gets the size of the specified page, converted from PDF units to inches.
* @param {Object} An Object containing the properties: {Array} `view`,
* {number} `userUnit`, and {number} `rotate`.
*/
function getPageSizeInches({ view, userUnit, rotate }) {
...
}
```
generates the broken definition:
```
function getPageSizeInches({ view, userUnit, rotate }: Object) {
...
}
```
The jsdoc should specify the type of each nested property:
```
/**
* Gets the size of the specified page, converted from PDF units to inches.
* @param {Object} options An object containing the properties: {Array} `view`,
* {number} `userUnit`, and {number} `rotate`.
* @param {number[]} options.view
* @param {number} options.userUnit
* @param {number} options.rotate
*/
```
Current pdf_viewer definitions result in errors like the following when
trying to use them in a ts project:
[error] TypeScript error
node_modules/.pnpm/pdfjs-dist@2.10.377/node_modules/pdfjs-dist/web/pdf_viewer.d.ts:1:15
- error TS2691: An import path cannot end with a '.d.ts' extension.
Consider importing 'pdfjs-dist/types/web/pdf_viewer.component.js'
instead.
1 export * from "pdfjs-dist/types/web/pdf_viewer.component.d.ts";
Import/export statements in typescript should not include file extensions.
The only purpose, according to the README and existing files, is to
parse an integer from those lines, so (\d+) is sufficient for that. This
avoids potential exponential backtracking as flagged by CodeQL. I have
compared the output of the script with and without these changes and the
resulting files are the same.
Please note that while the `gulp types`/`gulp typestest` tasks (obviously) still work with this patch, I've got no idea if the output is first of all even useable and secondly if it's actually useful in practice.
However, in the interest of closing some (seemingly simple) issues, I suppose that this probably shouldn't hurt (and we'd need TypeScript users to help improve things here).
Currently only the Foxit license-file is included, which is most likely just an oversight as far as I can tell.
Furthermore, to be able the tell the two license-files apart, the Foxit one is also renamed slightly.
- Some js files contain scale factors for each glyph in order to rescale Liberation to have a final font with the correct width.
- A lot of XFA have some containers where their dimensions are based on their text content, so using default font from browser can lead to an almost unreadable pdf.
- In case of large string the sandbox initialization failed because of an OOM
* so allocate a new string in the heap
* and free it after use.
- it requires a quickjs update since we need to export some symbols (stringToNewUTF8 and free).
As part of testing this, I've diffed the output of `gulp mozcentral` with/without this patch and the *only* difference is the incremented `version`/`build` numbers.