At this point in time, all of the supported browsers (in the PDF.js project) have native `ReadableStream` implementations; see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility
Hence the polyfill is *only* necessary in Node.js environments now, and we shouldn't need to do any detailed feature detection either (since that was only done for the non-Chromium versions of the MS Edge browser).
Finally, we can slightly reduce the size of the Chromium-extension since the polyfill shouldn't be needed there either.
Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead.
At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been *multiple* issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended.
While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.
The value of the `renderInteractiveForms` parameter, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls that *only* change the `renderInteractiveForms` parameter can thus return an incorrect operatorList.
As far as I can tell, this *subtle* bug has existed ever since `renderInteractiveForms`-support was first added in PR 7633 (which is almost five years ago).
With the previous patch, fixing this is now really simple by "encoding" the `renderInteractiveForms` parameter in the *internal* renderingIntent handling.
With the changes made in PR 13746 the *internal* renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly.
Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an *internal* renderingIntent that's implemented using a bit-field instead. *Please note:* This part of the patch, in itself, does *not* change the public API (but see below).
This patch is tagged `api-minor` for the following reasons:
1. It changes the *default* value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API.
2. In order to get *all* annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter.
3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one).
4. Finally, for consistency across the API, the `PDFPageProxy.render` method also support the new "any" intent (although I'm not sure how useful that'll be).
Points 1 and 2 above are the significant, and thus breaking, changes in *default* behaviour here. However, unfortunately I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.
This patch makes use of the existing `ignoreErrors` option, thus allowing a page to continue parsing/rendering even if (some of) its sub-streams are corrupt. Obviously this may cause *part* of a page to be broken/missing, however it should be better than (potentially) rendering nothing.
Also, to the best of my knowledge, this is the first bug of its kind that we've encountered.
To avoid having to pass in a bunch of, for a `BaseStream`-instance, mostly unrelated parameters when initializing a `StreamsSequenceStream`-instance, I settled on utilizing a callback function instead to allow conditional Error-suppression.
Note that the `StreamsSequenceStream`-class is a *special* stream-implementation that we only use when the `/Contents`-entry, in the `/Page`-dictionary, consists of an Array with streams.
Given that `DOMMatrix` is, unsurprisingly, not supported in Node.js the `createMatrix` helper function in `src/display/pattern_helper.js` is most likely broken in Node.js environments. It will obviously try to fallback to the `DOMSVGFactory`, however that isn't intended for Node.js usage and errors will be thrown.
Rather than trying to implement a `NodeSVGFactory`, this patch takes the easier route of just adding a `DOMMatrix` polyfill using: https://www.npmjs.com/package/dommatrix
This isn't done only for simplicity, but it'll become necessary anyway since the `createMatrix` helper function is only temporary and will be removed in the future.
The building of glyph paths, in the `FontRendererFactory`, can fail in various ways for corrupt font data. However, we're currently not attempting to handle any such errors in the evaluator, which means that a single broken glyph *can* prevent an entire page from rendering.
To address this we simply have to pass along, and check, the existing `ignoreErrors` option in `PartialEvaluator.buildFontPaths` similar to the rest of the `PartialEvaluator` code.
To get the maximum benefit from something like Prettier, you obviously don't want to disable the automatic formatting unless absolutely necessary. When we added Prettier there were a number of cases, mostly involving larger Arrays, which required disabling of the automatic formatting for overall readability and/or to not break inline comments.
With changes in Prettier version `2.3.0`, see [the release notes](https://prettier.io/blog/2021/05/09/2.3.0.html#concise-formatting-of-number-only-arrays-10106httpsgithubcomprettierprettierpull10106-10160httpsgithubcomprettierprettierpull10160-by-thorn0httpsgithubcomthorn0), there's now better formatting support for Arrays containing only numbers. Hence we can now remove a number of `// prettier-ignore` comments, and thus get the benefit of automatic formatting in (slightly) more of the code-base.
With the changes in PR 13361, we're now using the `CanvasPattern.setTransform()` method when rendering certain Shadings/Patterns.
Note that while `CanvasPattern` itself has been supported since basically "forever", its `setTransform` method is a slightly newer addition to the specification; please refer to https://developer.mozilla.org/en-US/docs/Web/API/CanvasPattern#browser_compatibility
Rather than trying to re-write PR 13361 to not use, or possibly spending time/effort (if possible) polyfilling, `CanvasPattern.setTransform()` this patch thus suggests that we simply update the *minimum* supported browser versions instead.
According to the compatibility data linked above, the *minimum* supported browser versions in the PDF.js library are now as follows:
- Chrome >= 68, which was released on 2018-07-24.[1]
- Firefox ESR, see https://wiki.mozilla.org/Release_Management/Calendar.
- Safari >= 11.1, which was release on 2018-03-29.[2]
(Given that the PDF.js contributors cannot realistically test a bunch of old browsers, it's not unimaginable that some older browser versions are already not working with the PDF.js library.)
Based on these changes, which we should ensure are reflected in the Wiki as well, we can also remove a number of now redundant polyfills. Furthermore we'll no longer "claim" to support Windows XP, note the `gulpfile.js` changes, which should definitely *not* be an issue given that it's no longer officially supported.[3]
---
[1] According to https://en.wikipedia.org/wiki/Google_Chrome_version_history
[2] According to https://en.wikipedia.org/wiki/Safari_version_history#Safari_11
[3] According to https://en.wikipedia.org/wiki/Windows_XP#End_of_support
- but don't validate them for now;
- Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315)
- almost all (all ?) pdf readers display signatures;
- validation is done in edge but for now it's behind a pref.
Based on this compatibility information, given that IE 11 is now *explicitly* unsupported, we should no longer need to bundle a `URL` polyfill in any builds: https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility
Note that the caveat listed for older Safari-versions doesn't apply to any code in the PDF.js library, since we never call `new URL(url, undefined)` in the code-base.
Note also that Node.js has a web-compatible `URL` implementation, which according to the "History" section at https://nodejs.org/api/url.html#url_the_whatwg_url_api has been available since Node.js `10.0.0` (according to https://nodejs.org/en/about/releases/ that branch is one month away from being EOL-ed).
A significant portion of the code-base has now been converted to use `let`/`const`, rather than `var`, hence it should be possible to simply enable the ESLint `no-var` rule globally.
This way we can ensure that new code won't accidentally use `var`, and it also removes the need to manually enable the rule in various folders.
Obviously it makes sense to continue the efforts to replace `var`, but that should probably happen on a file and/or folder basis.
Please note that this patch excludes the following code:
- The `extensions/` folder, since that seemed easiest for now (and I don't know exactly what the support situation is for the Chromium-extension).
- The entire `external/` folder is ignored, since most of it's currently excluded from linting.
For the code that isn't imported from elsewhere (and should be ignored), we should probably (at some point) bring the code up to the same linting/formatting standard as the rest of the code-base.
- Various files in the `test/` folder are ignored, as necessary, since the way that a lot of this code is loaded will require some care (or perhaps larger re-factoring) when removing `var` usage.
Given that it's only used with `Map`s, and that it's currently implemented in such a way that we (indirectly) must iterate through the data *twice*, some simplification cannot hurt here.
Note that the only reason that we're not using `Object.fromEntries(...)` directly, at each call-site, is that that one won't guarantee that a `null` prototype is being used.
The `compareByteArrays` is first of all duplicated in multiple closures in the `src/core/crypto.js` file. Secondly, despite its name, it's also functionally equivalent to the now existing `isArrayEqual` helper function.
The `isArrayEqual` helper function is changed to use a standard `for`-loop, rather than `Array.prototype.every`, since that ought to be slightly more efficient given that we're now using it with (potentially) larger data.
Note that this particular helper function is, with the exception of the `GENERIC` default viewer and the (unsupported) SVG-backend, mostly unused at this point in time. Hence we should be able to clean-up this helper function slightly.
Also, fixes a small inconsistency in the `SVGGraphics` initialization in the viewer, by passing in the `disableCreateObjectURL` compatibility-option. Given that the SVG-backend isn't officially supported/recommended this shouldn't have been an issue, but given that I spotted this it can't hurt to fix it.
With the previous patch this function is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
With the previous patch this functionality is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers.
Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons).
Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the *built* `pdf.js`/`pdf.worker.js` files.
Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation.
---
[1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.
- the parser is base on a class extending XMLParserBase
- it handle xml namespaces:
* each namespace is assocated with a builder
* builder builds nodes belonging to the namespace
* when a node is inserted in the parent namespace compatibility is checked (if required)
- to avoid name collision between xml names and object properties, use Symbol.
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports
The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
* the goal is to execute actions like Open or OpenAction
* can be tested with issue6106.pdf (auto-print)
* once #12701 is merged, we can add page actions
*This is a recent regression, which I stumbled upon while working on cleaning-up the gulpfile related to `pdf.sandbox.js` building.*
By placing the `ColorConverters` functionality in the `src/display/display_utils.js` file, you end up including a *significant* chunk of the `pdf.js` file in the built `pdf.scripting.js`/`pdf.sandbox.js` files.
Given that I cannot imagine that this was actually intended, since it inflates the built files with unnecessary/unused code, this moves `ColorConverters` to a new file instead (thus breaking the dependencies).
To hopefully reduce the risk future bugs, along these lines, a big comment is also placed at the top of the new file.
Finally, the `ColorConverters` is converted to a class with static methods, since this felt slightly cleaner overall.
Note that a number of these cases are covered by existing unit-tests, and a few others only matter for the development/build scripts.
Furthermore, I've also tried to the best of my ability to test each case *manually* to hopefully further reduce the likelihood of this patch introducing any bugs.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
* it's faster to generate the color code in using a table for components
* it's very likely a way faster to parse (when setting the color in the canvas)
This fixes only those warnings, as reported by https://lgtm.com/projects/g/mozilla/pdf.js?mode=list, that make sense (as far as I'm concerned).
Hence this patch leaves the following things unaddressed:
- The "recommendation"-category, since it only complains about unused variables. However, note that all of those cases are purposely included and that there's thus ESLint-disable comments added to explictly allow them.
- The "warning"-category, which still contains two complaints. However, as far as I can tell, they are both false positives.
Given first of all the false positives of the LGTM static analyzer, and secondly that we'd need to add (essentially duplicated) disable-comments for the unused variable cases, it's not entirely clear to me if we actually want to work towards including LGTM in the PDF.js project (e.g. running alongside Travis) or if we should just close issue 11965.
Rather than returning an *empty* Object[1] we should be returning `null` instead, since that's consistent with existing API-functionality.
To avoid having to *manually* track if the Object is empty, this patch also introduces a small helper function to check its size.
This patch removes unnecessary escape-sequence in (mostly) strings, as a first step, since the ones in regular expressions probably requires more careful testing (just in case).
The only exception is a regular expression in `src/core/annotation.js`, since we should have both unit- and reference-tests for this code *and* given [this information on MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#Types):
> Inside a character set, the dot loses its special meaning and matches a literal dot.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
This provides a work-around to avoid having to conditionally try to initialize the `openAction`-object in multiple places.
Given that `Object.fromEntries` doesn't seem to *guarantee* that a `null` prototype is used, we thus hack around that by using `Object.assign` with `Object.create(null)`.
Given that this code is used on the worker-thread, where SystemJS is still used during development, we need to (for now) handle this folder the same way as the `src/core/` one.
This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1]
Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use.
---
[1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.
Previously this rule has been enabled in the `web/` folder, and in select files in the `src/` sub-folders.
In this case, enabling of this rule didn't actually require any further code changes.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-var
* Move display/xml_parser.js in shared to use it in worker
* Save form data in XFA datasets when pdf is a mix of acroforms and xfa
Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>