This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the *build* `pdf.js` file.
Given that `PDFWorker` is exposed through the *public* API, this complicates things somewhat since there's a couple of worker-related properties that really should stay *private*. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure, however I've managed to come up with what's hopefully deemed an acceptable work-around here.
Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method).
Finally, as part of this re-factoring a number of missing JSDoc-comments were added which *together* with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class.
*Please note:* This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.
It shouldn't be necessary to assign these variables to the global scope (as far as I can tell), either explicitly with `window` or implicitly with `var`, and this way we don't need to disable the ESLint `no-undef` rule; fixes another small part of issue 13862.
*Please note:* I wasn't going to put additional work into this code after PR 13869, however these changes looked so simple that I figured trying to get rid of the few remaining "Code scanning alerts" wouldn't hurt.
However, this file would still very much benefit from additional clean-up and re-factoring work, since it's quite old and currently contains some dead code (commented out).
With the changes made in PR 13746 the *internal* renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly.
Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an *internal* renderingIntent that's implemented using a bit-field instead. *Please note:* This part of the patch, in itself, does *not* change the public API (but see below).
This patch is tagged `api-minor` for the following reasons:
1. It changes the *default* value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API.
2. In order to get *all* annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter.
3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one).
4. Finally, for consistency across the API, the `PDFPageProxy.render` method also support the new "any" intent (although I'm not sure how useful that'll be).
Points 1 and 2 above are the significant, and thus breaking, changes in *default* behaviour here. However, unfortunately I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.
Given that issue 13862 tracks updating/modernizing the code, this patch purposely limits the scope of the changes. In particular, the following things are still left to address:
- The ESLint `no-undef` errors; for now the rule is simply disabled globally in this file.
- A couple of unused variables are commented out for now, but could perhaps just be removed.
The new command is a *variation* of the standard `gulp test` command and will run all unit/font/integration-tests just as normal, while *only* running ref-tests for XFA-documents to speed up development.
Given that we currently have (some) unit-tests for XFA-documents, and that we may also (in the future) want to add integration-tests, it thus makes sense to run all test-suites in my opinion.
*Please note:* Once this patch has landed, I'll submit a follow-up patch to https://github.com/mozilla/botio-files-pdfjs such that we can also run the new command on the bots.
Given how trivial the `isEOF` function is, we can simply inline the check at the various call-sites and remove the function (which ought to be ever so slightly more efficient as well).
Furthermore, this patch also changes the `EOF` primitive itself to a `Symbol` instead of an Object since that has the nice benefit of making it unclonable (thus preventing *accidentally* trying to send `EOF` from the worker-thread).
Given that the GitHub Advanced Security workflow now covers everything that LGTM does, but generally faster and with better GitHub-integration, there's no longer much point in also running LGTM separately.
As a follow-up to this patch, we should also disable/remove the LGTM-integration from the PDF.js repository.
This patch utilizes the same approach as used in lots of other parts of the code-base, which thus *slightly* reduces the size of this code.
By removing some of the (current) indirection, we can also simplify the JSDocs a little bit. Looking at the `gulp jsdoc` output, this actually seem to *improve* the documentation for this class.
- All the scale factors in for the substitution font were wrong because of different glyph positions between Liberation and the other ones:
- regenerate all the factors
- Text may have polish chars for example and in this case the glyph widths were wrong:
- treat substitution font as a composite one
- add a map glyphIndex to unicode for Liberation in order to generate width array for cid font
- In order to better compute text fields size, use line height with no gaps (and consequently guessed height for text are slightly better in general).
- Fix default background color in fields.
- an Image element was created, attached to its parent but the $globalData property was not set and that led to an error.
- the pdf in bug 1722029 has 27 rendered rows (checked in Acrobat) when only one was displayed: this patch some binding issues around the occur element.
This patch makes use of the existing `ignoreErrors` option, thus allowing a page to continue parsing/rendering even if (some of) its sub-streams are corrupt. Obviously this may cause *part* of a page to be broken/missing, however it should be better than (potentially) rendering nothing.
Also, to the best of my knowledge, this is the first bug of its kind that we've encountered.
To avoid having to pass in a bunch of, for a `BaseStream`-instance, mostly unrelated parameters when initializing a `StreamsSequenceStream`-instance, I settled on utilizing a callback function instead to allow conditional Error-suppression.
Note that the `StreamsSequenceStream`-class is a *special* stream-implementation that we only use when the `/Contents`-entry, in the `/Page`-dictionary, consists of an Array with streams.
Most of the warnings we don't really care about, and those are simply white-listed using inline comments; however two cases prompted actual code changes:
- In `src/display/pattern_helper.js` the branch in question is indeed unreachable, and should thus be safe to remove. (This code originated in PR 4192, which is now over seven years ago.)
- In `test/test.js`, the function in question indeed doesn't accept any arguments. (The patch also re-formats a string just above, which didn't seem worthy of a separated patch.)
This now leaves only *one* warning in the LGTM report, however that one is a false positive that we'll need to report upstream.
In cases where even the very *first* attempt at reading from an object will throw, simply ignoring such objects will help improve rendering of *some* corrupt documents.
Note that this will lead to more parsing in some cases, but considering that this only applies to *corrupt* documents that shouldn't be a big deal.
Bug 1721218 has a shading pattern that was used thousands of times.
To improve performance of this PDF:
- add a cache for patterns in the evaluator and only send the IR form once
to the main thread (this also makes caching in canvas easier)
- cache the created canvas radial/axial patterns
- for shading fill radial/axial use the pattern directly instead of creating temporary
canvas
- in real life some xfa contains xml like <xfa:data><xfa:Foo><xfa:Bar>...</xfa:data>
since there are no Foo or Bar in the xfa namespace the JS representation are empty
and that leads to errors.
- so the idea is to make all nodes under xfa:data namespace agnostic which means
that ns are removed from nodes in the parser but only xfa:data descendants.
*Please note:* The PDF document in issue 13751 is *dynamically* created (in e.g. Adobe Reader), with pages added when certain buttons are clicked, hence this patch simply fixes the breaking error and nothing more.
It looks like the current code contains a little bit too much copy-and-paste from the *similar* `index` branch above, since we cannot set the `startIndex` to a negative value. Note how it's being used to initialize the loop-variable, which is then used to lookup values in an Array and accessing the `-1`th element of an Array obviously makes no sense.
*This is yet another case where I've got no idea if the patch is correct, but it does at least fix a breaking error :-)*
Note how in the [`Binder._bindValue` method](683ce66a48/src/core/xfa/bind.js (L92-L93)), we're assuming that if a `data`-value exists then it'll also be possible to actually access it. For the `XFAAttribute`-implementation however, the second method is missing and that's what causes the breaking errors in issue 13748.
Please note that another possible way of "fixing" the error wouldn't been to simply change the exists-check to return `false`, and I could see that being a preferred solution.
However, the reason for submitting the current patch is that we get *fewer* warnings about Nodes with mis-matched types this way.
As can be seen in the code (see below), the `searchNode` helper function will return `null` in some cases and all of its call-sites should protect against that before attempting to access the returned data.
While only one of these changes were necessary to fix the breaking errors in issue 13756, in order to prevent future bugs I've added similar defensive code throughout this file.
- 07955fa1d3/src/core/xfa/som.js (L169)
- 07955fa1d3/src/core/xfa/som.js (L239)
- 07955fa1d3/src/core/xfa/som.js (L254)