At this point in time, all of the supported browsers (in the PDF.js project) have native `ReadableStream` implementations; see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility
Hence the polyfill is *only* necessary in Node.js environments now, and we shouldn't need to do any detailed feature detection either (since that was only done for the non-Chromium versions of the MS Edge browser).
Finally, we can slightly reduce the size of the Chromium-extension since the polyfill shouldn't be needed there either.
By not adding any additional non-`Dict` entries to the list of candidates for merging of sub-dictionaries, we can very slightly reduce the amount of parsing required by not having to *again* iterate through unmergeable data.
Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead.
At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been *multiple* issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended.
While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.
Rather than caching only the *last* `PDFPageProxy.getAnnotations` call, and having to handle the intent separately, we can instead implement the caching in exactly the same way as done in the `PDFPageProxy.{render, getOperatorList}` methods.
This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the *build* `pdf.js` file.
Given that `PDFWorker` is exposed through the *public* API, this complicates things somewhat since there's a couple of worker-related properties that really should stay *private*. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure, however I've managed to come up with what's hopefully deemed an acceptable work-around here.
Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method).
Finally, as part of this re-factoring a number of missing JSDoc-comments were added which *together* with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class.
*Please note:* This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.
The value of the `renderInteractiveForms` parameter, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls that *only* change the `renderInteractiveForms` parameter can thus return an incorrect operatorList.
As far as I can tell, this *subtle* bug has existed ever since `renderInteractiveForms`-support was first added in PR 7633 (which is almost five years ago).
With the previous patch, fixing this is now really simple by "encoding" the `renderInteractiveForms` parameter in the *internal* renderingIntent handling.
With the changes made in PR 13746 the *internal* renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly.
Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an *internal* renderingIntent that's implemented using a bit-field instead. *Please note:* This part of the patch, in itself, does *not* change the public API (but see below).
This patch is tagged `api-minor` for the following reasons:
1. It changes the *default* value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API.
2. In order to get *all* annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter.
3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one).
4. Finally, for consistency across the API, the `PDFPageProxy.render` method also support the new "any" intent (although I'm not sure how useful that'll be).
Points 1 and 2 above are the significant, and thus breaking, changes in *default* behaviour here. However, unfortunately I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.
Given how trivial the `isEOF` function is, we can simply inline the check at the various call-sites and remove the function (which ought to be ever so slightly more efficient as well).
Furthermore, this patch also changes the `EOF` primitive itself to a `Symbol` instead of an Object since that has the nice benefit of making it unclonable (thus preventing *accidentally* trying to send `EOF` from the worker-thread).
Given that the GitHub Advanced Security workflow now covers everything that LGTM does, but generally faster and with better GitHub-integration, there's no longer much point in also running LGTM separately.
As a follow-up to this patch, we should also disable/remove the LGTM-integration from the PDF.js repository.
While fixing issue 13794, I noticed that cancelling the `ReadableStream` returned by the `PDFPageProxy.streamTextContent`-method could lead to "Uncaught promise" messages in the console.[1]
Generally speaking, we don't really care about errors when *cancelling* a `ReadableStream` and it thus seems reasonable to simply suppress any output in those cases.
---
[1] Although, after that issue was fixed you'd now need to set the API-option `stopAtErrors = true` to actually trigger this.
This patch utilizes the same approach as used in lots of other parts of the code-base, which thus *slightly* reduces the size of this code.
By removing some of the (current) indirection, we can also simplify the JSDocs a little bit. Looking at the `gulp jsdoc` output, this actually seem to *improve* the documentation for this class.
The PDF in bug 1721949 uses many unique pattern objects
that references the same shading many times. This caused
a new canvas pattern to be created and cached many times
driving up memory use.
To fix, I've changed the cache in the worker to key off the
shading object and instead send the shading and matrix
separately. While that worked well to fix the above bug,
there could be PDFs that use many shading that could
cause memory issues, so I've also added a LRU cache
on the main thread for canvas patterns. This should prevent
memory use from getting too high.
- All the scale factors in for the substitution font were wrong because of different glyph positions between Liberation and the other ones:
- regenerate all the factors
- Text may have polish chars for example and in this case the glyph widths were wrong:
- treat substitution font as a composite one
- add a map glyphIndex to unicode for Liberation in order to generate width array for cid font
- In order to better compute text fields size, use line height with no gaps (and consequently guessed height for text are slightly better in general).
- Fix default background color in fields.
For code that's part of the core library, rather than e.g. the `web/`-folder, we should always be careful about *directly* accessing any DOM methods.
The `navigator` is one such structure, which shouldn't be assumed to always be available and we should thus check that it's actually present.[1]
Hence this patch re-factors the `navigator.platform` access, in the `AnnotationLayer`-code, to ensure that it's generally safe. Furthermore, to reduce unnecessary repeated string-matching to determine the current platform, we're now using a shadowed getter which is evaluated only once instead (at first access).
---
[1] Note e.g. the `isSyncFontLoadingSupported` getter, in the `src/display/font_loader.js` file.
- an Image element was created, attached to its parent but the $globalData property was not set and that led to an error.
- the pdf in bug 1722029 has 27 rendered rows (checked in Acrobat) when only one was displayed: this patch some binding issues around the occur element.
This patch makes use of the existing `ignoreErrors` option, thus allowing a page to continue parsing/rendering even if (some of) its sub-streams are corrupt. Obviously this may cause *part* of a page to be broken/missing, however it should be better than (potentially) rendering nothing.
Also, to the best of my knowledge, this is the first bug of its kind that we've encountered.
To avoid having to pass in a bunch of, for a `BaseStream`-instance, mostly unrelated parameters when initializing a `StreamsSequenceStream`-instance, I settled on utilizing a callback function instead to allow conditional Error-suppression.
Note that the `StreamsSequenceStream`-class is a *special* stream-implementation that we only use when the `/Contents`-entry, in the `/Page`-dictionary, consists of an Array with streams.
There's no good reason, as far as I can tell, to explicitly define a bunch of methods to be `undefined`, which the current unconditional "copying" of methods will do.
Note that of the `OPS` ~23 percent don't, for various reasons, have an associated method on the `CanvasGraphics.prototype`.
For e.g. the `gulp mozcentral` command, the *built* `pdf.js` file decreases from `304 607` to `301 295` bytes with this patch. The improvement comes mostly from having less overall indentation in the code.