This way there cannot be any *incorrect* cache hits, since Refs are guaranteed to be unique.
Please note that the reason for caching by Ref rather than doing something along the lines of the `localShadingPatternCache` (which uses a `Map` directly), is that TilingPatterns are streams and those cannot be cached on the `XRef`-instance (this way we avoid unnecessary parsing).
*This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867.*
As far as I can tell, this *subtle* bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago).
The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus *wrongly* return a cached operatorList.
In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API.
However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in *subtle* (and difficult to understand) rendering bugs.
In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead.
Given that operatorList caching in the API doesn't have to be perfect[1], but only have to avoid *false* cache-hits, we can simplify things significantly be only keeping track of the last time that the `AnnotationStorage`-data was modified.
*Please note:* While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2]
Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3]
---
[1] Note how we're already not caching operatorLists for pages with *huge* images, in order to save memory, hence there's no guarantee that operatorLists will always be cached.
[2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation.
[3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.
By not adding any additional non-`Dict` entries to the list of candidates for merging of sub-dictionaries, we can very slightly reduce the amount of parsing required by not having to *again* iterate through unmergeable data.
Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead.
At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been *multiple* issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended.
While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.
The value of the `renderInteractiveForms` parameter, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls that *only* change the `renderInteractiveForms` parameter can thus return an incorrect operatorList.
As far as I can tell, this *subtle* bug has existed ever since `renderInteractiveForms`-support was first added in PR 7633 (which is almost five years ago).
With the previous patch, fixing this is now really simple by "encoding" the `renderInteractiveForms` parameter in the *internal* renderingIntent handling.
With the changes made in PR 13746 the *internal* renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly.
Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an *internal* renderingIntent that's implemented using a bit-field instead. *Please note:* This part of the patch, in itself, does *not* change the public API (but see below).
This patch is tagged `api-minor` for the following reasons:
1. It changes the *default* value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API.
2. In order to get *all* annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter.
3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one).
4. Finally, for consistency across the API, the `PDFPageProxy.render` method also support the new "any" intent (although I'm not sure how useful that'll be).
Points 1 and 2 above are the significant, and thus breaking, changes in *default* behaviour here. However, unfortunately I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.
Given how trivial the `isEOF` function is, we can simply inline the check at the various call-sites and remove the function (which ought to be ever so slightly more efficient as well).
Furthermore, this patch also changes the `EOF` primitive itself to a `Symbol` instead of an Object since that has the nice benefit of making it unclonable (thus preventing *accidentally* trying to send `EOF` from the worker-thread).
Given that the GitHub Advanced Security workflow now covers everything that LGTM does, but generally faster and with better GitHub-integration, there's no longer much point in also running LGTM separately.
As a follow-up to this patch, we should also disable/remove the LGTM-integration from the PDF.js repository.
The PDF in bug 1721949 uses many unique pattern objects
that references the same shading many times. This caused
a new canvas pattern to be created and cached many times
driving up memory use.
To fix, I've changed the cache in the worker to key off the
shading object and instead send the shading and matrix
separately. While that worked well to fix the above bug,
there could be PDFs that use many shading that could
cause memory issues, so I've also added a LRU cache
on the main thread for canvas patterns. This should prevent
memory use from getting too high.
- All the scale factors in for the substitution font were wrong because of different glyph positions between Liberation and the other ones:
- regenerate all the factors
- Text may have polish chars for example and in this case the glyph widths were wrong:
- treat substitution font as a composite one
- add a map glyphIndex to unicode for Liberation in order to generate width array for cid font
- In order to better compute text fields size, use line height with no gaps (and consequently guessed height for text are slightly better in general).
- Fix default background color in fields.
- an Image element was created, attached to its parent but the $globalData property was not set and that led to an error.
- the pdf in bug 1722029 has 27 rendered rows (checked in Acrobat) when only one was displayed: this patch some binding issues around the occur element.
This patch makes use of the existing `ignoreErrors` option, thus allowing a page to continue parsing/rendering even if (some of) its sub-streams are corrupt. Obviously this may cause *part* of a page to be broken/missing, however it should be better than (potentially) rendering nothing.
Also, to the best of my knowledge, this is the first bug of its kind that we've encountered.
To avoid having to pass in a bunch of, for a `BaseStream`-instance, mostly unrelated parameters when initializing a `StreamsSequenceStream`-instance, I settled on utilizing a callback function instead to allow conditional Error-suppression.
Note that the `StreamsSequenceStream`-class is a *special* stream-implementation that we only use when the `/Contents`-entry, in the `/Page`-dictionary, consists of an Array with streams.
Apparently I didn't put one of the disable comments on the *correct* line, since I didn't read the instructions carefully enough, so let's try again.
Note that, most unfortunately, disabling of warnings isn't applied until *after* a patch has been merged.
Most of the warnings we don't really care about, and those are simply white-listed using inline comments; however two cases prompted actual code changes:
- In `src/display/pattern_helper.js` the branch in question is indeed unreachable, and should thus be safe to remove. (This code originated in PR 4192, which is now over seven years ago.)
- In `test/test.js`, the function in question indeed doesn't accept any arguments. (The patch also re-formats a string just above, which didn't seem worthy of a separated patch.)
This now leaves only *one* warning in the LGTM report, however that one is a false positive that we'll need to report upstream.
In cases where even the very *first* attempt at reading from an object will throw, simply ignoring such objects will help improve rendering of *some* corrupt documents.
Note that this will lead to more parsing in some cases, but considering that this only applies to *corrupt* documents that shouldn't be a big deal.
Bug 1721218 has a shading pattern that was used thousands of times.
To improve performance of this PDF:
- add a cache for patterns in the evaluator and only send the IR form once
to the main thread (this also makes caching in canvas easier)
- cache the created canvas radial/axial patterns
- for shading fill radial/axial use the pattern directly instead of creating temporary
canvas