Fixes two issues:
- #4456: The first 100 bytes are often not unique, as they can be
filled with standard PDF headers, so we use the first 200 KB instead.
(This may be overkill.)
- Some documents we encountered have invalid xref ids, which always
came out as ‘0000000000000000’, so we detect that and use the
MD5 instead.
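
A minimal sketch of the fallback described above, not the actual implementation. It assumes the MD5 is taken over the leading 200 KB of the file and that an id counts as invalid when it is missing or all zero bytes. The names `computeFingerprint`, `isValidId`, `toHex` and the injected `md5` function are hypothetical.

```javascript
// Hypothetical constant: "first 200 KB", taken here as 200 * 1024 bytes.
const FINGERPRINT_FIRST_BYTES = 200 * 1024;

// Converts a byte array into a lowercase hex string.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Assumption: an id is invalid if it is missing, empty, or all zero bytes.
function isValidId(idBytes) {
  return !!idBytes && idBytes.length > 0 &&
         Array.from(idBytes).some((b) => b !== 0);
}

// idBytes: id read from the xref trailer (may be null).
// data:    Uint8Array holding the file contents.
// md5:     caller-supplied function (Uint8Array) -> Uint8Array.
function computeFingerprint(idBytes, data, md5) {
  if (isValidId(idBytes)) {
    return toHex(idBytes);
  }
  // Fall back to hashing the leading part of the file.
  const end = Math.min(FINGERPRINT_FIRST_BYTES, data.length);
  return toHex(md5(data.subarray(0, end)));
}
```

Passing the hash function in keeps the sketch independent of any particular MD5 implementation.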
When submitting PR 5276 there wasn't a good PDF file to include in the test suite. However, with https://bugzilla.mozilla.org/show_bug.cgi?id=1108753, we now have a better source for a test file, hence this patch.
maskData comes from maskCtx.getImageData, so its values are integers
clamped to 0..255. The multiplications used here cannot produce
fractions that need rounding, and neither can the additions.
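
To illustrate the reasoning above: a weighted sum of byte values with integer coefficients is always an integer, so rounding is a no-op. The weights and the helper name `weightedSum` below are illustrative assumptions, since the text does not name the actual coefficients.

```javascript
// Illustrative integer weights only (an assumption, not taken from the patch).
function weightedSum(maskData, i) {
  return maskData[i] * 77 + maskData[i + 1] * 152 + maskData[i + 2] * 28;
}

// ImageData.data is a Uint8ClampedArray, so every element is an integer 0..255.
const sampleMask = new Uint8ClampedArray([0, 128, 255, 255]);
const sampleSum = weightedSum(sampleMask, 0);

// Rounding changes nothing because the sum is already an integer.
console.log(Number.isInteger(sampleSum), sampleSum === Math.round(sampleSum)); // true true
```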
Currently if you manage to e.g. open the console (with <kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>K</kbd>) before the viewer is initialized, the following will be printed in the console: `TypeError: pdfViewer is null`.
This doesn't cause any actual errors, but nevertheless it seems like something we should avoid.
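
A guard of the following kind avoids that message. This is only a sketch on a simplified, hypothetical application object. Apart from `pdfViewer`, every name here (`viewerApp`, `onResize`, `update`) is an assumption.

```javascript
// Hypothetical, simplified application object.
const viewerApp = {
  pdfViewer: null, // assigned once the viewer has been initialized

  // Handler that may fire before initialization has finished.
  onResize() {
    if (!this.pdfViewer) {
      return false; // viewer not ready yet, so there is nothing to update
    }
    this.pdfViewer.update();
    return true;
  },
};

console.log(viewerApp.onResize()); // false, and no TypeError is thrown
```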
Followup to PR 5413.
When a search term isn't found, the background color of the findInput is supposed to change (to red). This is currently not working as intended, because the CSS rule is not being applied correctly. (It seems that this broke in PR 2208.)
This patch also changes the background color to match the one used in the native Firefox findbar, since the old color seemed a bit too pink.
As described in #5444, the evaluator will perform identity checking of
paintImageMaskXObjects to decide if it can use
paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup.
This can only ever work if the entry is a cache hit. However, the
previous caching implementation cached lazily: it would only consider
an image cache-worthy once it was repeated, and only then would the
repeated instance be cached.
As a result, the sequence of identical images A1 A2 A3 A4 would
be seen as A1 A2 A2 A2 by the evaluator, which prevents using the
"repeat" optimization. Also, only the last encountered image is cached,
so A1 B1 A2 B2 would stay A1 B1 A2 B2.
The new implementation drops the "lazy" init of the cache. The threshold
for enabling an image to be cached is rather small, so the potential waste
in storage and adler32 calculation is rather low. It also caches any
eligible image by its adler32.
The two examples from above would now be A1 A1 A1 A1 and A1 B1 A1 B1,
which not only saves temporary storage, but also prevents computing
identical masks over and over again (which is the main performance impact
of #2618).
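
The eager caching described above can be sketched as follows. This is a simplified illustration, not the code from the patch: the class name `EagerImageCache`, the `lookup` method and the threshold value are assumptions, and the sketch trusts the checksum without comparing the image bytes.

```javascript
// Standard Adler-32 checksum over a byte array.
function adler32(bytes) {
  const MOD = 65521;
  let a = 1, b = 0;
  for (let i = 0; i < bytes.length; i++) {
    a = (a + bytes[i]) % MOD;
    b = (b + a) % MOD;
  }
  return ((b << 16) | a) >>> 0; // force an unsigned 32-bit result
}

// Hypothetical eligibility threshold in bytes.
const MAX_CACHED_IMAGE_SIZE = 1000;

class EagerImageCache {
  constructor() {
    this.entries = new Map(); // adler32 -> first image seen with that checksum
  }

  // Returns the cached instance for identical data, caching on first sight.
  lookup(image) {
    if (image.data.length > MAX_CACHED_IMAGE_SIZE) {
      return image; // too large to be eligible for caching
    }
    const key = adler32(image.data);
    const cached = this.entries.get(key);
    if (cached) {
      return cached;
    }
    this.entries.set(key, image);
    return image;
  }
}

// Four identical images collapse onto the first instance.
const demoCache = new EagerImageCache();
const demoNames = ['A1', 'A2', 'A3', 'A4']
  .map((name) => ({ name, data: new Uint8Array([1, 2, 3]) }))
  .map((image) => demoCache.lookup(image).name);
console.log(demoNames.join(' ')); // A1 A1 A1 A1
```

Because every lookup of identical data returns the same object, an identity check on consecutive images succeeds from the second occurrence onward.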