Fixes two issues:
- #4456 : The first 100 bytes are often not unique as they can be
filled with standard PDF headers - so we use the first 200 KB instead.
(This may be overkill)
- Some documents we encountered have invalid xref ids, which were
always coming out as ‘0000000000000000’ - so we detect that and use the
MD5 instead.
This is a tentative patch that adds *very* basic support for non-embedded Wingdings fonts (a Windows version of Dingbats), by falling back to the ZapfDingbats encoding. Obviously this approach will not work perfectly, but in my opinion it seems to work reasonably well in pratice.
Instead of this very simple patch, another option would be to try and include more complete glyph data for Wingdings, e.g. a Unicode map and glyph widths, similar to what was done for ZapfDingbats.
However there is, in my opinion, one important difference between Wingdings and ZapfDingbats: ZapfDingbats is part of the 14 standard fonts, which in previous versions of the PDF specification was assumed to be available in PDF readers. To improve compatibility with older files, it thus makes sense for us to include data for ZapfDingbats.
However Wingdings has never been a standard font in PDF files, hence PDF files using it *should* thus contain all the necessary font data.
Given the above, I thus believe that it should be OK to fall back to ZapfDingbats for now. If non-embedded Wingdings fonts turns out to be *a lot* more common, then we can revisit this later.
Fixes 4301 completely.
Fixes 4837 almost completely. With this patch the bullets are displayed correctly, but the arrows are not of the correct type.
Fixes `artofwar.pdf`, pages 14 and 15.
maskData comes out of maskCtx.getImageData, so is 0..255 clamped, and
the used multiplications will not create fractions needing rounding,
neither would addition.
As described in #5444, the evaluator will perform identity checking of
paintImageMaskXObjects to decide if it can use
paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup.
This can only ever work if the entry is a cache hit. However the
previous caching implementation was doing a lazy caching, which would
only consider a image cache worthy if it is repeated.
Only then the repeated instance would be cached.
As a result of this the sequence of identical images A1 A2 A3 A4 would
be seen as A1 A2 A2 A2 by the evaluator, which prevents using the
"repeat" optimization. Also only the last encountered image is cached,
so A1 B1 A2 B2, would stay A1 B1 A2 B2.
The new implementation drops the "lazy" init of the cache. The threshold
for enabling an image to be cached is rather small, so the potential waste
in storage and adler32 calculation is rather low. It also caches any
eligible image by its adler32.
The two example from above would now be A1 A1 A1 A1 and A1 B1 A1 B1
which not only saves temporary storage, but also prevents computing
identical masks over and over again (which is the main performance impact
of #2618)
This patch makes the image from #5349 appear correctly, the artefacts
for the last packet are fixed in #5426.
This patch also optimizes some "in-checks" and adds a few header parsings.
After PR 5263, setting `disableAutoFetch = true` in the generic viewer no longer works correctly, since the entire file loads even with `disableStream = true`.
Currently when an exception is thrown, we try to reject `workerReadyCapability` with multiple arguments in src/core/api.js. This obviously doesn't work, hence this patch changes that to instead reject with the exception object as is.
In src/core/worker.js the exception is currently (unncessarily) wrapped in an object, so this patch also simplifies that to directly send the exception object instead.
This patch avoids creating many intermediate strings, when adding dummy width/lsb entries for glyphs where those are missing.
For the relevant PDF files in our test suite, the average number of intermediate strings are well over 1000.
The scanned, black-and-white document at
https://bugzilla.mozilla.org/show_bug.cgi?id=835380 doesn't benefit from
the critical GRAYSCALE_1BPP optimization because the optimization is
skipped if `needsDecode` is set.
This change addresses that, and reduces both rendering time and memory
usage for that document by almost 10x.
setGStateForKey() is a closure that serves no particularly useful
purpose. This change inlines it at the single call site. This avoids 1.7
MiB of allocations (because closures are objects) for the MTA map
mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=835380#c17.
For the document in #2504, 11% of the ops are `setGState` with a
`gStateObj` that is an empty array, which is a no-op. This is possible
because we ignore various setGState keys (OP, OPM, BG, etc).
This change prevents these ops from being inserted into the operator
list.
This allows the JS engine to do a better job of allocating the right
number of elements for the array, avoiding some resizings. For the PDF
in #2504, this avoids 100s of MiBs of allocations in Firefox.
makeInlineImage() has a "are the next five chars ASCII?" check which is
run after an "EI" sequence has been found. This check involves the
creation of a new object because peekBytes() calls subarray().
Unfortunately, the check is currently run on whitespace chars even when
an "EI" sequence has not yet been found, i.e. when it's not needed. For
the PDF in #2618, there are over 820,000 such checks.
This change reworks the relevant loop so that the check is only done
once an "EI" sequence has been seen. This reduces the number of checks
to 157,000, and speeds up rendering by somewhere between 2% and 7% (the
measurements are noisy).
The `console.log` statement in evaluator_spec.js is obviously not needed. In obj.js it could have been replaced by `info`, but that seemed unnecessary given the already existing `error`.
readCharCode() returns two values, and currently allocates a length-2
array on every call to do so. This change makes it instead us a
passed-in object which can be reused.
This tiny change reduces the total JS allocations done for the document
in Mozilla bug 992125 by 4.2%.
EvaluatorPreprocessor_read() is called in two cases. For the normal
layer, the args array it produces is used beyond the bounds of the loop
in which EvaluatorPreprocessor_read() is called.
But for the text layer, the args array is used in a very short-term
fashion. This change reworks things so that a single array is repeatedly
used for the text layer. This reduces total JS allocations for the
Spoorkaart map by 11%, and has similar effects on many other PDFs.
After PR 4982, the rendering of the first two pages of http://www.openmagazin.cz/pdf/2011/openMagazin-2011-04.pdf (from issue 215) no longer completes.
The issue is that we cannot have `args === null` in `PartialEvaluator_buildPath`, but *must* use an empty array instead.
In this patch I've also moved the `argsLength` variable definition in `EvaluatorPreprocessor_read`, to make sure that it's always defined.
When loading the PDF from issue #4935, this change reduces peak RSS from
~2400 to ~300 MiB, and improves overall speed by ~81%, from 6336 ms to
1222 ms.
IdentityCMap uses an array to represent a 16-bit unsigned identity
function. This is very space-inefficient, and some files cause multiple
IdentityCMaps to be instantiated (e.g. the one from #4580 has 74).
This patch make the representation implicit.
When loading the PDF from issue #4580, this change reduces peak RSS from
~370 to ~280 MiB. It also improves overall speed on that PDF by ~30%,
going from 522 ms to 366 ms.