This is achieved by adding getBytes2() and getBytes4() to streams, and by
changing int16() and int32() to take multiple scalar args instead of an array
arg.
When decoding a stream, the decode buffer is often grown multiple times, its
byte size increasing like so: 512, 1024, 2048, etc. This patch estimates the
minimum size in advance (using the length of the encoded stream), often
allowing the smaller sizes to be skipped. It also renames numerous |length|
variables as |maybeLength| to make it clear that they can be |null|.
I measured this change on eight documents. This change reduces the cumulative
size of decode buffer allocations by 0--32%, with 10--20% being typical. This
reduces peak RSS by 10 or 20 MiB for several of them.
This avoids lots of unnecessary work when such streams are referred to via
fetch(), and so their bytes aren't subsequently read. This is a large
performance win on some files.
At the initialization of `Lexer_getObj` (in `parser.js`), there's a loop
that skips whitespace and breaks out whenever EOF is encountered.
(https://github.com/mozilla/pdf.js/blob/88ec2bd1a/src/core/parser.js#L586-L599)
Whenever the current character is not a whitespace character,
`ch = this.nextChar();` is used to find the next character
(using `return this.currentChar = this.stream.getByte())`).
The aforementioned `getByte` method retrieves the next byte using
(https://github.com/mozilla/pdf.js/blob/88ec2bd1a/src/core/stream.js#L122-L128)
var pos = this.pos;
while (this.bufferLength <= pos) {
if (this.eof)
return -1;
this.readBlock();
}
return this.buffer[this.pos++];
This piece of code relies on this.eof to detect whether the last character
has been read. When the stream is a `FlateStream`, and the end of the stream
has been reached, then **`this.eof` is not set to `true`**, because this check
is done inside a loop that does not occur when the read block size is zero:
(https://github.com/mozilla/pdf.js/blob/88ec2bd1ac/src/core/stream.js#L511-L517)
for (var n = bufferLength; n < end; ++n) {
if (typeof (b = bytes[bytesPos++]) == 'undefined') {
this.eof = true;
break;
}
buffer[n] = b;
}
This commit fixes the issue by setting this.eof to true whenever the loop is not
going to run (i.e. when bufferLength === end, i.e. blockLen === 0).