pdf.js/src/core
Jonas Jenwald b5254f2745 Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites
The number of `ChunkedStream.ensureByte` calls in particular is often absolutely *huge* (on the order of millions of calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.

The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.
Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).
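
To illustrate the idea, here is a minimal sketch of the call-site change. It assumes a heavily simplified stand-in class, not the real `ChunkedStream`: the names `SketchStream`, `getByteAlways`, `getByteInlined`, `ensureByteCalls` and `countEnsureByteCalls` are hypothetical, and the stand-in `ensureByte` only counts calls instead of checking for loaded chunks.
```javascript
// Simplified stand-in for a chunked stream; NOT the actual pdf.js class.
class SketchStream {
  constructor(bytes, progressiveDataLength) {
    this.bytes = bytes;
    this.pos = 0;
    // Number of bytes, counted from the start, already received by streaming.
    this.progressiveDataLength = progressiveDataLength;
    this.ensureByteCalls = 0; // Instrumentation for this sketch only.
  }

  ensureByte(pos) {
    this.ensureByteCalls++;
    if (pos < this.progressiveDataLength) {
      return; // Data was streamed in; nothing more to verify.
    }
    // A real implementation would verify here that the chunk containing
    // `pos` is loaded; this sketch assumes that all data is present.
  }

  // Before: every read pays for a function call.
  getByteAlways() {
    const pos = this.pos;
    if (pos >= this.bytes.length) {
      return -1;
    }
    this.ensureByte(pos);
    return this.bytes[this.pos++];
  }

  // After: the inverted check is inlined, so `ensureByte` is only called
  // for positions beyond the progressively loaded data.
  getByteInlined() {
    const pos = this.pos;
    if (pos >= this.bytes.length) {
      return -1;
    }
    if (pos >= this.progressiveDataLength) {
      this.ensureByte(pos);
    }
    return this.bytes[this.pos++];
  }
}

// Reads a 1000-byte stream to the end and reports the `ensureByte` call count.
function countEnsureByteCalls(useInlinedCheck, progressiveDataLength) {
  const stream = new SketchStream(new Uint8Array(1000), progressiveDataLength);
  const read = useInlinedCheck
    ? () => stream.getByteInlined()
    : () => stream.getByteAlways();
  while (read() !== -1) {
    // Keep reading until the end of the data.
  }
  return stream.ensureByteCalls;
}
```
In this sketch, `countEnsureByteCalls(true, 1000)` models the fully streamed case, where the inlined check skips every `ensureByte` call, while `countEnsureByteCalls(true, 0)` models the no-streaming case, where each read pays for the one additional comparison before calling `ensureByte` as before.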

This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea.
Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`.

This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):
```
[
    {  "id": "tracemonkey-eq",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 250,
       "type": "eq"
    }
]
```

I get the following complete results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
```

Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
```

Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).
There's obviously the question of whether this is *slightly* regressing simpler pages, but given just how short the times are in most cases, it's not inconceivable that the page results above are simply caused by e.g. limited `Date.now()` and/or numerical precision.
2019-07-18 17:30:22 +02:00
annotation.js Annotations - Implement parsing of IRT, RT, State and StateModel 2019-07-16 23:33:07 +02:00
arithmetic_decoder.js Convert src/core/arithmetic_decoder.js to ES6 syntax 2019-01-06 15:04:01 +01:00
bidi.js Fix inconsistent spacing and trailing commas in objects in src/core/ files, so we can enable the comma-dangle and object-curly-spacing ESLint rules later on 2017-06-02 11:20:19 +02:00
ccitt_stream.js Extract the actual decoding in CCITTFaxStream into a new CCITTFaxDecoder "class", which the new CCITTFaxStream depends on 2017-10-24 16:03:08 +02:00
ccitt.js Fix abbreviation. 2018-09-13 13:10:38 -07:00
cff_parser.js Put the string name of the glyph in the charset array. 2019-03-01 18:03:51 -08:00
charsets.js Convert src/core/charsets.js and src/core/standard_fonts.js to ES6 syntax 2019-01-06 15:04:01 +01:00
chunked_stream.js Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites 2019-07-18 17:30:22 +02:00
cmap.js Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file 2019-02-24 00:35:39 +01:00
colorspace.js Reduce unnecessary duplication of the isDefaultDecode methods on ColorSpace instances 2019-01-25 08:53:08 +01:00
core_utils.js Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file 2019-02-24 00:35:39 +01:00
crypto.js Implement the AESBaseCipher class and let the AES128Cipher and AES256Cipher classes extend it 2018-02-03 20:16:33 +01:00
document.js Simplify the PDFDocument.fingerprint method slightly 2019-07-15 13:26:08 +02:00
encodings.js Implement unit tests for the encodings and fix missing items 2017-12-24 18:14:40 +01:00
evaluator.js Change the signature of the Parser constructor to take a parameter object 2019-06-23 16:01:45 +02:00
font_renderer.js Map all glyphs to the private use area and duplicate the first glyph. 2018-09-05 14:04:54 -07:00
fonts.js Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream 2019-05-07 18:44:37 +03:00
function.js Use Dict.getArray, instead of Dict.get, when getting the 'Size' in constructSampled in src/core/function.js (PR 7295 follow-up) 2018-06-02 11:16:05 -04:00
glyphlist.js Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file 2019-02-24 00:35:39 +01:00
image_utils.js Move NativeImageDecoder into a separate file, and convert it to a class 2019-03-09 15:59:04 +01:00
image.js Reduce unnecessary duplication of the isDefaultDecode methods on ColorSpace instances 2019-01-25 08:53:08 +01:00
jbig2_stream.js Fix the interface of JpegStream/JpxStream/Jbig2Stream to agree with the other DecodeStreams 2017-11-11 11:22:16 +01:00
jbig2.js Expose a Jbig2Image.parse method, by re-instating the parseJbig2 function 2018-06-16 17:56:54 +02:00
jpeg_stream.js Add a new parameter to JpegImage.getData to indicate the source of the image data (issue 9513) 2018-09-02 14:15:22 +02:00
jpg.js Enable the consistent-return ESLint rule 2019-05-11 14:27:21 +02:00
jpx_stream.js Fix the interface of JpegStream/JpxStream/Jbig2Stream to agree with the other DecodeStreams 2017-11-11 11:22:16 +01:00
jpx.js Add more validation of the /Filter entry, in image dictionaries, to the PDFImage constructor 2018-08-01 16:41:15 +02:00
metrics.js Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file 2019-02-24 00:35:39 +01:00
murmurhash3.js Convert MurmurHash3_64 to an ES6 class 2019-03-09 17:03:06 +01:00
obj.js Reduce the number of isCmd calls slightly in the XRef class 2019-06-29 16:28:45 +02:00
operator_list.js Enable the eslint-plugin-no-unsanitized ESLint plugin to disallow unsafe usage of e.g. innerHTML 2019-06-23 13:50:30 +02:00
parser.js Change the signature of the Parser constructor to take a parameter object 2019-06-23 16:01:45 +02:00
pattern.js Apply bounding box before using shading patterns. 2019-07-08 14:05:48 -07:00
pdf_manager.js Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file 2019-02-24 00:35:39 +01:00
primitives.js Ensure that the Cmd/Name/Ref caches are cleared when running other cleanup code 2019-05-26 14:29:59 +02:00
ps_parser.js Convert src/core/charsets.js and src/core/standard_fonts.js to ES6 syntax 2019-01-06 15:04:01 +01:00
standard_fonts.js Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file 2019-02-24 00:35:39 +01:00
stream.js Simplify the PDFDocument.fingerprint method slightly 2019-07-15 13:26:08 +02:00
type1_parser.js Remove usage of makeSubStream from Type1Parser.extractFontProgram in src/core/type1_parser.js (issue 9735) 2018-05-28 14:32:20 +02:00
unicode.js Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file 2019-02-24 00:35:39 +01:00
worker_stream.js Move PDFWorkerStream and related code to its own file 2019-06-15 13:05:25 +02:00
worker.js Move PDFWorkerStream and related code to its own file 2019-06-15 13:05:25 +02:00