pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	e18fa3fc45	Tweak the `QueueOptimizer` to recognize `OPS.paintImageMaskXObject` operators as repeated when the "skew" transformation matrix elements are non-zero (issue 8078) First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited. Hence I'm not sure if there's actually a very good reason for only considering ImageMasks where the "skew" transformation matrix elements are zero as repeated, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are all identical for the ImageMasks. Furthermore, looking at the group case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements. While this patch obviously isn't enough to completely fix the issue, since there should be a visible Pattern rendered as well[1], it seems (at least to me) like enough of an improvement that submitting this is justified. With these changes the referenced PDF document will no longer hang the entire browser, and rendering also finishes in a reasonable time (< 10 seconds for me) which seems fine given the huge number of identical inline images present.[2] --- [1] Temporarily changing the Pattern to a solid color does render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality. [2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.	2020-06-20 12:18:48 +02:00
Tim van der Meij	8cfdfb237a	Merge pull request #12005 from Snuffleupagus/cff-class Convert the code in `src/core/cff_parser.js` to use ES6 classes	2020-06-17 23:30:28 +02:00
Jonas Jenwald	880a0a0f59	Convert the code in `src/core/cff_parser.js` to use ES6 classes This removes multiple instances of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated. Please note: I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.	2020-06-16 12:33:21 +02:00
Jonas Jenwald	fb9b574f3d	Convert the code in `src/core/worker.js` to use ES6 classes This removes one instance of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated. Please note: I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.	2020-06-16 11:54:59 +02:00
Jonas Jenwald	87b089ba42	Lazily initialize, and cache, the regular expression used in `CFFCompiler.encodeFloat` There's no particular reason for re-creating the regular expression over and over for every `encodeFloat` invocation, as far as I can tell.	2020-06-15 13:51:28 +02:00
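The lazy initialization described in the commit above can be illustrated with a minimal sketch. The helper names and the placeholder pattern are hypothetical; the actual regular expression used by `CFFCompiler.encodeFloat` is not shown in this log.

```javascript
// Hypothetical sketch: compile a regular expression once, on first use,
// and reuse the cached instance for every later call.
let cachedRegExp = null;
let compileCount = 0; // only here to make the caching observable

function getTrailingZerosRegExp() {
  if (cachedRegExp === null) {
    compileCount++;
    // Placeholder pattern (an assumption): trailing zeros after a decimal point.
    cachedRegExp = /\.(\d*?)0+$/;
  }
  return cachedRegExp;
}

function trimTrailingZeros(text) {
  return text.replace(getTrailingZerosRegExp(), (match, digits) =>
    digits ? "." + digits : ""
  );
}
```

The pattern is non-global, so reusing one instance carries no `lastIndex` state between calls.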
Jonas Jenwald	517d92a121	Simplify the "is integer" checks in `CFFCompiler.encodeNumber` The `isNaN` check is obviously redundant, since `NaN` is the only value that isn't equal to itself; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/NaN#Examples The `parseFloat`/`parseInt` comparison would make sense if the `value` ever contains a String, which however is never actually the case. Besides looking through the code, I've also run the entire test-suite locally with `assert(typeof value === "number", "encodeNumber");` added at the top of the method and there were no failures. Hence we can simplify the "is integer" check a bit in the `CFFCompiler.encodeNumber` method.	2020-06-15 13:51:20 +02:00
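A small sketch of the reasoning in the commit above, assuming `value` is always a number. The function names are hypothetical, and using `Number.isInteger` for the simplified check is an assumption about what the simplification could look like, not a quote from the patch.

```javascript
// Older-style check, as described in the commit message.
function isIntegerOld(value) {
  // For NaN, parseFloat(NaN) === parseInt(NaN, 10) is already false,
  // because NaN is never equal to itself, so the isNaN test adds nothing.
  return parseFloat(value) === parseInt(value, 10) && !isNaN(value);
}

// Simplified check (assumption: one possible form of the simplification).
function isIntegerSimplified(value) {
  return Number.isInteger(value);
}

// Sample values only; this makes no claim about extreme magnitudes.
const samples = [0, 1, -3, 255, 2.5, 0.1, NaN];
const agree = samples.every(v => isIntegerOld(v) === isIntegerSimplified(v));
```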
Jonas Jenwald	5c39de805c	Add local caching of `ColorSpace`s, by name, in `PartialEvaluator.getOperatorList` (issue 2504) By caching parsed `ColorSpace`s, we thus don't need to re-parse the same data over and over which saves CPU cycles and reduces peak memory usage. (Obviously persistent memory usage may increase a tiny bit, but since the caching is done per `PartialEvaluator.getOperatorList` invocation and given that `ColorSpace` instances generally hold very little data this shouldn't be much of an issue.) Furthermore, by caching `ColorSpace`s we can also look up the already parsed ones synchronously during the `OperatorList` building, instead of having to defer to the event loop/microtask queue since the parsing is done asynchronously (such that error handling is easier). Possible future improvements: - Cache/lookup parsed `ColorSpaces` used in `Pattern`s and `Image`s. - Attempt to cache local `ColorSpace`s by reference as well, in addition to only by name, assuming that there are documents where that would be beneficial and that it's not too difficult to implement. - Assuming there are documents that would benefit from it, also cache repeated `ColorSpace`s globally as well. Given that we've never, until now, been doing any caching of parsed `ColorSpace`s and that even using a simple name-only local cache helps tremendously in pathological cases, I purposely decided against complicating the implementation too much initially. Also, compared to parsing of `Image`s, simply creating a `ColorSpace` instance isn't that expensive (hence I'd be somewhat surprised if adding a global cache would help much). --- This patch was tested using: - The default `tracemonkey` PDF file, which was included mostly to show that "normal" documents aren't negatively affected by these changes. - The PDF file from issue 2504, i.e. https://dl-ctlg.panasonic.com/jp/manual/sd/sd_rbm1000_0.pdf, where most pages will switch thousands of times between a handful of `ColorSpace`s. 
with the following manifest file:
```
[
  {
    "id": "tracemonkey",
    "file": "pdfs/tracemonkey.pdf",
    "md5": "9a192d8b1a7dc652a19835f6f08098bd",
    "rounds": 100,
    "type": "eq"
  },
  {
    "id": "issue2504",
    "file": "../web/pdfs/issue2504.pdf",
    "md5": "",
    "rounds": 20,
    "type": "eq"
  }
]
```
which gave the following results when comparing this patch against the `master` branch:
- Overall
```
-- Grouped By browser, pdf, stat --
browser | pdf | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ----------- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | issue2504 | Overall | 640 | 977 | 497 | -479 | -49.08 | faster
firefox | issue2504 | Page Request | 640 | 3 | 4 | 1 | 59.18 |
firefox | issue2504 | Rendering | 640 | 974 | 493 | -481 | -49.37 | faster
firefox | tracemonkey | Overall | 1400 | 116 | 111 | -5 | -4.43 |
firefox | tracemonkey | Page Request | 1400 | 2 | 2 | 0 | -2.86 |
firefox | tracemonkey | Rendering | 1400 | 114 | 109 | -5 | -4.47 |
```
- Page-specific
```
-- Grouped By browser, pdf, page, stat --
browser | pdf | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ----------- | ---- | ------------ | ----- | ------------ | ----------- | ----- | ------- | -------------
firefox | issue2504 | 0 | Overall | 20 | 2295 | 1268 | -1027 | -44.76 | faster
firefox | issue2504 | 0 | Page Request | 20 | 6 | 7 | 1 | 15.32 |
firefox | issue2504 | 0 | Rendering | 20 | 2288 | 1260 | -1028 | -44.93 | faster
firefox | issue2504 | 1 | Overall | 20 | 3059 | 2806 | -252 | -8.25 | faster
firefox | issue2504 | 1 | Page Request | 20 | 11 | 14 | 3 | 23.25 | slower
firefox | issue2504 | 1 | Rendering | 20 | 3047 | 2792 | -255 | -8.37 | faster
firefox | issue2504 | 2 | Overall | 20 | 411 | 295 | -116 | -28.20 | faster
firefox | issue2504 | 2 | Page Request | 20 | 2 | 42 | 40 | 1897.62 |
firefox | issue2504 | 2 | Rendering | 20 | 409 | 253 | -156 | -38.09 | faster
firefox | issue2504 | 3 | Overall | 20 | 736 | 299 | -437 | -59.34 | faster
firefox | issue2504 | 3 | Page Request | 20 | 2 | 2 | 0 | 0.00 |
firefox | issue2504 | 3 | Rendering | 20 | 734 | 297 | -437 | -59.49 | faster
firefox | issue2504 | 4 | Overall | 20 | 356 | 458 | 102 | 28.63 |
firefox | issue2504 | 4 | Page Request | 20 | 1 | 2 | 1 | 57.14 | slower
firefox | issue2504 | 4 | Rendering | 20 | 354 | 455 | 101 | 28.53 |
firefox | issue2504 | 5 | Overall | 20 | 1381 | 765 | -616 | -44.59 | faster
firefox | issue2504 | 5 | Page Request | 20 | 3 | 5 | 2 | 50.00 | slower
firefox | issue2504 | 5 | Rendering | 20 | 1378 | 760 | -617 | -44.81 | faster
firefox | issue2504 | 6 | Overall | 20 | 757 | 299 | -459 | -60.57 | faster
firefox | issue2504 | 6 | Page Request | 20 | 2 | 5 | 3 | 150.00 | slower
firefox | issue2504 | 6 | Rendering | 20 | 755 | 294 | -462 | -61.11 | faster
firefox | issue2504 | 7 | Overall | 20 | 394 | 302 | -92 | -23.39 | faster
firefox | issue2504 | 7 | Page Request | 20 | 2 | 1 | -1 | -34.88 | faster
firefox | issue2504 | 7 | Rendering | 20 | 392 | 301 | -91 | -23.32 | faster
firefox | issue2504 | 8 | Overall | 20 | 2875 | 979 | -1896 | -65.95 | faster
firefox | issue2504 | 8 | Page Request | 20 | 1 | 2 | 0 | 11.11 |
firefox | issue2504 | 8 | Rendering | 20 | 2874 | 978 | -1896 | -65.99 | faster
firefox | issue2504 | 9 | Overall | 20 | 700 | 332 | -368 | -52.60 | faster
firefox | issue2504 | 9 | Page Request | 20 | 3 | 2 | 0 | -4.00 |
firefox | issue2504 | 9 | Rendering | 20 | 698 | 329 | -368 | -52.78 | faster
firefox | issue2504 | 10 | Overall | 20 | 3296 | 926 | -2370 | -71.91 | faster
firefox | issue2504 | 10 | Page Request | 20 | 2 | 2 | 0 | -18.75 |
firefox | issue2504 | 10 | Rendering | 20 | 3293 | 924 | -2370 | -71.96 | faster
firefox | issue2504 | 11 | Overall | 20 | 524 | 197 | -327 | -62.34 | faster
firefox | issue2504 | 11 | Page Request | 20 | 2 | 3 | 1 | 58.54 |
firefox | issue2504 | 11 | Rendering | 20 | 522 | 194 | -328 | -62.81 | faster
firefox | issue2504 | 12 | Overall | 20 | 752 | 369 | -384 | -50.98 | faster
firefox | issue2504 | 12 | Page Request | 20 | 3 | 2 | -1 | -36.51 | faster
firefox | issue2504 | 12 | Rendering | 20 | 749 | 367 | -382 | -51.05 | faster
firefox | issue2504 | 13 | Overall | 20 | 679 | 487 | -193 | -28.38 | faster
firefox | issue2504 | 13 | Page Request | 20 | 4 | 2 | -2 | -48.68 | faster
firefox | issue2504 | 13 | Rendering | 20 | 676 | 485 | -191 | -28.28 | faster
firefox | issue2504 | 14 | Overall | 20 | 474 | 283 | -191 | -40.26 | faster
firefox | issue2504 | 14 | Page Request | 20 | 2 | 4 | 2 | 78.57 |
firefox | issue2504 | 14 | Rendering | 20 | 471 | 279 | -192 | -40.79 | faster
firefox | issue2504 | 15 | Overall | 20 | 860 | 618 | -241 | -28.05 | faster
firefox | issue2504 | 15 | Page Request | 20 | 2 | 3 | 0 | 10.87 |
firefox | issue2504 | 15 | Rendering | 20 | 857 | 616 | -241 | -28.15 | faster
firefox | issue2504 | 16 | Overall | 20 | 389 | 243 | -147 | -37.71 | faster
firefox | issue2504 | 16 | Page Request | 20 | 2 | 2 | 0 | 2.33 |
firefox | issue2504 | 16 | Rendering | 20 | 387 | 240 | -147 | -37.94 | faster
firefox | issue2504 | 17 | Overall | 20 | 1484 | 672 | -812 | -54.70 | faster
firefox | issue2504 | 17 | Page Request | 20 | 2 | 3 | 1 | 37.21 |
firefox | issue2504 | 17 | Rendering | 20 | 1482 | 669 | -812 | -54.84 | faster
firefox | issue2504 | 18 | Overall | 20 | 575 | 252 | -323 | -56.12 | faster
firefox | issue2504 | 18 | Page Request | 20 | 2 | 2 | 0 | -16.22 |
firefox | issue2504 | 18 | Rendering | 20 | 573 | 251 | -322 | -56.24 | faster
firefox | issue2504 | 19 | Overall | 20 | 517 | 227 | -290 | -56.08 | faster
firefox | issue2504 | 19 | Page Request | 20 | 2 | 2 | 0 | 21.62 |
firefox | issue2504 | 19 | Rendering | 20 | 515 | 225 | -290 | -56.37 | faster
firefox | issue2504 | 20 | Overall | 20 | 668 | 670 | 2 | 0.31 |
firefox | issue2504 | 20 | Page Request | 20 | 4 | 2 | -1 | -34.29 |
firefox | issue2504 | 20 | Rendering | 20 | 664 | 667 | 3 | 0.49 |
firefox | issue2504 | 21 | Overall | 20 | 486 | 309 | -177 | -36.44 | faster
firefox | issue2504 | 21 | Page Request | 20 | 2 | 2 | 0 | 16.13 |
firefox | issue2504 | 21 | Rendering | 20 | 484 | 307 | -177 | -36.60 | faster
firefox | issue2504 | 22 | Overall | 20 | 543 | 267 | -276 | -50.85 | faster
firefox | issue2504 | 22 | Page Request | 20 | 2 | 2 | 0 | 10.26 |
firefox | issue2504 | 22 | Rendering | 20 | 541 | 265 | -276 | -51.07 | faster
firefox | issue2504 | 23 | Overall | 20 | 3246 | 871 | -2375 | -73.17 | faster
firefox | issue2504 | 23 | Page Request | 20 | 2 | 3 | 1 | 37.21 |
firefox | issue2504 | 23 | Rendering | 20 | 3243 | 868 | -2376 | -73.25 | faster
firefox | issue2504 | 24 | Overall | 20 | 379 | 156 | -223 | -58.83 | faster
firefox | issue2504 | 24 | Page Request | 20 | 2 | 2 | 0 | -2.86 |
firefox | issue2504 | 24 | Rendering | 20 | 378 | 154 | -223 | -59.10 | faster
firefox | issue2504 | 25 | Overall | 20 | 176 | 127 | -50 | -28.19 | faster
firefox | issue2504 | 25 | Page Request | 20 | 2 | 1 | 0 | -15.63 |
firefox | issue2504 | 25 | Rendering | 20 | 175 | 125 | -49 | -28.31 | faster
firefox | issue2504 | 26 | Overall | 20 | 181 | 108 | -74 | -40.67 | faster
firefox | issue2504 | 26 | Page Request | 20 | 3 | 2 | -1 | -39.13 | faster
firefox | issue2504 | 26 | Rendering | 20 | 178 | 105 | -72 | -40.69 | faster
firefox | issue2504 | 27 | Overall | 20 | 208 | 104 | -104 | -49.92 | faster
firefox | issue2504 | 27 | Page Request | 20 | 2 | 2 | 1 | 48.39 |
firefox | issue2504 | 27 | Rendering | 20 | 206 | 102 | -104 | -50.64 | faster
firefox | issue2504 | 28 | Overall | 20 | 241 | 111 | -131 | -54.16 | faster
firefox | issue2504 | 28 | Page Request | 20 | 2 | 2 | -1 | -33.33 |
firefox | issue2504 | 28 | Rendering | 20 | 239 | 109 | -130 | -54.39 | faster
firefox | issue2504 | 29 | Overall | 20 | 321 | 196 | -125 | -39.05 | faster
firefox | issue2504 | 29 | Page Request | 20 | 1 | 2 | 0 | 17.86 |
firefox | issue2504 | 29 | Rendering | 20 | 319 | 194 | -126 | -39.35 | faster
firefox | issue2504 | 30 | Overall | 20 | 651 | 271 | -380 | -58.41 | faster
firefox | issue2504 | 30 | Page Request | 20 | 1 | 2 | 1 | 50.00 |
firefox | issue2504 | 30 | Rendering | 20 | 649 | 269 | -381 | -58.60 | faster
firefox | issue2504 | 31 | Overall | 20 | 1635 | 647 | -988 | -60.42 | faster
firefox | issue2504 | 31 | Page Request | 20 | 1 | 2 | 0 | 30.43 |
firefox | issue2504 | 31 | Rendering | 20 | 1634 | 645 | -988 | -60.49 | faster
firefox | tracemonkey | 0 | Overall | 100 | 51 | 51 | 0 | 0.02 |
firefox | tracemonkey | 0 | Page Request | 100 | 1 | 1 | 0 | -4.76 |
firefox | tracemonkey | 0 | Rendering | 100 | 50 | 50 | 0 | 0.12 |
firefox | tracemonkey | 1 | Overall | 100 | 97 | 91 | -5 | -5.52 | faster
firefox | tracemonkey | 1 | Page Request | 100 | 3 | 3 | 0 | -1.32 |
firefox | tracemonkey | 1 | Rendering | 100 | 94 | 88 | -5 | -5.73 | faster
firefox | tracemonkey | 2 | Overall | 100 | 40 | 40 | 0 | 0.50 |
firefox | tracemonkey | 2 | Page Request | 100 | 1 | 1 | 0 | 3.16 |
firefox | tracemonkey | 2 | Rendering | 100 | 39 | 39 | 0 | 0.54 |
firefox | tracemonkey | 3 | Overall | 100 | 62 | 62 | -1 | -0.94 |
firefox | tracemonkey | 3 | Page Request | 100 | 1 | 1 | 0 | 17.05 |
firefox | tracemonkey | 3 | Rendering | 100 | 61 | 61 | -1 | -1.11 |
firefox | tracemonkey | 4 | Overall | 100 | 56 | 58 | 2 | 3.41 |
firefox | tracemonkey | 4 | Page Request | 100 | 1 | 1 | 0 | 15.31 |
firefox | tracemonkey | 4 | Rendering | 100 | 55 | 57 | 2 | 3.23 |
firefox | tracemonkey | 5 | Overall | 100 | 73 | 71 | -2 | -2.28 |
firefox | tracemonkey | 5 | Page Request | 100 | 2 | 2 | 0 | 12.20 |
firefox | tracemonkey | 5 | Rendering | 100 | 71 | 69 | -2 | -2.69 |
firefox | tracemonkey | 6 | Overall | 100 | 85 | 69 | -16 | -18.73 | faster
firefox | tracemonkey | 6 | Page Request | 100 | 2 | 2 | 0 | -9.90 |
firefox | tracemonkey | 6 | Rendering | 100 | 83 | 67 | -16 | -18.97 | faster
firefox | tracemonkey | 7 | Overall | 100 | 65 | 64 | 0 | -0.37 |
firefox | tracemonkey | 7 | Page Request | 100 | 1 | 1 | 0 | -11.94 |
firefox | tracemonkey | 7 | Rendering | 100 | 63 | 63 | 0 | -0.05 |
firefox | tracemonkey | 8 | Overall | 100 | 53 | 54 | 1 | 2.04 |
firefox | tracemonkey | 8 | Page Request | 100 | 1 | 1 | 0 | 17.02 |
firefox | tracemonkey | 8 | Rendering | 100 | 52 | 53 | 1 | 1.82 |
firefox | tracemonkey | 9 | Overall | 100 | 79 | 73 | -6 | -7.86 | faster
firefox | tracemonkey | 9 | Page Request | 100 | 2 | 2 | 0 | -15.14 |
firefox | tracemonkey | 9 | Rendering | 100 | 77 | 71 | -6 | -7.86 | faster
firefox | tracemonkey | 10 | Overall | 100 | 545 | 519 | -27 | -4.86 | faster
firefox | tracemonkey | 10 | Page Request | 100 | 14 | 13 | 0 | -3.56 |
firefox | tracemonkey | 10 | Rendering | 100 | 532 | 506 | -26 | -4.90 | faster
firefox | tracemonkey | 11 | Overall | 100 | 42 | 41 | -1 | -2.50 |
firefox | tracemonkey | 11 | Page Request | 100 | 1 | 1 | 0 | -27.42 | faster
firefox | tracemonkey | 11 | Rendering | 100 | 41 | 40 | -1 | -1.75 |
firefox | tracemonkey | 12 | Overall | 100 | 350 | 332 | -18 | -5.16 | faster
firefox | tracemonkey | 12 | Page Request | 100 | 3 | 3 | 0 | -5.17 |
firefox | tracemonkey | 12 | Rendering | 100 | 347 | 329 | -18 | -5.15 | faster
firefox | tracemonkey | 13 | Overall | 100 | 31 | 31 | 0 | 0.52 |
firefox | tracemonkey | 13 | Page Request | 100 | 1 | 1 | 0 | 4.95 |
firefox | tracemonkey | 13 | Rendering | 100 | 30 | 30 | 0 | 0.20 |
```	2020-06-14 11:51:45 +02:00
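The name-only local caching described in the `ColorSpace` commit above can be sketched roughly as follows. All names here (`LocalNameCache`, `parseColorSpace`, `getColorSpace`) are hypothetical stand-ins, and the "parser" is a dummy; the point is only that repeated switches between a few names trigger one parse per name.

```javascript
// Hypothetical sketch of a per-invocation, name-keyed cache.
class LocalNameCache {
  constructor() {
    this._byName = new Map();
  }
  get(name) {
    return this._byName.get(name) || null;
  }
  set(name, data) {
    if (!this._byName.has(name)) {
      this._byName.set(name, data);
    }
  }
}

let parseCount = 0;
// Dummy stand-in for an expensive parse step (assumption, not pdf.js code).
function parseColorSpace(name) {
  parseCount++;
  return { name, numComps: name === "DeviceCMYK" ? 4 : 3 };
}

function getColorSpace(cache, name) {
  const cached = cache.get(name);
  if (cached) {
    return cached; // synchronous hit, no re-parsing
  }
  const colorSpace = parseColorSpace(name);
  cache.set(name, colorSpace);
  return colorSpace;
}

// Simulate a page that switches between two color spaces 1000 times.
const cache = new LocalNameCache();
for (let i = 0; i < 1000; i++) {
  getColorSpace(cache, i % 2 === 0 ? "DeviceRGB" : "DeviceCMYK");
}
```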
Jonas Jenwald	4b51bcc733	Ensure that `PDFImage.buildImage` won't accidentally swallow errors, e.g. from ColorSpace parsing (issue 6707, PR 11601 follow-up) Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occurring during image parsing/decoding as intended. The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.	2020-06-13 15:02:37 +02:00
Jonas Jenwald	10f31bb46d	Change the `dependencies` property, on `OperatorList` instances, from an Object to a Set Since this is completely internal functionality, and furthermore limited to the worker-thread, this change should thus not have any observable effect for e.g. an API-user.	2020-06-11 16:27:13 +02:00
Jonas Jenwald	02a1d0f6c5	Remove the unused `intent`/`pageIndex` properties from `OperatorList` instances (PR 11069 follow-up) Apparently I completely overlooked the fact that with the changes in PR 11069 these properties became completely unused, and consequently they thus ought to be removed.	2020-06-11 16:05:38 +02:00
Jonas Jenwald	159e13c4e4	Convert the `ChunkedStreamManager.promisesByRequest` property to a `Map` Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.	2020-06-09 17:50:14 +02:00
Jonas Jenwald	dda7a5d1b7	Convert the `ChunkedStreamManager.requestsByChunk` property to a `Map` Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.	2020-06-09 17:50:11 +02:00
Jonas Jenwald	17e23ffb33	Convert the `ChunkedStreamManager.chunksNeededByRequest` property to a `Map` (containing `Set`s) Compared to regular `Object`s, `Map`s (and `Set`s) have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.	2020-06-09 17:49:53 +02:00
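The two `Map`/`Set` advantages named in the three commits above (built-in iteration and a cheap emptiness check) can be shown with a short sketch. The function names and the bookkeeping are illustrative assumptions, not the actual `ChunkedStreamManager` code.

```javascript
// Hypothetical sketch: request id -> Set of chunk numbers still needed.
const chunksNeededByRequest = new Map();

function addRequest(requestId, chunks) {
  chunksNeededByRequest.set(requestId, new Set(chunks));
}

// Mark one chunk as loaded and return the ids of requests that are now complete.
function onChunkLoaded(chunk) {
  const completed = [];
  // Maps are directly iterable; no Object.keys() or hasOwnProperty needed.
  for (const [requestId, chunksNeeded] of chunksNeededByRequest) {
    chunksNeeded.delete(chunk);
    if (chunksNeeded.size === 0) {
      completed.push(requestId);
      chunksNeededByRequest.delete(requestId); // safe while iterating a Map
    }
  }
  return completed;
}

addRequest(1, [0, 1]);
addRequest(2, [1]);
const afterChunk1 = onChunkLoaded(1);
const afterChunk0 = onChunkLoaded(0);
const allDone = chunksNeededByRequest.size === 0; // emptiness check via .size
```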
Tim van der Meij	4c2e056796	Convert the `RefSet` primitive to a proper class and use a `Set` internally The `RefSet` primitive predates ES6, so that most likely explains why an object is used internally to track the entries. However, nowadays we can use built-in JavaScript sets for this purpose. Built-in types are often more efficient/optimized and using them makes the code a bit clearer since we don't have to assign `true` to keys anymore just to indicate their presence.	2020-06-07 19:01:29 +02:00
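A minimal sketch of the idea in the commit above: a set of references backed by a built-in `Set` instead of an object with `true` values. The `makeRef` helper and its string format are assumptions made for illustration; the method names are also illustrative rather than copied from pdf.js.

```javascript
// Hypothetical reference object with a stable string key.
function makeRef(num, gen) {
  return { num, gen, toString: () => `${num}R${gen}` };
}

class RefSetSketch {
  constructor() {
    this._set = new Set(); // presence is implicit; no `true` values needed
  }
  has(ref) {
    return this._set.has(ref.toString());
  }
  put(ref) {
    this._set.add(ref.toString());
  }
  remove(ref) {
    this._set.delete(ref.toString());
  }
}

const refs = new RefSetSketch();
refs.put(makeRef(10, 0));
const hasTen = refs.has(makeRef(10, 0));
const hasEleven = refs.has(makeRef(11, 0));
refs.remove(makeRef(10, 0));
const hasTenAfterRemove = refs.has(makeRef(10, 0));
```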
Tim van der Meij	c97200ff59	Merge pull request #11974 from Snuffleupagus/sendImgData A couple of small image caching/sending improvements	2020-06-07 13:53:26 +02:00
Jonas Jenwald	df7d8c74ca	Extract the actual sending of image data from the `PartialEvaluator.buildPaintImageXObject` method After PRs 10727 and 11912, the code responsible for sending the decoded image data to the main-thread has now become a fair bit more involved than previously. To reduce the amount of duplication here, the actual code responsible for sending the data is thus extracted into a new helper method instead.	2020-06-07 12:01:51 +02:00
Jonas Jenwald	aff0d56326	Remove an unnecessary `RefSetCache.prototype.has()` call from `GlobalImageCache.getData` We can simply attempt to get the data directly, and instead check the result, rather than first checking if it exists.	2020-06-07 11:56:04 +02:00
Takashi Tamura	7acb112ca9	Optimization: Avoid calling Math.pow if possible when calculating the transfer function of the CalRGB color space since calling Math.pow is expensive. If the value of color is larger than the threshold, 0.99554525, the final result of the transform is larger than 254.5 since ((1 + 0.055) * 0.99554525 ** (1 / 2.4) - 0.055) * 255 === 254.50000003134699	2020-06-07 13:17:18 +09:00
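The shortcut in the commit above relies on the transfer function being monotonically increasing: at or above the threshold the scaled result is at least 254.5, which rounds to 255. A rough sketch, with hypothetical function names and covering only the power-law branch quoted in the commit:

```javascript
const THRESHOLD = 0.99554525; // value taken from the commit message

// Full computation of the power-law branch, scaled to a byte.
function toByteSlow(color) {
  return Math.round(((1 + 0.055) * color ** (1 / 2.4) - 0.055) * 255);
}

// Same result, but skips the exponentiation near the top of the range.
function toByteFast(color) {
  if (color >= THRESHOLD) {
    return 255;
  }
  return toByteSlow(color);
}

const sampleColors = [0.5, 0.9, 0.99554525, 0.997, 0.999, 1];
const shortcutsAgree = sampleColors.every(
  c => toByteFast(c) === toByteSlow(c)
);
```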
Jonas Jenwald	b7272a34eb	Change the `loadedChunks` property, on `ChunkedStream` instances, from an Array to a Set In the old code the use of an Array meant that we had to manually track the `numChunksLoaded` property, given that simply using the Array `length` wouldn't have worked since there's no guarantee that the data is loaded in order when e.g. range requests are in use. Tracking closely related state separately in this manner never seems like a good idea, and we can now instead utilize a Set to avoid that.	2020-06-05 15:03:06 +02:00
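A small sketch of why a `Set` removes the need for a separately tracked counter, as argued in the commit above. The class and method names are hypothetical; only `loadedChunks` and `numChunksLoaded` come from the commit message.

```javascript
// Hypothetical sketch: the number of loaded chunks is derived from the Set,
// so it cannot drift out of sync with the set of loaded chunk numbers.
class ChunkTracker {
  constructor(numChunks) {
    this.numChunks = numChunks;
    this._loadedChunks = new Set();
  }
  onReceiveChunk(chunk) {
    this._loadedChunks.add(chunk); // duplicates are ignored automatically
  }
  get numChunksLoaded() {
    return this._loadedChunks.size;
  }
  get isDataLoaded() {
    return this.numChunksLoaded === this.numChunks;
  }
}

// Chunks arrive out of order, and one arrives twice.
const tracker = new ChunkTracker(3);
tracker.onReceiveChunk(2);
tracker.onReceiveChunk(0);
tracker.onReceiveChunk(2);
const loadedSoFar = tracker.numChunksLoaded;
tracker.onReceiveChunk(1);
```

With a sparse Array, `length` would have been 3 after only chunk 2 arrived, which is the problem the commit message describes.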
Carlos Rodríguez	802aa14a99	JPEGs encoded as RGB, instead of YCbCr, write the component indices as "RGB" in ASCII to indicate this According to ISO/IEC 10918-6:2013 (E), section 6.1: (http://www.itu.int/rec/T-REC-T.872-201206-I/en) "Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment" But common JPEG libraries also treat the data as RGB if the component indices are ASCII R (0x52), G (0x47) and B (0x42): https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048 Issue #11931	2020-06-04 15:08:47 +02:00
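The heuristic mentioned in the commit above (component identifiers spelling "RGB" in ASCII) can be sketched as a simple check. The function name and the plain-array input are assumptions; how the decoder in `src/core/jpg.js` actually stores component identifiers is not shown in this log.

```javascript
// Hypothetical helper: true when a three-component JPEG uses the ASCII
// codes for "R", "G" and "B" as its component identifiers.
function hasRGBComponentIds(componentIds) {
  return (
    componentIds.length === 3 &&
    componentIds[0] === 0x52 && // "R"
    componentIds[1] === 0x47 && // "G"
    componentIds[2] === 0x42 // "B"
  );
}

const rgbTagged = hasRGBComponentIds([0x52, 0x47, 0x42]);
const conventional = hasRGBComponentIds([1, 2, 3]);
```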
Jonas Jenwald	af815e417d	Ensure that we don't attempt to cache inline images in the `GlobalImageCache` (PR 11912 follow-up) Since inline images, i.e. those defined inside of `/Contents` streams, are by their very definition page-specific it thus seems like a good idea to actually enforce that they won't accidentally end up in the `GlobalImageCache`.	2020-06-01 01:00:30 +02:00
Jonas Jenwald	4ef547f400	Improve caching of empty `/XObject`s in the `PartialEvaluator.getTextContent` method It turns out that `getTextContent` suffers from similar problems with repeated images as `getOperatorList`; please see the previous patch. While only `/XObject` resources of the `Form`-type will actually be parsed in `PartialEvaluator.getTextContent`, since those are the only ones that may contain text, we're still forced to fetch repeated image resources where the name differs (but not the reference). Obviously it's less bad in this case, since we're not actually parsing `/XObject`s of e.g. the `Image`-type. However, you still want to avoid even fetching the data whenever possible, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general. To address these issues, we can simply replace the existing name-only caching in `PartialEvaluator.getTextContent` with a new cache backed by `LocalImageCache` instead.	2020-05-26 09:49:01 +02:00
Jonas Jenwald	d62c9181bd	Improve the local image caching in `PartialEvaluator.getOperatorList` Currently the local `imageCache`, as used in `PartialEvaluator.getOperatorList`, will miss certain cases of repeated images because the caching is only done by name (usually using a format such as e.g. "Im0", "Im1", ...). However, in some PDF documents the `/XObject` dictionaries may contain hundreds (or even thousands) of distinctly named images, despite them referring to only a handful of actual image objects (via the XRef table). With these changes we'll now cache local images using both name and (where applicable) reference, thus improving re-usage of image resources even further. This patch was tested using the PDF file from [bug 857031](https://bugzilla.mozilla.org/show_bug.cgi?id=857031), i.e. https://bug857031.bmoattachments.org/attachment.cgi?id=732270, with the following manifest file:
```
[
  {
    "id": "bug857031",
    "file": "../web/pdfs/bug857031.pdf",
    "md5": "",
    "rounds": 250,
    "lastPage": 1,
    "type": "eq"
  }
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
firefox | 0 | Overall | 250 | 2749 | 2656 | -93 | -3.38 | faster
firefox | 0 | Page Request | 250 | 3 | 4 | 1 | 50.14 | slower
firefox | 0 | Rendering | 250 | 2746 | 2652 | -94 | -3.44 | faster
```
While this is certainly an improvement, since we now avoid re-parsing ~1000 images on the first page, all of the image resources are small enough that the total rendering time doesn't improve that much in this particular case. In pathological cases, such as e.g. the PDF document in issue 4958, the improvements with this patch can be very significant. Looking for example at page 2, from issue 4958, the rendering time drops from ~60 seconds with `master` to ~30 seconds with this patch (obviously still slow, but it really showcases the potential of this patch nicely). Finally, note that there's also potential for additional improvements by re-using `LocalImageCache` instances for e.g. /XObject data of the `Form`-type. However, given the recent changes in this area I purposely didn't want to complicate this patch more than necessary.	2020-05-25 15:14:14 +02:00
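Caching by both name and reference, as described in the commit above, can be sketched as two maps: one from name to reference, one from reference to data. The class below is a simplified, hypothetical stand-in (references are plain strings here) and does not reproduce the actual `LocalImageCache` implementation.

```javascript
// Hypothetical sketch of a local cache keyed by name and by reference.
class NameAndRefCache {
  constructor() {
    this._nameToRef = new Map();
    this._dataByRef = new Map();
  }
  getByName(name) {
    const ref = this._nameToRef.get(name);
    return ref !== undefined ? this.getByRef(ref) : null;
  }
  getByRef(ref) {
    return this._dataByRef.get(ref) || null;
  }
  set(name, ref, data) {
    this._nameToRef.set(name, ref);
    if (!this._dataByRef.has(ref)) {
      this._dataByRef.set(ref, data);
    }
  }
}

const imageCache = new NameAndRefCache();
imageCache.set("Im0", "10R", { width: 16, height: 16 });

// "Im1" is a new name, so a name-only cache would miss here...
const byNewName = imageCache.getByName("Im1");
// ...but it points at the same object, so the reference lookup hits.
const bySameRef = imageCache.getByRef("10R");
```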
Tim van der Meij	3b615e4ca3	Merge pull request #11601 from Snuffleupagus/rm-nativeImageDecoderSupport [api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`	2020-05-23 15:33:46 +02:00
Jonas Jenwald	8af70d75aa	Allow `GlobalImageCache.clear` to, optionally, only remove the actual data (PR 11912 follow-up) When "Cleanup" is triggered, you obviously need to remove all globally cached data on both the main- and worker-threads. However, the current implementation of the `GlobalImageCache.clear` method also means that we lose all information about which images were cached and not just their data. This thus has the somewhat unfortunate side-effect of requiring images, which were previously known to be "global", to again have to reach `NUM_PAGES_THRESHOLD` before being cached again. To avoid doing unnecessary parsing after "Cleanup", we can thus let `GlobalImageCache.clear` keep track of which images were cached while still removing their actual data. This should not have any significant impact on memory usage, since the only extra thing being kept is a `RefSetCache` (essentially an Object) with a couple of `Set`s containing only integers.	2020-05-23 11:30:24 +02:00
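The optional data-only clearing described in the commit above might look roughly like this. The class, its fields, and the `onlyData` parameter name are illustrative assumptions; references are plain strings for simplicity.

```javascript
// Hypothetical sketch: per image, keep the set of pages it was seen on
// separately from the (potentially large) decoded data.
class ClearableImageCache {
  constructor() {
    this._entries = new Map(); // ref -> { pages: Set<number>, data: any }
  }
  addPage(ref, pageIndex) {
    let entry = this._entries.get(ref);
    if (!entry) {
      entry = { pages: new Set(), data: null };
      this._entries.set(ref, entry);
    }
    entry.pages.add(pageIndex);
  }
  setData(ref, data) {
    const entry = this._entries.get(ref);
    if (entry) {
      entry.data = data;
    }
  }
  numPagesSeen(ref) {
    const entry = this._entries.get(ref);
    return entry ? entry.pages.size : 0;
  }
  getData(ref) {
    const entry = this._entries.get(ref);
    return entry ? entry.data : null;
  }
  clear(onlyData = false) {
    if (onlyData) {
      // Drop the data but remember which images were seen on which pages.
      for (const entry of this._entries.values()) {
        entry.data = null;
      }
      return;
    }
    this._entries.clear();
  }
}

const globalCache = new ClearableImageCache();
globalCache.addPage("7R", 0);
globalCache.addPage("7R", 1);
globalCache.setData("7R", "decoded-image");
globalCache.clear(/* onlyData = */ true);
```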
Jonas Jenwald	56ebf01ae0	Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922) This patch was inspired by `ad2b64f124/xpdf/CharCodeToUnicode.cc (L480-L484)`	2020-05-22 15:23:17 +02:00
Jonas Jenwald	18e0b10d3c	[api-minor] Remove the `disableCreateObjectURL` option from the `getDocument` parameters, since it's now unused in the API With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code. Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option. Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	0351852d74	[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js` Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons: - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library. - The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image. - While some JPEG images may, for all intents and purposes, appear to be natively supported there are still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there are any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first. In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should always be the case. - The native decoding, for anything except the simplest of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707). Furthermore this also leads to data being parsed on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons. - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests. - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used. Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues. At this point in time, there are two kinds of failure with this patch: - Changes which are basically imperceptible to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder. This type of "failure" accounts for the vast majority of the total number of changes in the reference tests. - Changes where the JPEG images now look ever so slightly blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough). Basically if you disable [this downscaling in canvas.js](`8fb82e939c/src/display/canvas.js (L2356-L2395)`), which is what happens when zooming in, the differences simply vanish! Hence I'm pretty satisfied that there are no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that all images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	dda6626f40	Attempt to cache repeated images at the document, rather than the page, level (issue 11878) Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there are PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the same images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1] Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2] However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages. In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be twenty copies of the image data). While this obviously benefits both CPU and memory usage in this case, for very large image data this patch may possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will only cache a certain number of image resources at the document level and simply fall back to the default behaviour. Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3] Please note: The patch will lead to small movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator. --- [1] There are e.g. PDF documents that use the same image as background on all pages. [2] Given that data stored in the `commonObjs`, on the main-thread, is only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer. [3] If the latter case were true, we could simply check for repeat images before parsing started and thus avoid handling any duplicate image resources.	2020-05-21 18:13:45 +02:00
Jonas Jenwald	8d56a69e74	Reduce usage of SystemJS, in the development viewer, even further With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded natively by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill. Please note: As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.	2020-05-20 13:36:52 +02:00
Jonas Jenwald	ec0ab91a2b	Reduce the usage of `require` statements in code-paths not protected by pre-processor and/or run-time checks This replaces some additional `require`/`exports` usage with standard `import`/`export` statements instead. Hence another, small, part in the effort to reduce the reliance on SystemJS-specific functionality in the development viewer.	2020-05-14 15:57:49 +02:00
Jonas Jenwald	73636e052a	Handle errors individually for each annotation in the `_parsedAnnotations` getter While working on PR 11872, it occurred to me that it probably wouldn't be a bad idea to change the `_parsedAnnotations` getter to handle errors individually for each annotation. This way, one broken/corrupt annotation won't prevent the rest of them from being e.g. fetched through the API.	2020-05-09 12:33:39 +02:00
Jonas Jenwald	e1f340a0c2	Use the ESLint `no-restricted-syntax` rule to ensure that `assert` is always called with two arguments Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites. In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax	2020-05-05 13:40:05 +02:00
Tim van der Meij	491904d30a	Merge pull request #11872 from Snuffleupagus/issue-11871 Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871)	2020-05-04 22:19:27 +02:00
Brendan Dahl	b1be33c96f	Add more categories of unsupported features. Fixes #11815	2020-05-04 11:02:16 -07:00
Jonas Jenwald	4aabd063fc	Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871) This should ensure that a page will always render successfully, even if there are errors during the Annotation fetching/parsing. Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.	2020-05-04 17:09:48 +02:00
Jonas Jenwald	911c33f025	Move the `maybeValidDimensions` check, used with JPEG images, to occur earlier (PR 11523 follow-up) Given that the `NativeImageDecoder.{isSupported, isDecodable}` methods require both dictionary lookups and ColorSpace parsing, in hindsight it actually seems more reasonable to do the `JpegStream.maybeValidDimensions` checks first.	2020-04-26 12:07:46 +02:00
Jonas Jenwald	695140728a	[src/core/fonts.js] Improve the `validateOS2Table` function Rather than creating a new `Stream` just to validate the OS/2 TrueType table, it's simpler/better to just pass in a reference to the font data and use that instead (similar to other TrueType helper functions).	2020-04-19 11:25:25 +02:00
Jonas Jenwald	033d27fc25	[src/core/fonts.js] Replace some unnecessary `Stream.getUint16()` calls with `Stream.skip(2)` instead There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches/computes a Uint16 value (and without using the result for anything).	2020-04-19 11:18:20 +02:00
Jonas Jenwald	4fae1ac5c4	[src/core/fonts.js] Replace some unnecessary `Stream.getBytes(...)` calls with `Stream.skip(...)` instead There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches the bytes in question (and without using the result for anything).	2020-04-19 11:18:15 +02:00
Tim van der Meij	7b23476e61	Merge pull request #11818 from Snuffleupagus/eslint-dot-notation Enable the `dot-notation` ESLint rule	2020-04-18 00:19:47 +02:00
Jonas Jenwald	518d26dfb4	[src/core/jpg.js] Remove redundant marker validation at the end of the `decodeScan` function (PR 11805 follow-up) With the MCU parsing changes made in PR 11805, the final marker validation is no longer necessary before the `decodeScan` function returns.	2020-04-17 15:40:02 +02:00
Jonas Jenwald	1cc3dbb694	Enable the `dot-notation` ESLint rule Please note: These changes were done automatically, using the `gulp lint --fix` command. This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104 The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there are close to 8000 instances being fixed by the `dot-notation` ESLint rule, which ends up reducing the size of even the built files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a reduction of `23 178` bytes (or ~0.7%) for a completely mechanical change. A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't think that the new formatting this patch introduces should undo any of the improvements from PR 6915. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation	2020-04-17 12:24:46 +02:00
Tim van der Meij	96923eb2a6	Merge pull request #11805 from Snuffleupagus/issue-11794 Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)	2020-04-16 00:08:58 +02:00
Tim van der Meij	a7def05aa1	Merge pull request #11810 from Snuffleupagus/fromCodePoint-followup A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up)	2020-04-16 00:08:16 +02:00
Jonas Jenwald	44b4a74f48	A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up) - Add a reduced test-case for issue 11768, to prevent future regressions. (Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.) - Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.	2020-04-15 13:45:08 +02:00
Jonas Jenwald	06f6f8719f	Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)	2020-04-14 23:27:08 +02:00
Jonas Jenwald	26cffd03b0	[src/core/jpg.js] Remove some redundant marker validation during the MCU parsing in the `decodeScan` function Some of the code in `src/core/jpg.js` is fairly old, and has with time become unnecessary when the surrounding code has been updated to handle various types of JPEG corruption. In particular the `if (!marker || marker <= 0xff00) { ... }` branch is now dead code, since: - The `!marker` case can no longer happen, since we would already have broken out of the loop thanks to the `!fileMarker` branch a handful of lines above. - The `marker <= 0xff00` case can also no longer happen, since the `findNextFileMarker` function validates markers much more thoroughly (by checking `marker >= 0xffc0 && marker <= 0xfffe`). Hence we'd again have broken out of the loop via the `!fileMarker` branch above when no valid marker was found.	2020-04-14 23:27:08 +02:00
Jonas Jenwald	746eaf3154	[api-minor] Fix the return value of `PDFDocumentProxy.getViewerPreferences` when no viewer preferences are present (PR 10738 follow-up) This patch fixes yet another instalment in the never-ending series of "what the bleep was I thinking", by changing the `PDFDocumentProxy.getViewerPreferences` method to return `null` by default. Not only is this method now consistent with many other API methods, for the data not present case, but it also avoids having to e.g. loop through an object to check if it's actually empty (note the old unit-test).	2020-04-14 23:25:50 +02:00
Jonas Jenwald	426945b480	Update Prettier to version 2.0 Please note that these changes were done automatically, using `gulp lint --fix`. Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).	2020-04-14 12:28:14 +02:00

1996 Commits