This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.
Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier.
*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope.
*Please note:* For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants were re-factored slightly.
Note that, compared to other structures such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however brings the advantage of being able to share the cache for an *entire* page.
Furthermore, similar to ColorSpaces, the parsing of individual `Function`s is generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.)
Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.)
After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its *own* `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instance could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site.
Obviously, with this patch, there are now (potentially) more `PDFFunctionFactory` instances than before, when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance, and those are *not* kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible *lazily initialized* (see the sketch after this list), specifically:
- The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually requires it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces).
- The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing.
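Roughly, the lazy-initialization pattern looks as follows. This is only a sketch: the constructor options shown mirror how `PDFFunctionFactory` (from `src/core/function.js`) is normally created, but treat the details as illustrative rather than the exact PDF.js code.

```js
class PartialEvaluator {
  constructor({ xref, options }) {
    this.xref = xref;
    this.options = options;
  }

  // Only built the first time something actually needs `Function` parsing,
  // e.g. Patterns or "complex" ColorSpaces; `getTextContent` never does.
  // (The `PDFFunctionFactory` applies the same pattern to its own
  // `LocalFunctionCache` property.)
  get pdfFunctionFactory() {
    const pdfFunctionFactory = new PDFFunctionFactory({
      xref: this.xref,
      isEvalSupported: this.options.isEvalSupported,
    });
    // Replace the getter with the computed value, so it's created only once.
    Object.defineProperty(this, "pdfFunctionFactory", {
      value: pdfFunctionFactory,
    });
    return pdfFunctionFactory;
  }
}
```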
To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization.
(If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.)
All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic.
With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources.
Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer.
However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction` parsing shows that it takes *approximately* `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is *one order of magnitude faster*, which does add up when the same `Function` is invoked close to 2000 times.
This patch contains the following *notable* improvements:
- Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point).
- Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition to the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths.
- Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing.
- Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).
This will allow caching of ColorSpaces by either `Name` *or* `Ref`, which doesn't really make sense for images, thus allowing (better) caching for ColorSpaces used with e.g. Images and Patterns.
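A minimal sketch of such a name-or-ref keyed local cache; the method names are illustrative and the actual implementation lives with the other local caching helpers in `src/core/`:

```js
class LocalColorSpaceCache {
  constructor() {
    this._nameMap = new Map();
    this._refMap = new Map(); // Keyed by the string form of the `Ref`.
  }

  getByName(name) {
    return this._nameMap.get(name) || null;
  }

  getByRef(ref) {
    return this._refMap.get(ref.toString()) || null;
  }

  set(name, ref, colorSpace) {
    if (name) {
      this._nameMap.set(name, colorSpace);
    }
    if (ref) {
      this._refMap.set(ref.toString(), colorSpace);
    }
  }
}
```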
Defining this *inline* in the "constructor" looks slightly weird (I really don't know why I wrote it like that originally), and it can simply be changed to a regular method instead.
Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occurring during image parsing/decoding as intended.
The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.
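To illustrate the kind of mistake, here's a simplified sketch (not the actual `PDFImage.buildImage` code): without the re-throw, the rejection is swallowed and the returned Promise resolves instead of rejecting.

```js
// Illustrative only -- not the actual `PDFImage.buildImage` code.
function buildImageSketch(decodePromise) {
  return decodePromise.catch(function (reason) {
    console.warn(`Unable to decode image: "${reason}".`);
    // Without this re-throw the returned Promise *resolves*, so callers
    // never learn that parsing/decoding failed and may wait forever.
    throw reason;
  });
}
```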
After PRs 10727 and 11912, the code responsible for sending the decoded image data to the main-thread has now become a fair bit more involved than previously.
To reduce the amount of duplication here, the actual code responsible for sending the data is thus extracted into a new helper method instead.
Since *inline* images, i.e. those defined inside of `/Contents` streams, are by their very definition page-specific, it thus seems like a good idea to actually enforce that they won't accidentally end up in the `GlobalImageCache`.
It turns out that `getTextContent` suffers from *similar* problems with repeated images as `getOperatorList`; please see the previous patch.
While only `/XObject` resources of the `Form`-type will actually be *parsed* in `PartialEvaluator.getTextContent`, since those are the only ones that may contain text, we're still forced to fetch repeated image resources where the name differs (but not the reference).
Obviously it's less bad in this case, since we're not actually parsing `/XObject`s of e.g. the `Image`-type. However, you still want to avoid even fetching the data whenever possible, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.
To address these issues, we can simply replace the existing name-only caching in `PartialEvaluator.getTextContent` with a new cache backed by `LocalImageCache` instead.
Currently the local `imageCache`, as used in `PartialEvaluator.getOperatorList`, will miss certain cases of repeated images because the caching is *only* done by name (usually using a format such as e.g. "Im0", "Im1", ...).
However, in some PDF documents the `/XObject` dictionaries may contain hundreds (or even thousands) of distinctly named images, despite them referring to only a handful of actual image objects (via the XRef table).
With these changes we'll now cache *local* images using both name and (where applicable) reference, thus improving re-use of image resources even further.
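A sketch of the new lookup order in `getOperatorList`; the identifiers are illustrative, not the exact PDF.js code.

```js
function tryLocalImageCache(localImageCache, operatorList, name, rawRef) {
  // Old behaviour: only a hit on the *name* (e.g. "Im0") avoided re-parsing.
  let cached = localImageCache.getByName(name);

  // New behaviour: a differently named /XObject entry that resolves to the
  // same object via the XRef table is also found, through its *reference*.
  if (!cached && rawRef) {
    cached = localImageCache.getByRef(rawRef);
  }
  if (!cached) {
    return false; // The image still has to be parsed (and then cached).
  }
  operatorList.addOp(cached.fn, cached.args); // Re-emit, no re-parsing.
  return true;
}
```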
This patch was tested using the PDF file from [bug 857031](https://bugzilla.mozilla.org/show_bug.cgi?id=857031), i.e. https://bug857031.bmoattachments.org/attachment.cgi?id=732270, with the following manifest file:
```
[
{ "id": "bug857031",
"file": "../web/pdfs/bug857031.pdf",
"md5": "",
"rounds": 250,
"lastPage": 1,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
firefox | 0 | Overall | 250 | 2749 | 2656 | -93 | -3.38 | faster
firefox | 0 | Page Request | 250 | 3 | 4 | 1 | 50.14 | slower
firefox | 0 | Rendering | 250 | 2746 | 2652 | -94 | -3.44 | faster
```
While this is certainly an improvement, since we now avoid re-parsing ~1000 images on the first page, all of the image resources are small enough that the total rendering time doesn't improve that much in this particular case.
In pathological cases, such as e.g. the PDF document in issue 4958, the improvements with this patch can be very significant. Looking for example at page 2, from issue 4958, the rendering time drops from ~60 seconds with `master` to ~30 seconds with this patch (obviously still slow, but it really showcases the potential of this patch nicely).
Finally, note that there's also potential for additional improvements by re-using `LocalImageCache` instances for e.g. /XObject data of the `Form`-type. However, given the recent changes in this area, I purposely didn't want to complicate *this* patch more than necessary.
With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code.
Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option.
Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons:
- It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.
- The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.
- While some JPEG images may, for all intents and purposes, appear to be natively supported, there are still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there are any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.
- The native decoding, for anything except the *simplest* of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons.
- Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests.
- Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.
Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:
- Changes which are basically imperceptible to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser and PDF.js JPEG decoders.
This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.
- Changes where the JPEG images now look *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation; however, when comparing two viewers side-by-side, I've discovered that the differences vanish at higher zoom levels (usually around 200% is enough).
Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page, which is obviously slow and wastes both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefits both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus, to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fall back to the default behaviour otherwise.
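A rough sketch of the page-counting idea; the constants, class name, and method names are illustrative, not the exact PDF.js implementation.

```js
const NUM_PAGES_THRESHOLD = 2;  // Promote once an image is seen on 2+ pages.
const MAX_IMAGES_TO_CACHE = 10; // Hard cap, to limit main-thread memory use.

class GlobalImageCacheSketch {
  constructor() {
    this._refCache = new Map(); // `Ref` string -> { pageIndexSet, data }
  }

  // Called from the worker-thread parsing code, once per image occurrence.
  shouldCache(ref, currentPageIndex) {
    const key = ref.toString();
    let entry = this._refCache.get(key);
    if (!entry) {
      entry = { pageIndexSet: new Set(), data: null };
      this._refCache.set(key, entry);
    }
    entry.pageIndexSet.add(currentPageIndex);

    // Only promote images that actually repeat across pages...
    if (entry.pageIndexSet.size < NUM_PAGES_THRESHOLD) {
      return false;
    }
    // ...and never cache more than a fixed number at the document level;
    // everything else simply falls back to the page-specific behaviour.
    if (entry.data === null && this._cachedCount() >= MAX_IMAGES_TO_CACHE) {
      return false;
    }
    return true;
  }

  _cachedCount() {
    let count = 0;
    for (const { data } of this._refCache.values()) {
      if (data !== null) {
        count++;
      }
    }
    return count;
  }
}
```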
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
Given that the `NativeImageDecoder.{isSupported, isDecodable}` methods require both dictionary lookups *and* ColorSpace parsing, in hindsight it actually seems more reasonable to do the `JpegStream.maybeValidDimensions` checks *first*.
*Please note:* These changes were done automatically, using the `gulp lint --fix` command.
This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104
The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the *built* files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a *reduction* of `23 178` bytes (or ~0.7%) for a completely mechanical change.
A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't *think* that the new formatting this patch introduces should undo any of the improvements from PR 6915.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation
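For illustration, this is the kind of change the rule makes (the dictionary and key are made up for the example):

```js
const dict = { Subtype: "Image" };

// Before (bracket notation with a string literal key):
//   const subtype = dict["Subtype"];

// After running `gulp lint --fix` with `dot-notation` enabled:
const subtype = dict.Subtype;
```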
- Add a reduced test-case for issue 11768, to prevent future regressions.
(Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.)
- Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.
Please note that these changes were done automatically, using `gulp lint --fix`.
Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html
In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).
For years now, the `Font.exportData` method has (because of its previous implementation) been exporting many properties despite them being completely unused on the main-thread and/or in the API.
This is unfortunate, since among those properties there's a number of potentially very large data-structures, containing e.g. Arrays and Objects, which thus have to be first structured cloned and then stored on the main-thread.
With the changes in this patch, we'll thus by default save memory for *every* `Font` instance created (there can be a lot in longer documents). The memory savings obviously depend a lot on the actual font data, but some approximate figures are: for non-embedded fonts it can save a couple of kilobytes, for simple embedded fonts a handful of kilobytes, and for composite fonts the size of this auxiliary data can even be larger than the actual font program itself.
All-in-all, there's no good reason to keep exporting these properties by default when they're unused. However, since we cannot be sure that every property is unused in custom implementations of the PDF.js library, this patch adds a new `getDocument` option (named `fontExtraProperties`) that still allows access to the following properties (a usage sketch follows the list):
- "cMap": An internal data structure, only used with composite fonts and never really intended to be exposed on the main-thread and/or in the API.
Note also that the `CMap`/`IdentityCMap` classes are a lot more complex than simple Objects, but only their "internal" properties survive the structured cloning used to send data to the main-thread. Given that CMaps can often be *very* large, not exporting them can also save a fair bit of memory.
- "defaultEncoding": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.
- "differences": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.
- "isSymbolicFont": An internal property, used during font parsing and building of the glyph mapping on the worker-thread.
- "seacMap": An internal map, only potentially used with *some* Type1/CFF fonts and never intended to be exposed in the API. The existing `Font.{charToGlyph, charToGlyphs}` functionality already takes this data into account when handling text.
- "toFontChar": The glyph map, necessary for mapping characters to glyphs in the font, which is built upon the various encoding information contained in the font dictionary and/or font program. This is not directly used on the main-thread and/or in the API.
- "toUnicode": The unicode map, necessary for text-extraction to work correctly, which is built upon the ToUnicode/CMap information contained in the font dictionary, but not directly used on the main-thread and/or in the API.
- "vmetrics": An array of width data used with fonts which are composite *and* vertical, but not directly used on the main-thread and/or in the API.
- "widths": An array of width data used with most fonts, but not directly used on the main-thread and/or in the API.
In preparation for the next patch, this changes the signature of `TranslatedFont` to take an object rather than individual parameters. This also, in my opinion, makes the call-sites easier to read since it essentially provides a small bit of documentation of the arguments.
Finally, since it was necessary to touch `TranslatedFont` anyway it seemed like a good idea to also convert it to a proper `class`.
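A sketch of the signature change; the exact properties passed are illustrative, not the full `TranslatedFont` implementation.

```js
class TranslatedFont {
  constructor({ loadedName, font, dict }) {
    this.loadedName = loadedName;
    this.font = font;
    this.dict = dict;
    this.sent = false;
  }
}

// The call-site now essentially documents its own arguments:
const parsedFont = /* result of font parsing */ null;
const fontDict = /* the font dictionary */ null;
const translated = new TranslatedFont({
  loadedName: "g_d0_f1",
  font: parsedFont,
  dict: fontDict,
});
```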
Given that the `vertical` property is always accessed on the main-thread, ensuring that the property is explicitly defined seems like the correct thing to do since it also avoids boolean casting elsewhere in the code-base.
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there were a fair number of false positives when the `no-shadow` ESLint rule was enabled.
Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*
Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.
To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
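A simplified sketch of the added validation; this is not the exact code in `PartialEvaluator._buildSimpleFontToUnicode`, and the base 16 handling is omitted for brevity. The point is that the parsed value must be a valid (integer) code point *before* `String.fromCodePoint` is called.

```js
// "/c.1" parses to `0.1`, which must be rejected instead of being passed to
// `String.fromCodePoint` (it throws on non-integer code points).
function charCodeFromCGlyphName(glyphName) {
  const codeStr = glyphName.substring(1);
  const code = +codeStr; // Unary plus, as used by the existing heuristics.

  if (!Number.isInteger(code) || code < 0 || code > 0x10ffff) {
    return -1; // Invalid; the caller skips this /Differences entry.
  }
  return code;
}

charCodeFromCGlyphName("c65"); // 65
charCodeFromCGlyphName("c.1"); // -1, previously this caused a throw
```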
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch utilizes an *inline check* and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`).
For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` is called *a lot* during parsing in the worker.
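A sketch of the inline check; `PDFJSDev` and `unreachable` are the existing build-time/utility helpers, but treat the exact guard and the simplified `Dict` shown here as illustrative.

```js
class Dict {
  constructor() {
    this._map = Object.create(null);
  }

  // Inline check rather than a plain `assert` call, since `Dict.set` is hot;
  // the `PDFJSDev` guard ensures the check is stripped from PRODUCTION builds.
  set(key, value) {
    if (
      (typeof PDFJSDev === "undefined" ||
        PDFJSDev.test("!PRODUCTION || TESTING")) &&
      value === undefined
    ) {
      unreachable('Dict.set: The "value" cannot be undefined.');
    }
    this._map[key] = value;
  }

  has(key) {
    return this._map[key] !== undefined; // No `in` check needed any more.
  }
}
```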
This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
{ "id": "issue2618",
"file": "../web/pdfs/issue2618.pdf",
"md5": "",
"rounds": 250,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall | 250 | 2838 | 2820 | -18 | -0.65 | faster
Firefox | Page Request | 250 | 1 | 2 | 0 | 11.92 | slower
Firefox | Rendering | 250 | 2837 | 2818 | -19 | -0.65 | faster
```
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
This patch extends the existing heuristics, which are really the best that we can do in general for these kinds of non-embedded *and* non-standard fonts.
Furthermore, this patch also tries to improve the copy-and-paste behaviour for non-embedded Wingdings fonts by also using the `ZapfDingbatsEncoding` in this case.
*Note:* I'm not sure that adding additional tests for Wingdings fonts matters that much, given how limited our "support" for them really is.
In some cases PDF documents can contain JPEG images that the native browser decoder cannot handle, e.g. images with DNL (Define Number of Lines) markers or images where the SOF (Start of Frame) marker contains a wildly incorrect `scanLines` parameter.
Currently, for "simple" JPEG images, we're relying on native image decoding to *fail* before falling back to the implementation in `src/core/jpg.js`. In some cases, note e.g. issue 10880, the native image decoder doesn't outright fail and thus some images may not render.
In an attempt to improve the current situation, this patch adds additional validation of the JPEG image SOF data to force the use of `src/core/jpg.js` directly in cases where the native JPEG decoder cannot be trusted to do the right thing.
The only way to implement this is unfortunately to parse the *beginning* of the JPEG image data, looking for a SOF marker. To limit the impact of this extra parsing, the result is cached on the `JpegStream` instance and this code is only run for images which passed all of the pre-existing "can the JPEG image be natively rendered and/or decoded" checks.
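A simplified sketch of the kind of SOF scanning involved; the real `JpegStream` getter handles more marker details, so treat the function below as illustrative.

```js
// Return the `scanLines` value from the first SOF (Start of Frame) marker,
// or -1 if none is found.
function getScanLinesFromSOF(bytes) {
  for (let i = 0; i + 6 < bytes.length; i++) {
    if (bytes[i] !== 0xff) {
      continue;
    }
    const marker = bytes[i + 1];
    // SOF0-SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC), which
    // share the 0xFFCx range but aren't frame headers.
    if (
      marker >= 0xc0 &&
      marker <= 0xcf &&
      marker !== 0xc4 &&
      marker !== 0xc8 &&
      marker !== 0xcc
    ) {
      // Frame header layout: length(2), precision(1), scanLines(2), ...
      return (bytes[i + 5] << 8) | bytes[i + 6];
    }
  }
  return -1;
}

// If this value wildly disagrees with the /Height entry of the image
// dictionary, the image is handled by `src/core/jpg.js` instead of the
// native decoder.
```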
---
*Slightly off-topic:* Working on this *really* makes me start questioning if native rendering/decoding of JPEG images is actually a good idea.
There's certain kinds of JPEG images not supported natively, and all of the validation which is now necessary isn't "free". At this point, in the `NativeImageDecoder`, we're having to check for certain properties in the image dictionary, parse the `ColorSpace`, and finally read the actual image data to find the SOF marker.
Furthermore, we cannot just send the image to the main-thread and be done in the "JpegStream" case, but we also need to wait for rendering to complete (or fail) before continuing with other parsing.
In the "JpegDecode" case we're even having to parse part of the image on the main-thread, which seems completely at odds with the principle of doing all heavy parsing in the Worker, and there's also a couple of potentially large (temporary) allocations/copies of TypedArray data involved as well.
This covers cases that the `--fix` command couldn't deal with, and in a few cases (notably `src/core/jbig2.js`) the code was changed to use block-scoped variables instead.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const
With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.
Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
Fixes #11403
The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this.
In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`.
The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores the font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.notdef` got a width of 333. The correct width is 0, as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.
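The first part of the fix boils down to a check along these lines (a sketch, not the exact code in `src/core/fonts.js`):

```js
// A glyph mapped to ".notdef", or to "" (which `core/encodings.js` uses for
// undefined entries in encodings like WinAnsiEncoding), should not be drawn.
function isHiddenGlyph(glyphName) {
  return glyphName === ".notdef" || glyphName === "";
}
```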
This way we'll benefit from the existing font caching, and can thus avoid re-creating a fallback font over and over again during parsing.
(These changes necessitated the previous patch, since otherwise breakage could occur e.g. with fake workers.)
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
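For example, the extension-less specifiers that SystemJS accepted need to become complete file identifiers:

```js
// With SystemJS the extension-less specifier worked:
//   import { assert } from "../shared/util";
//
// Native `import` needs the complete file identifier, since otherwise
// there'll be MIME type errors when the module is requested:
import { assert } from "../shared/util.js";
```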
This patch makes the follow changes:
- Remove no longer necessary inline `// eslint-disable-...` comments.
- Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
- Concatenate strings which now fit on just one line.
- Fix comments that are now too long.
- Finally, and most importantly, adjust comments that Prettier moved down, since the new positions are often confusing or outright wrong.
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.
I completely overlooked this in PR 11281, but you obviously need to make similar changes in `PartialEvaluator.hasBlendModes` since it will otherwise ignore valid Blend Modes.
Sometimes we also used `@return`, but `@returns` is what the JSDoc documentation recommends. Even though `@return` works as an alias, it's good to use the recommended syntax and to be consistent within the project.
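An illustrative example of the preferred tag (the function is made up for the example, not taken from the PDF.js code):

```js
/**
 * @param {string} url - The location of the document.
 * @returns {Promise<Uint8Array>} The raw bytes of the document.
 */
function fetchRawData(url) {
  return fetch(url)
    .then(response => response.arrayBuffer())
    .then(buffer => new Uint8Array(buffer));
}
```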