Sakurai/pdf.js - pdf.js - Gitea on kemo

Author	SHA1	Message	Date
Jonas Jenwald	8137c0547d	Fix the `gStateObj` lookup in `TranslatedFont._removeType3ColorOperators` (PR 12718 follow-up) As can be seen in `2cba290361/src/core/evaluator.js (L986)` the `gStateObj` (which is actually an Array despite its name), is wrapped in Array when it's inserted into the OperatorList. Hence we obviously need to take this into account when accessing it in `TranslatedFont._removeType3ColorOperators`; this mistake happened because we don't have any test-cases for this particular code-path as far as I know.	2021-01-22 12:27:38 +01:00
calixteman	1039698697	Add a parser to get font data from the default appearance (#12831 ) * Add a parser to get font data from the default appearance - pdfium & poppler use a special parser too to get these info. * Update src/core/default_appearance.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-01-21 20:15:31 +01:00
Jonas Jenwald	78c32c2697	Improve the handling of errors, in `PartialEvaluator.loadFont`, occurring in `PartialEvaluator.preEvaluateFont` (issue 12823) Currently any errors thrown in `preEvaluateFont`, which is a synchronous method, will not be handled at all in the `loadFont` method and we were thus failing to return an `ErrorFont`-instance as intended here. Also, add an explicit check in `PartialEvaluator.preEvaluateFont` to ensure that Type0-fonts always have a valid dictionary.	2021-01-07 11:38:38 +01:00
Jonas Jenwald	67e5db75d8	Ignore color-operators in Type3 glyphs beginning with a `d1` operator (issue 12705) Please refer to the PDF specification at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1977497 and https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3998470 This patch removes the color-operators in the evaluator, since that should be more efficient than doing it repeatedly in the main-thread when rendering the Type3 glyphs.	2020-12-11 15:49:13 +01:00
Jonas Jenwald	082cd8fc6c	Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes` The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires synchronous parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message). In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`. To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain any blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient. This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf: ``` [ { "id": "issue6961", "file": "../web/pdfs/issue6961.pdf", "md5": "a80e4357a8fda758d96c2c76f2980b03", "rounds": 100, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, page, stat -- browser \| page \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ---- \| ------------ \| ----- \| ------------ \| ----------- \| ---- \| ------ \| ------------- firefox \| 0 \| Overall \| 100 \| 1034 \| 555 \| -480 \| -46.39 \| faster firefox \| 0 \| Page Request \| 100 \| 489 \| 7 \| -482 \| -98.67 \| faster firefox \| 0 \| Rendering \| 100 \| 545 \| 548 \| 2 \| 0.45 \| firefox \| 1 \| Overall \| 100 \| 912 \| 428 \| -484 \| -53.06 \| faster firefox \| 1 \| Page Request \| 100 \| 487 \| 1 \| -486 \| -99.77 \| faster firefox \| 1 \| Rendering \| 100 \| 425 \| 427 \| 2 \| 0.51 \| ``` --- [1] In the case where blend modes are found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most documents/pages do not utilize blend modes anyway.	2020-11-05 16:59:08 +01:00
Jonas Jenwald	46e94cad17	Fix some errors reported by the ESLint `no-useless-escape` rule This patch removes unnecessary escape-sequences in (mostly) strings, as a first step, since the ones in regular expressions probably require more careful testing (just in case). The only exception is a regular expression in `src/core/annotation.js`, since we should have both unit- and reference-tests for this code and given [this information on MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#Types): > Inside a character set, the dot loses its special meaning and matches a literal dot. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape	2020-10-29 15:40:40 +01:00
Tim van der Meij	b4ca3d55b8	Merge pull request #12508 from calixteman/button_fallback_font Fallback font for buttons must be ZapfDingbats.	2020-10-24 18:56:12 +02:00
Jonas Jenwald	b478d3e7b9	Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up) - Handle the arguments correctly in `PartialEvaluator.handleColorN`. For TilingPatterns with a base-ColorSpace, we're currently using the `args` when computing the color. However, as can be seen we're passing the Array as-is to the `ColorSpace.getRgb` method, which means that the `Name` is included as well.[1] Thankfully this hasn't, as far as I know, caused any actual bugs, but that may be more luck than anything else given how the `ColorSpace` code is implemented. This can be easily fixed though, simply by popping the `Name`-object off of the `args` Array. - Cache TilingPatterns using the `Name`-string, rather than the object directly. This is not only consistent with other caches in `PartialEvaluator`, but importantly it also ensures that the cache lookup always works correctly. Note that since `Name`-objects, similar to other primitives, use a cache themselves a manually triggered `cleanup`-call could thus (theoretically) cause the `LocalTilingPatternCache` to not find an existing entry. While the likelihood of this happening is extremely small, it's still something that we should fix. --- [1] The `args` Array can e.g. look like this: `[0.043, 0.09, 0.188, 0.004, /P1]`, which means that we're passing in the `Name`-object to the `ColorSpace` method.	2020-10-24 13:49:46 +02:00
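The two fixes described in b478d3e7b9 can be illustrated with a minimal sketch. It is not the pdf.js code: `NameLike`, `splitColorNArgs` and `patternCache` are hypothetical names made up for this example, and a plain `Map` stands in for the real cache class.

```javascript
// Hypothetical stand-in for a parsed PDF name; not the pdf.js `Name` class.
class NameLike {
  constructor(name) {
    this.name = name;
  }
}

// Separate the numeric color components from a trailing pattern name,
// e.g. [0.043, 0.09, 0.188, 0.004, /P1] -> components + "P1".
function splitColorNArgs(args) {
  const components = args.slice(); // copy, so the caller's array is untouched
  const last = components[components.length - 1];
  if (last instanceof NameLike) {
    components.pop();
    return { components, patternName: last.name };
  }
  return { components, patternName: null };
}

// Key the cache by the name *string*: two distinct NameLike instances with
// the same text then map to the same entry.
const patternCache = new Map();

const first = splitColorNArgs([0.043, 0.09, 0.188, 0.004, new NameLike("P1")]);
patternCache.set(first.patternName, { parsed: true });

const second = splitColorNArgs([0.5, 0.5, 0.5, 0.5, new NameLike("P1")]);
const cacheHit = patternCache.has(second.patternName);
console.log(first.components, cacheHit);
```

Keying by string means the lookup does not depend on object identity, which is the property the commit message is after.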
Calixte Denizet	37c86b2daa	Fallback font for buttons must be ZapfDingbats. Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.	2020-10-24 12:00:03 +02:00
Jonas Jenwald	f956d0a96a	Stop caching the parsed Font data on its `Dict` object (PR 7347 follow-up) Given that all fonts are, ever since PR 7347, now cached in the "normal" `fontCache` there's actually no reason for the special `font.translated` construction. (Given how Objects in JavaScript are references, rather than raw values, the old code shouldn't have caused any significant memory overhead.) Instead we can simply store the `cacheKey`, which is a simple string, on only the Font `Dict`s where it's needed and thus look-up all fonts using the `fontCache` instead.	2020-10-16 17:45:01 +02:00
Jonas Jenwald	bc6b47a50e	Convert `PartialEvaluator.translateFont` to an `async` method This allows us to make a slight simplification in `PartialEvaluator.loadFont`, which thus removes an old TODO-comment from the method. Furthermore, in `PartialEvaluator.translateFont`, the CMap-handling is now limited to only composite fonts to avoid having to wait for a "dummy"-Promise for most fonts.	2020-10-15 09:42:58 +02:00
Jonas Jenwald	30e8d5dea1	Add local caching of TilingPatterns in `PartialEvaluator.getOperatorList` (issue 2765 and 8473) In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required. By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also less asynchronous parsing required for repeated TilingPatterns. Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results. When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reduction in rendering times were observed when comparing `master` against this patch: - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`. - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms` - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms` As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance. Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.	2020-10-08 18:43:21 +02:00
Jonas Jenwald	9416b14e8b	Re-factor how the ESLint `no-var` rule is enabled in the `src/` folder This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1] Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use. --- [1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.	2020-10-03 20:15:29 +02:00
Jonas Jenwald	784a420027	Add support, in `Dict.merge`, for merging of "sub"-dictionaries This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.	2020-08-30 23:18:32 +02:00
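A rough sketch of what merging "one level deeper" means, assuming plain `Map`s in place of `Dict` instances. The function name `mergeDicts` and its `mergeSubDicts` flag are illustrative assumptions, not the signature of `Dict.merge`.

```javascript
// Illustrative only: plain Maps stand in for PDF dictionaries.
function mergeDicts(dicts, mergeSubDicts = false) {
  const merged = new Map();
  for (const dict of dicts) {
    for (const [key, value] of dict) {
      if (!merged.has(key)) {
        // First occurrence wins. Sub-dictionaries are copied so that the
        // inputs are never mutated by later merging.
        const copy =
          mergeSubDicts && value instanceof Map ? new Map(value) : value;
        merged.set(key, copy);
        continue;
      }
      const existing = merged.get(key);
      if (mergeSubDicts && existing instanceof Map && value instanceof Map) {
        // One level deeper: add only the sub-entries that are still missing.
        for (const [subKey, subValue] of value) {
          if (!existing.has(subKey)) {
            existing.set(subKey, subValue);
          }
        }
      }
    }
  }
  return merged;
}

const resourcesA = new Map([["Font", new Map([["F1", "Helvetica"]])]]);
const resourcesB = new Map([
  ["Font", new Map([["F1", "Courier"], ["F2", "Times"]])],
]);

const shallow = mergeDicts([resourcesA, resourcesB]);
const deep = mergeDicts([resourcesA, resourcesB], true);
console.log([...shallow.get("Font").keys()]); // only the first /Font survives
console.log([...deep.get("Font").keys()]); // entries from both /Font dicts
```

In the shallow case the second `/Font` dictionary is dropped entirely; in the deep case its `F2` entry is kept while the first dictionary's `F1` still takes precedence.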
Jonas Jenwald	1058f16605	Add (basic) support for transfer functions to Images (issue 6931, bug 1149713) This is similar to the existing transfer function support for SMasks, but extended to simple image data. Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to at most 4 `Uint8Array`s each with a length of 256 elements. Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.	2020-08-17 10:34:12 +02:00
Jonas Jenwald	9d3e046a4f	Don't cache /ExtGState entries that contain fonts (PR 12087 follow-up) I completely overlooked the fact that `PartialEvaluator.handleSetFont` also updates the current `state`, which means that currently we're not actually handling font data correctly for cached /ExtGState data. (Thankfully, using /ExtGState to set a font is somewhat rare in practice.)	2020-08-17 08:17:25 +02:00
Calixte Denizet	1747d259f9	Support textfield and choice widgets for printing	2020-08-06 14:45:23 +02:00
Brendan Dahl	ac494a2278	Add support for optional marked content. Add a new method to the API to get the optional content configuration. Add a new render task param that accepts the above configuration. For now, the optional content is not controllable by the user in the viewer, but renders with the default configuration in the PDF. All of the test files added exhibit different uses of optional content. Fixes #269. Fix test to work with optional content. - Change the stopAtErrors test to ensure the operator list has something, instead of asserting the exact number of operators.	2020-08-04 09:26:55 -07:00
Jonas Jenwald	835b5ffddd	Only check `isType3Font` the first time that `TranslatedFont.loadType3Data` is called If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.	2020-07-27 13:20:15 +02:00
Jonas Jenwald	f3ff526019	Send/receive Type3 images the same way as other globally-cached images There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.	2020-07-27 13:20:15 +02:00
Jonas Jenwald	7c9d0d5939	Improve how Type3-fonts with dependencies are handled While the `CharProcs` streams of Type3-fonts usually don't rely on dependencies, such as e.g. images, it does happen in some cases. Currently any dependencies are simply appended to the parent operatorList, which in practice means only the operatorList of the first page where the Type3-font is being used. However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of all pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously. Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added once to any operatorList.) Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.	2020-07-27 13:20:13 +02:00
Jonas Jenwald	ea8e432c45	Add a `getRawValues` method, to `Dict` instances, to provide an easier way of getting all raw values When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop). While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the raw `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the raw `Dict` values directly.	2020-07-17 16:32:00 +02:00
Tim van der Meij	e63d1ebff5	Merge pull request #12087 from Snuffleupagus/LocalGStateCache Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)	2020-07-17 16:02:45 +02:00
Jonas Jenwald	b3480842b3	Use a `RefSet`, rather than a plain Object, for tracking already processed nodes in `PartialEvaluator.hasBlendModes`	2020-07-17 09:52:36 +02:00
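For readers unfamiliar with the idea behind b3480842b3, here is a small sketch of tracking already-processed nodes by reference. `RefLike`, `RefSetLike` and `visitAll` are hypothetical stand-ins written for this example; they only approximate the role that `Ref` and `RefSet` play in pdf.js.

```javascript
// Hypothetical reference type: an object number plus a generation number.
class RefLike {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }
  toString() {
    return `${this.num}R${this.gen}`;
  }
}

// Small set keyed by the reference's string form, so two RefLike instances
// that point at the same object count as equal.
class RefSetLike {
  constructor() {
    this._keys = new Set();
  }
  has(ref) {
    return this._keys.has(ref.toString());
  }
  put(ref) {
    this._keys.add(ref.toString());
  }
}

// Walk a graph of nodes that may reference each other cyclically and
// return the visiting order.
function visitAll(startRef, graph) {
  const processed = new RefSetLike();
  const queue = [startRef];
  const order = [];
  while (queue.length > 0) {
    const ref = queue.shift();
    if (processed.has(ref)) {
      continue; // already handled, skip to avoid looping forever
    }
    processed.put(ref);
    order.push(ref.toString());
    for (const child of graph.get(ref.toString()) || []) {
      queue.push(child);
    }
  }
  return order;
}

const graph = new Map([
  ["1R0", [new RefLike(2, 0), new RefLike(3, 0)]],
  ["2R0", [new RefLike(1, 0)]], // points back at the start
  ["3R0", [new RefLike(2, 0)]],
]);
const visitOrder = visitAll(new RefLike(1, 0), graph);
console.log(visitOrder);
```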
Jonas Jenwald	03547b5633	Change `PartialEvaluator.setGState` to an `async` method Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object. To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.	2020-07-15 14:27:18 +02:00
Jonas Jenwald	f20aeb9343	Slightly simplify the code in `PartialEvaluator.hasBlendModes`, e.g. by using `for...of` loops - Replace the existing loops with `for...of` variants instead. - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.	2020-07-15 12:47:11 +02:00
Jonas Jenwald	15fa3f8518	Remove a redundant `/XObject` stream dictionary `objId` check in `PartialEvaluator.hasBlendModes` (PR 6971 follow-up) This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971). Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.	2020-07-15 12:47:11 +02:00
Jonas Jenwald	84476da26e	Handle lookup errors "silently" in `PartialEvaluator.hasBlendModes` (PR 11680 follow-up) Given that this method is used during what's essentially a pre-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there's lookup errors. Please note: Any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the actual OperatorList parsing.	2020-07-15 12:47:07 +02:00
Jonas Jenwald	981ff41b5f	Add local caching of non-font Graphics State (ExtGState) data in `PartialEvaluator.getTextContent` It turns out that `getTextContent` suffers from similar problems with repeated GStates as `getOperatorList`; please see the previous patch. While only `/ExtGState` resources containing Fonts will actually be parsed by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though most of them won't affect the textContent (since they mostly contain purely graphical state). With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.	2020-07-14 10:34:43 +02:00
Jonas Jenwald	90eb579713	Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.getOperatorList` (issue 2813) This patch will help pathological cases the most, with issue 2813 being a particularly problematic example. While there's only four `/ExtGState` resources, there's a total of `29062` `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly. For simplicity we'll only cache "simple" `/ExtGState` resources, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing). By caching "simple" `/ExtGState` resources, we thus improve performance by: - Not having to fetch/validate/parse the same `/ExtGState` data over and over. - Handling of repeated `setGState` operators becomes synchronous during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously. --- Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is a lot slower than in the development viewer (making normal benchmarking infeasible). However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a reduction of `~10 %` in the rendering time of the PDF document in issue 2813.	2020-07-14 10:34:43 +02:00
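The effect of a per-call cache keyed by resource name can be sketched as follows. This is an assumption-laden illustration, not the `LocalGStateCache` from pdf.js: `parseGState`, `buildOperations` and the `parseCount` counter are invented for the example, and every entry is treated as "simple" (cacheable).

```javascript
// Hypothetical parser; the counter only exists to make the saving visible.
let parseCount = 0;
function parseGState(rawEntry) {
  parseCount++;
  return Object.entries(rawEntry); // stand-in for the parsed representation
}

// Build a list of operations, parsing each named resource at most once
// per call.
function buildOperations(gStateNames, extGStateResources) {
  const localCache = new Map(); // name -> parsed data, local to this call
  const operations = [];
  for (const name of gStateNames) {
    let parsed = localCache.get(name);
    if (parsed === undefined) {
      parsed = parseGState(extGStateResources[name]);
      localCache.set(name, parsed);
    }
    operations.push(["setGState", parsed]);
  }
  return operations;
}

const extGStateResources = {
  GS0: { LW: 1 },
  GS1: { LW: 2 },
  GS2: { CA: 0.5 },
  GS3: { ca: 0.5 },
};
// 1000 operators cycling through four resources.
const gStateNames = Array.from({ length: 1000 }, (_, i) => `GS${i % 4}`);
const operations = buildOperations(gStateNames, extGStateResources);
console.log(operations.length, parseCount);
```

Because the cache lives inside a single `buildOperations` call, nothing is retained once the call returns, which keeps the memory cost of the approach small.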
Jonas Jenwald	d4d7ac1b88	Stop special-casing the (very unlikely) "no `/XObject` found"-scenario, when parsing `OPS.paintXObject` operators, in `PartialEvaluator.{getOperatorList, getTextContent}` Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fall back to the existing `ignoreErrors` functionality instead. Also, please note that the "no `/XObject` found"-scenario should be extremely unlikely in practice and would only occur in corrupt/broken documents. Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data could contain anything and e.g. be unclonable (which would cause breaking errors). Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definitely seems like the correct approach.	2020-07-12 21:59:59 +02:00
Jonas Jenwald	4cc6797f17	Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it Note how the `getFontID`-method in `src/core/fonts.js` is completely global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the same PDF document the `fontID`s will still be incremented continuously. For comparison the `createObjId` method, on `idFactory`, will always create a consistent id, assuming of course that the document and its pages are parsed/rendered in the same order. In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.) Please note: Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.	2020-07-07 16:33:31 +02:00
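The difference between a module-global counter and a per-document factory, as discussed in 4cc6797f17, can be sketched like this. The shape of `createIdFactory` and the `f<N>` / `obj_<N>` id formats are assumptions for illustration only; just the method names `createObjId` and `createFontId` come from the commit message.

```javascript
// Module-global counter: keeps growing for as long as the module lives.
let globalFontCount = 0;
function getGlobalFontId() {
  return `f${++globalFontCount}`;
}

// Hypothetical per-document factory: every document gets its own counters.
function createIdFactory() {
  let fontCount = 0;
  let objCount = 0;
  return {
    createObjId: () => `obj_${++objCount}`,
    createFontId: () => `f${++fontCount}`,
  };
}

// Simulate opening the same document twice and loading two fonts each time.
function loadFontIds(nextId) {
  return [nextId(), nextId()];
}

const globalFirst = loadFontIds(getGlobalFontId);
const globalSecond = loadFontIds(getGlobalFontId);

const scopedFirst = loadFontIds(createIdFactory().createFontId);
const scopedSecond = loadFontIds(createIdFactory().createFontId);

console.log(globalFirst, globalSecond); // ids differ between the two runs
console.log(scopedFirst, scopedSecond); // ids repeat between the two runs
```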
Jonas Jenwald	c95fbb6e21	Convert the code in `src/core/evaluator.js` to use standard classes This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated. Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier. Please note: I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.	2020-07-05 16:01:04 +02:00
Jonas Jenwald	32a0b6fa73	Move some constants and helper functions out of the `PartialEvaluator` closure This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope. Please note: For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants were re-factored slightly.	2020-07-05 15:56:23 +02:00
Jonas Jenwald	ca719ecaa4	Add local caching of `Function`s, by reference, in the `PDFFunctionFactory` (issue 2541) Note that compared to other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an entire page. Furthermore, similar to ColorSpaces, the parsing of individual `Function`s is generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.) Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.) After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its own `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instance could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site. Obviously, with this patch, there's now (potentially) more `PDFFunctionFactory` instances than before when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are not kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible lazily initialized, specifically: - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually require it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces). - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing. To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization. (If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.) All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic. With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources. Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer. However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction` parsing shows that it takes approximately `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is one order of magnitude faster which does add up when the same `Function` is invoked close to 2000 times.	2020-07-04 00:55:18 +02:00
Jonas Jenwald	19d7976483	Improve (local) caching of parsed `ColorSpace`s (PR 12001 follow-up) This patch contains the following notable improvements: - Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point). - Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition to the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths. - Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing. - Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).	2020-06-24 23:53:10 +02:00
Jonas Jenwald	51e87b9248	Add a proper `LocalColorSpaceCache` class, rather than piggybacking on the image one (PR 12001 follow-up) This will allow caching of ColorSpaces by either `Name` or `Ref`, which doesn't really make sense for images, thus allowing (better) caching for ColorSpaces used with e.g. Images and Patterns.	2020-06-24 23:53:10 +02:00
Jonas Jenwald	e22bc483a5	Re-factor `ColorSpace.parse` to take a parameter object, rather than a bunch of (randomly) ordered parameters Given the number of existing parameters, this will avoid needlessly unwieldy call-sites especially with upcoming changes in later patches.	2020-06-24 23:53:10 +02:00
Jonas Jenwald	f0708717a9	Move the `fetchBuiltInCMap` method to the `PartialEvaluator.prototype` Defining this inline in the "constructor" looks slightly weird (I really don't know why I wrote it like that originally), and it can simply be changed to a regular method instead.	2020-06-24 17:29:47 +02:00
Jonas Jenwald	5c39de805c	Add local caching of `ColorSpace`s, by name, in `PartialEvaluator.getOperatorList` (issue 2504) By caching parsed `ColorSpace`s, we thus don't need to re-parse the same data over and over which saves CPU cycles and reduces peak memory usage. (Obviously persistent memory usage may increase a tiny bit, but since the caching is done per `PartialEvaluator.getOperatorList` invocation and given that `ColorSpace` instances generally hold very little data this shouldn't be much of an issue.) Furthermore, by caching `ColorSpace`s we can also lookup the already parsed ones synchronously during the `OperatorList` building, instead of having to defer to the event loop/microtask queue since the parsing is done asynchronously (such that error handling is easier). Possible future improvements: - Cache/lookup parsed `ColorSpaces` used in `Pattern`s and `Image`s. - Attempt to cache local `ColorSpace`s by reference as well, in addition to only by name, assuming that there's documents where that would be beneficial and that it's not too difficult to implement. - Assuming there's documents that would benefit from it, also cache repeated `ColorSpace`s globally as well. Given that we've never, until now, been doing any caching of parsed `ColorSpace`s and that even using a simple name-only local cache helps tremendously in pathological cases, I purposely decided against complicating the implementation too much initially. Also, compared to parsing of `Image`s, simply creating a `ColorSpace` instance isn't that expensive (hence I'd be somewhat surprised if adding a global cache would help much). --- This patch was tested using: - The default `tracemonkey` PDF file, which was included mostly to show that "normal" documents aren't negatively affected by these changes. - The PDF file from issue 2504, i.e. https://dl-ctlg.panasonic.com/jp/manual/sd/sd_rbm1000_0.pdf, where most pages will switch thousands of times between a handful of `ColorSpace`s. 
with the following manifest file: ``` [ { "id": "tracemonkey", "file": "pdfs/tracemonkey.pdf", "md5": "9a192d8b1a7dc652a19835f6f08098bd", "rounds": 100, "type": "eq" }, { "id": "issue2504", "file": "../web/pdfs/issue2504.pdf", "md5": "", "rounds": 20, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: - Overall ``` -- Grouped By browser, pdf, stat -- browser \| pdf \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ----------- \| ------------ \| ----- \| ------------ \| ----------- \| ---- \| ------ \| ------------- firefox \| issue2504 \| Overall \| 640 \| 977 \| 497 \| -479 \| -49.08 \| faster firefox \| issue2504 \| Page Request \| 640 \| 3 \| 4 \| 1 \| 59.18 \| firefox \| issue2504 \| Rendering \| 640 \| 974 \| 493 \| -481 \| -49.37 \| faster firefox \| tracemonkey \| Overall \| 1400 \| 116 \| 111 \| -5 \| -4.43 \| firefox \| tracemonkey \| Page Request \| 1400 \| 2 \| 2 \| 0 \| -2.86 \| firefox \| tracemonkey \| Rendering \| 1400 \| 114 \| 109 \| -5 \| -4.47 \| ``` - Page-specific ``` -- Grouped By browser, pdf, page, stat -- browser \| pdf \| page \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ----------- \| ---- \| ------------ \| ----- \| ------------ \| ----------- \| ----- \| ------- \| ------------- firefox \| issue2504 \| 0 \| Overall \| 20 \| 2295 \| 1268 \| -1027 \| -44.76 \| faster firefox \| issue2504 \| 0 \| Page Request \| 20 \| 6 \| 7 \| 1 \| 15.32 \| firefox \| issue2504 \| 0 \| Rendering \| 20 \| 2288 \| 1260 \| -1028 \| -44.93 \| faster firefox \| issue2504 \| 1 \| Overall \| 20 \| 3059 \| 2806 \| -252 \| -8.25 \| faster firefox \| issue2504 \| 1 \| Page Request \| 20 \| 11 \| 14 \| 3 \| 23.25 \| slower firefox \| issue2504 \| 1 \| Rendering \| 20 \| 3047 \| 2792 \| -255 \| -8.37 \| faster firefox \| issue2504 \| 2 \| Overall \| 20 \| 411 \| 295 \| -116 \| -28.20 \| faster firefox \| issue2504 \| 2 \| Page 
Request \| 20 \| 2 \| 42 \| 40 \| 1897.62 \| firefox \| issue2504 \| 2 \| Rendering \| 20 \| 409 \| 253 \| -156 \| -38.09 \| faster firefox \| issue2504 \| 3 \| Overall \| 20 \| 736 \| 299 \| -437 \| -59.34 \| faster firefox \| issue2504 \| 3 \| Page Request \| 20 \| 2 \| 2 \| 0 \| 0.00 \| firefox \| issue2504 \| 3 \| Rendering \| 20 \| 734 \| 297 \| -437 \| -59.49 \| faster firefox \| issue2504 \| 4 \| Overall \| 20 \| 356 \| 458 \| 102 \| 28.63 \| firefox \| issue2504 \| 4 \| Page Request \| 20 \| 1 \| 2 \| 1 \| 57.14 \| slower firefox \| issue2504 \| 4 \| Rendering \| 20 \| 354 \| 455 \| 101 \| 28.53 \| firefox \| issue2504 \| 5 \| Overall \| 20 \| 1381 \| 765 \| -616 \| -44.59 \| faster firefox \| issue2504 \| 5 \| Page Request \| 20 \| 3 \| 5 \| 2 \| 50.00 \| slower firefox \| issue2504 \| 5 \| Rendering \| 20 \| 1378 \| 760 \| -617 \| -44.81 \| faster firefox \| issue2504 \| 6 \| Overall \| 20 \| 757 \| 299 \| -459 \| -60.57 \| faster firefox \| issue2504 \| 6 \| Page Request \| 20 \| 2 \| 5 \| 3 \| 150.00 \| slower firefox \| issue2504 \| 6 \| Rendering \| 20 \| 755 \| 294 \| -462 \| -61.11 \| faster firefox \| issue2504 \| 7 \| Overall \| 20 \| 394 \| 302 \| -92 \| -23.39 \| faster firefox \| issue2504 \| 7 \| Page Request \| 20 \| 2 \| 1 \| -1 \| -34.88 \| faster firefox \| issue2504 \| 7 \| Rendering \| 20 \| 392 \| 301 \| -91 \| -23.32 \| faster firefox \| issue2504 \| 8 \| Overall \| 20 \| 2875 \| 979 \| -1896 \| -65.95 \| faster firefox \| issue2504 \| 8 \| Page Request \| 20 \| 1 \| 2 \| 0 \| 11.11 \| firefox \| issue2504 \| 8 \| Rendering \| 20 \| 2874 \| 978 \| -1896 \| -65.99 \| faster firefox \| issue2504 \| 9 \| Overall \| 20 \| 700 \| 332 \| -368 \| -52.60 \| faster firefox \| issue2504 \| 9 \| Page Request \| 20 \| 3 \| 2 \| 0 \| -4.00 \| firefox \| issue2504 \| 9 \| Rendering \| 20 \| 698 \| 329 \| -368 \| -52.78 \| faster firefox \| issue2504 \| 10 \| Overall \| 20 \| 3296 \| 926 \| -2370 \| -71.91 \| faster firefox \| issue2504 \| 10 \| Page 
Request \| 20 \| 2 \| 2 \| 0 \| -18.75 \| firefox \| issue2504 \| 10 \| Rendering \| 20 \| 3293 \| 924 \| -2370 \| -71.96 \| faster firefox \| issue2504 \| 11 \| Overall \| 20 \| 524 \| 197 \| -327 \| -62.34 \| faster firefox \| issue2504 \| 11 \| Page Request \| 20 \| 2 \| 3 \| 1 \| 58.54 \| firefox \| issue2504 \| 11 \| Rendering \| 20 \| 522 \| 194 \| -328 \| -62.81 \| faster firefox \| issue2504 \| 12 \| Overall \| 20 \| 752 \| 369 \| -384 \| -50.98 \| faster firefox \| issue2504 \| 12 \| Page Request \| 20 \| 3 \| 2 \| -1 \| -36.51 \| faster firefox \| issue2504 \| 12 \| Rendering \| 20 \| 749 \| 367 \| -382 \| -51.05 \| faster firefox \| issue2504 \| 13 \| Overall \| 20 \| 679 \| 487 \| -193 \| -28.38 \| faster firefox \| issue2504 \| 13 \| Page Request \| 20 \| 4 \| 2 \| -2 \| -48.68 \| faster firefox \| issue2504 \| 13 \| Rendering \| 20 \| 676 \| 485 \| -191 \| -28.28 \| faster firefox \| issue2504 \| 14 \| Overall \| 20 \| 474 \| 283 \| -191 \| -40.26 \| faster firefox \| issue2504 \| 14 \| Page Request \| 20 \| 2 \| 4 \| 2 \| 78.57 \| firefox \| issue2504 \| 14 \| Rendering \| 20 \| 471 \| 279 \| -192 \| -40.79 \| faster firefox \| issue2504 \| 15 \| Overall \| 20 \| 860 \| 618 \| -241 \| -28.05 \| faster firefox \| issue2504 \| 15 \| Page Request \| 20 \| 2 \| 3 \| 0 \| 10.87 \| firefox \| issue2504 \| 15 \| Rendering \| 20 \| 857 \| 616 \| -241 \| -28.15 \| faster firefox \| issue2504 \| 16 \| Overall \| 20 \| 389 \| 243 \| -147 \| -37.71 \| faster firefox \| issue2504 \| 16 \| Page Request \| 20 \| 2 \| 2 \| 0 \| 2.33 \| firefox \| issue2504 \| 16 \| Rendering \| 20 \| 387 \| 240 \| -147 \| -37.94 \| faster firefox \| issue2504 \| 17 \| Overall \| 20 \| 1484 \| 672 \| -812 \| -54.70 \| faster firefox \| issue2504 \| 17 \| Page Request \| 20 \| 2 \| 3 \| 1 \| 37.21 \| firefox \| issue2504 \| 17 \| Rendering \| 20 \| 1482 \| 669 \| -812 \| -54.84 \| faster firefox \| issue2504 \| 18 \| Overall \| 20 \| 575 \| 252 \| -323 \| -56.12 \| faster firefox \| 
issue2504 \| 18 \| Page Request \| 20 \| 2 \| 2 \| 0 \| -16.22 \| firefox \| issue2504 \| 18 \| Rendering \| 20 \| 573 \| 251 \| -322 \| -56.24 \| faster firefox \| issue2504 \| 19 \| Overall \| 20 \| 517 \| 227 \| -290 \| -56.08 \| faster firefox \| issue2504 \| 19 \| Page Request \| 20 \| 2 \| 2 \| 0 \| 21.62 \| firefox \| issue2504 \| 19 \| Rendering \| 20 \| 515 \| 225 \| -290 \| -56.37 \| faster firefox \| issue2504 \| 20 \| Overall \| 20 \| 668 \| 670 \| 2 \| 0.31 \| firefox \| issue2504 \| 20 \| Page Request \| 20 \| 4 \| 2 \| -1 \| -34.29 \| firefox \| issue2504 \| 20 \| Rendering \| 20 \| 664 \| 667 \| 3 \| 0.49 \| firefox \| issue2504 \| 21 \| Overall \| 20 \| 486 \| 309 \| -177 \| -36.44 \| faster firefox \| issue2504 \| 21 \| Page Request \| 20 \| 2 \| 2 \| 0 \| 16.13 \| firefox \| issue2504 \| 21 \| Rendering \| 20 \| 484 \| 307 \| -177 \| -36.60 \| faster firefox \| issue2504 \| 22 \| Overall \| 20 \| 543 \| 267 \| -276 \| -50.85 \| faster firefox \| issue2504 \| 22 \| Page Request \| 20 \| 2 \| 2 \| 0 \| 10.26 \| firefox \| issue2504 \| 22 \| Rendering \| 20 \| 541 \| 265 \| -276 \| -51.07 \| faster firefox \| issue2504 \| 23 \| Overall \| 20 \| 3246 \| 871 \| -2375 \| -73.17 \| faster firefox \| issue2504 \| 23 \| Page Request \| 20 \| 2 \| 3 \| 1 \| 37.21 \| firefox \| issue2504 \| 23 \| Rendering \| 20 \| 3243 \| 868 \| -2376 \| -73.25 \| faster firefox \| issue2504 \| 24 \| Overall \| 20 \| 379 \| 156 \| -223 \| -58.83 \| faster firefox \| issue2504 \| 24 \| Page Request \| 20 \| 2 \| 2 \| 0 \| -2.86 \| firefox \| issue2504 \| 24 \| Rendering \| 20 \| 378 \| 154 \| -223 \| -59.10 \| faster firefox \| issue2504 \| 25 \| Overall \| 20 \| 176 \| 127 \| -50 \| -28.19 \| faster firefox \| issue2504 \| 25 \| Page Request \| 20 \| 2 \| 1 \| 0 \| -15.63 \| firefox \| issue2504 \| 25 \| Rendering \| 20 \| 175 \| 125 \| -49 \| -28.31 \| faster firefox \| issue2504 \| 26 \| Overall \| 20 \| 181 \| 108 \| -74 \| -40.67 \| faster firefox \| issue2504 \| 26 \| 
Page Request \| 20 \| 3 \| 2 \| -1 \| -39.13 \| faster firefox \| issue2504 \| 26 \| Rendering \| 20 \| 178 \| 105 \| -72 \| -40.69 \| faster firefox \| issue2504 \| 27 \| Overall \| 20 \| 208 \| 104 \| -104 \| -49.92 \| faster firefox \| issue2504 \| 27 \| Page Request \| 20 \| 2 \| 2 \| 1 \| 48.39 \| firefox \| issue2504 \| 27 \| Rendering \| 20 \| 206 \| 102 \| -104 \| -50.64 \| faster firefox \| issue2504 \| 28 \| Overall \| 20 \| 241 \| 111 \| -131 \| -54.16 \| faster firefox \| issue2504 \| 28 \| Page Request \| 20 \| 2 \| 2 \| -1 \| -33.33 \| firefox \| issue2504 \| 28 \| Rendering \| 20 \| 239 \| 109 \| -130 \| -54.39 \| faster firefox \| issue2504 \| 29 \| Overall \| 20 \| 321 \| 196 \| -125 \| -39.05 \| faster firefox \| issue2504 \| 29 \| Page Request \| 20 \| 1 \| 2 \| 0 \| 17.86 \| firefox \| issue2504 \| 29 \| Rendering \| 20 \| 319 \| 194 \| -126 \| -39.35 \| faster firefox \| issue2504 \| 30 \| Overall \| 20 \| 651 \| 271 \| -380 \| -58.41 \| faster firefox \| issue2504 \| 30 \| Page Request \| 20 \| 1 \| 2 \| 1 \| 50.00 \| firefox \| issue2504 \| 30 \| Rendering \| 20 \| 649 \| 269 \| -381 \| -58.60 \| faster firefox \| issue2504 \| 31 \| Overall \| 20 \| 1635 \| 647 \| -988 \| -60.42 \| faster firefox \| issue2504 \| 31 \| Page Request \| 20 \| 1 \| 2 \| 0 \| 30.43 \| firefox \| issue2504 \| 31 \| Rendering \| 20 \| 1634 \| 645 \| -988 \| -60.49 \| faster firefox \| tracemonkey \| 0 \| Overall \| 100 \| 51 \| 51 \| 0 \| 0.02 \| firefox \| tracemonkey \| 0 \| Page Request \| 100 \| 1 \| 1 \| 0 \| -4.76 \| firefox \| tracemonkey \| 0 \| Rendering \| 100 \| 50 \| 50 \| 0 \| 0.12 \| firefox \| tracemonkey \| 1 \| Overall \| 100 \| 97 \| 91 \| -5 \| -5.52 \| faster firefox \| tracemonkey \| 1 \| Page Request \| 100 \| 3 \| 3 \| 0 \| -1.32 \| firefox \| tracemonkey \| 1 \| Rendering \| 100 \| 94 \| 88 \| -5 \| -5.73 \| faster firefox \| tracemonkey \| 2 \| Overall \| 100 \| 40 \| 40 \| 0 \| 0.50 \| firefox \| tracemonkey \| 2 \| Page Request \| 100 \| 1 
\| 1 \| 0 \| 3.16 \| firefox \| tracemonkey \| 2 \| Rendering \| 100 \| 39 \| 39 \| 0 \| 0.54 \| firefox \| tracemonkey \| 3 \| Overall \| 100 \| 62 \| 62 \| -1 \| -0.94 \| firefox \| tracemonkey \| 3 \| Page Request \| 100 \| 1 \| 1 \| 0 \| 17.05 \| firefox \| tracemonkey \| 3 \| Rendering \| 100 \| 61 \| 61 \| -1 \| -1.11 \| firefox \| tracemonkey \| 4 \| Overall \| 100 \| 56 \| 58 \| 2 \| 3.41 \| firefox \| tracemonkey \| 4 \| Page Request \| 100 \| 1 \| 1 \| 0 \| 15.31 \| firefox \| tracemonkey \| 4 \| Rendering \| 100 \| 55 \| 57 \| 2 \| 3.23 \| firefox \| tracemonkey \| 5 \| Overall \| 100 \| 73 \| 71 \| -2 \| -2.28 \| firefox \| tracemonkey \| 5 \| Page Request \| 100 \| 2 \| 2 \| 0 \| 12.20 \| firefox \| tracemonkey \| 5 \| Rendering \| 100 \| 71 \| 69 \| -2 \| -2.69 \| firefox \| tracemonkey \| 6 \| Overall \| 100 \| 85 \| 69 \| -16 \| -18.73 \| faster firefox \| tracemonkey \| 6 \| Page Request \| 100 \| 2 \| 2 \| 0 \| -9.90 \| firefox \| tracemonkey \| 6 \| Rendering \| 100 \| 83 \| 67 \| -16 \| -18.97 \| faster firefox \| tracemonkey \| 7 \| Overall \| 100 \| 65 \| 64 \| 0 \| -0.37 \| firefox \| tracemonkey \| 7 \| Page Request \| 100 \| 1 \| 1 \| 0 \| -11.94 \| firefox \| tracemonkey \| 7 \| Rendering \| 100 \| 63 \| 63 \| 0 \| -0.05 \| firefox \| tracemonkey \| 8 \| Overall \| 100 \| 53 \| 54 \| 1 \| 2.04 \| firefox \| tracemonkey \| 8 \| Page Request \| 100 \| 1 \| 1 \| 0 \| 17.02 \| firefox \| tracemonkey \| 8 \| Rendering \| 100 \| 52 \| 53 \| 1 \| 1.82 \| firefox \| tracemonkey \| 9 \| Overall \| 100 \| 79 \| 73 \| -6 \| -7.86 \| faster firefox \| tracemonkey \| 9 \| Page Request \| 100 \| 2 \| 2 \| 0 \| -15.14 \| firefox \| tracemonkey \| 9 \| Rendering \| 100 \| 77 \| 71 \| -6 \| -7.86 \| faster firefox \| tracemonkey \| 10 \| Overall \| 100 \| 545 \| 519 \| -27 \| -4.86 \| faster firefox \| tracemonkey \| 10 \| Page Request \| 100 \| 14 \| 13 \| 0 \| -3.56 \| firefox \| tracemonkey \| 10 \| Rendering \| 100 \| 532 \| 506 \| -26 \| -4.90 \| 
faster firefox \| tracemonkey \| 11 \| Overall \| 100 \| 42 \| 41 \| -1 \| -2.50 \| firefox \| tracemonkey \| 11 \| Page Request \| 100 \| 1 \| 1 \| 0 \| -27.42 \| faster firefox \| tracemonkey \| 11 \| Rendering \| 100 \| 41 \| 40 \| -1 \| -1.75 \| firefox \| tracemonkey \| 12 \| Overall \| 100 \| 350 \| 332 \| -18 \| -5.16 \| faster firefox \| tracemonkey \| 12 \| Page Request \| 100 \| 3 \| 3 \| 0 \| -5.17 \| firefox \| tracemonkey \| 12 \| Rendering \| 100 \| 347 \| 329 \| -18 \| -5.15 \| faster firefox \| tracemonkey \| 13 \| Overall \| 100 \| 31 \| 31 \| 0 \| 0.52 \| firefox \| tracemonkey \| 13 \| Page Request \| 100 \| 1 \| 1 \| 0 \| 4.95 \| firefox \| tracemonkey \| 13 \| Rendering \| 100 \| 30 \| 30 \| 0 \| 0.20 \| ```	2020-06-14 11:51:45 +02:00
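The name-only local cache described in the commit above can be illustrated roughly as follows. This is only a sketch of the idea, not the PDF.js implementation: `LocalNameCache`, `parseColorSpace`, `buildOperatorList` and the `"setColorSpace"` operator name are hypothetical stand-ins, and the parsing is reduced to a counter so the effect of the cache is visible.

```javascript
// Hypothetical sketch of a per-invocation, name-keyed cache.
// None of these names are taken from the PDF.js source.
class LocalNameCache {
  constructor() {
    this._byName = new Map();
  }

  // Synchronous lookup; returns null on a cache miss.
  getByName(name) {
    return this._byName.has(name) ? this._byName.get(name) : null;
  }

  set(name, data) {
    if (!this._byName.has(name)) {
      this._byName.set(name, data);
    }
  }
}

// Stand-in for the real (asynchronous) parsing; counts how often it runs.
let parseCount = 0;
async function parseColorSpace(name) {
  parseCount++;
  return { name };
}

async function buildOperatorList(colorSpaceNames) {
  const cache = new LocalNameCache(); // lives only for this invocation
  const operatorList = [];
  for (const name of colorSpaceNames) {
    let colorSpace = cache.getByName(name); // hit: no asynchronous parse needed
    if (colorSpace === null) {
      colorSpace = await parseColorSpace(name);
      cache.set(name, colorSpace);
    }
    operatorList.push(["setColorSpace", colorSpace]);
  }
  return operatorList;
}
```

With a page that switches back and forth between a handful of names, the stand-in parser runs once per distinct name rather than once per switch, which is the effect the benchmark numbers above are measuring.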
Jonas Jenwald	4b51bcc733	Ensure that `PDFImage.buildImage` won't accidentally swallow errors, e.g. from ColorSpace parsing (issue 6707, PR 11601 follow-up) Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occurring during image parsing/decoding as intended. The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.	2020-06-13 15:02:37 +02:00
Jonas Jenwald	df7d8c74ca	Extract the actual sending of image data from the `PartialEvaluator.buildPaintImageXObject` method After PRs 10727 and 11912, the code responsible for sending the decoded image data to the main-thread has now become a fair bit more involved than previously. To reduce the amount of duplication here, the actual code responsible for sending the data is thus extracted into a new helper method instead.	2020-06-07 12:01:51 +02:00
Jonas Jenwald	af815e417d	Ensure that we don't attempt to cache inline images in the `GlobalImageCache` (PR 11912 follow-up) Since inline images, i.e. those defined inside of `/Contents` streams, are by their very definition page-specific it thus seems like a good idea to actually enforce that they won't accidentally end up in the `GlobalImageCache`.	2020-06-01 01:00:30 +02:00
Jonas Jenwald	4ef547f400	Improve caching of empty `/XObject`s in the `PartialEvaluator.getTextContent` method It turns out that `getTextContent` suffers from similar problems with repeated images as `getOperatorList`; please see the previous patch. While only `/XObject` resources of the `Form`-type will actually be parsed in `PartialEvaluator.getTextContent`, since those are the only ones that may contain text, we're still forced to fetch repeated image resources where the name differs (but not the reference). Obviously it's less bad in this case, since we're not actually parsing `/XObject`s of e.g. the `Image`-type. However, you still want to avoid even fetching the data whenever possible, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general. To address these issues, we can simply replace the existing name-only caching in `PartialEvaluator.getTextContent` with a new cache backed by `LocalImageCache` instead.	2020-05-26 09:49:01 +02:00
Jonas Jenwald	d62c9181bd	Improve the local image caching in `PartialEvaluator.getOperatorList` Currently the local `imageCache`, as used in `PartialEvaluator.getOperatorList`, will miss certain cases of repeated images because the caching is only done by name (usually using a format such as e.g. "Im0", "Im1", ...). However, in some PDF documents the `/XObject` dictionaries may contain hundreds (or even thousands) of distinctly named images, despite them referring to only a handful of actual image objects (via the XRef table). With these changes we'll now cache local images using both name and (where applicable) reference, thus improving re-usage of image resources even further. This patch was tested using the PDF file from [bug 857031](https://bugzilla.mozilla.org/show_bug.cgi?id=857031), i.e. https://bug857031.bmoattachments.org/attachment.cgi?id=732270, with the following manifest file: ``` [ { "id": "bug857031", "file": "../web/pdfs/bug857031.pdf", "md5": "", "rounds": 250, "lastPage": 1, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, page, stat -- browser \| page \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ---- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- firefox \| 0 \| Overall \| 250 \| 2749 \| 2656 \| -93 \| -3.38 \| faster firefox \| 0 \| Page Request \| 250 \| 3 \| 4 \| 1 \| 50.14 \| slower firefox \| 0 \| Rendering \| 250 \| 2746 \| 2652 \| -94 \| -3.44 \| faster ``` While this is certainly an improvement, since we now avoid re-parsing ~1000 images on the first page, all of the image resources are small enough that the total rendering time doesn't improve that much in this particular case. In pathological cases, such as e.g. the PDF document in issue 4958, the improvements with this patch can be very significant. 
Looking for example at page 2, from issue 4958, the rendering time drops from ~60 seconds with `master` to ~30 seconds with this patch (obviously still slow, but it really showcases the potential of this patch nicely). Finally, note that there's also potential for additional improvements by re-using `LocalImageCache` instances for e.g. /XObject data of the `Form`-type. However, given the recent changes in this area I purposely didn't want to complicate this patch more than necessary.	2020-05-25 15:14:14 +02:00
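To make the "name and (where applicable) reference" caching from the commit above concrete, here is a minimal sketch. It is an assumption about how such a cache could be structured, not a copy of the actual `LocalImageCache`; the class name `NameAndRefCache`, the helper `loadImage`, and the string form of the references (e.g. `"12R"`) are all hypothetical.

```javascript
// Hypothetical sketch: cache entries by name and, when available, by reference.
// Not taken from the PDF.js source.
class NameAndRefCache {
  constructor() {
    this._refByName = new Map(); // name -> reference key
    this._dataByRef = new Map(); // reference key -> data
    this._dataByName = new Map(); // data for entries that have no reference
  }

  getByName(name) {
    const ref = this._refByName.get(name);
    if (ref !== undefined) {
      return this.getByRef(ref);
    }
    return this._dataByName.get(name) || null;
  }

  getByRef(ref) {
    return this._dataByRef.get(ref) || null;
  }

  set(name, ref, data) {
    if (ref) {
      this._refByName.set(name, ref);
      if (!this._dataByRef.has(ref)) {
        this._dataByRef.set(ref, data);
      }
    } else {
      this._dataByName.set(name, data);
    }
  }
}

// Looks up an image by name first, then by reference, and only decodes on a miss.
function loadImage(cache, name, ref, decode) {
  let data = cache.getByName(name);
  if (data) {
    return data;
  }
  if (ref) {
    data = cache.getByRef(ref);
    if (data) {
      cache.set(name, ref, data); // remember the new name for this reference
      return data;
    }
  }
  data = decode(name, ref);
  cache.set(name, ref, data);
  return data;
}
```

In this sketch, two differently named `/XObject` entries that point at the same reference share one decoded result, whereas a name-only cache would decode the image once per name.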
Jonas Jenwald	18e0b10d3c	[api-minor] Remove the `disableCreateObjectURL` option from the `getDocument` parameters, since it's now unused in the API With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code. Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option. Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	0351852d74	[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js` Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons: - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library. - The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image. - While some JPEG images may, for all intents and purposes, appear to be natively supported there are still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there are any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first. In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should always be the case. - The native decoding, for anything except the simplest of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707). Furthermore this also leads to data being parsed on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons. - Not all environments, e.g. 
Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests. - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used. Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues. At this point in time, there are two kinds of failure with this patch: - Changes which are basically imperceptible to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder. This type of "failure" accounts for the vast majority of the total number of changes in the reference tests. - Changes where the JPEG images now look ever so slightly blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough). Basically if you disable [this downscaling in canvas.js](`8fb82e939c/src/display/canvas.js (L2356-L2395)`), which is what happens when zooming in, the differences simply vanish! Hence I'm pretty satisfied that there are no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that all images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	dda6626f40	Attempt to cache repeated images at the document, rather than the page, level (issue 11878) Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there are PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the same images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1] Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2] However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages. In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be twenty copies of the image data). While this obviously benefits both CPU and memory usage in this case, for very large image data this patch may possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will only cache a certain number of image resources at the document level and simply fall back to the default behaviour. 
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3] Please note: The patch will lead to small movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator. --- [1] There's e.g. PDF documents that use the same image as background on all pages. [2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer. [3] If the latter case were true, we could simply check for repeat images before parsing started and thus avoid handling any duplicate image resources.	2020-05-21 18:13:45 +02:00
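The policy described in the commit above (promote an image to document level once it has been seen on more than a certain number of pages, but cap the number of promoted images) might be sketched as follows. This is an illustration under stated assumptions, not the actual `GlobalImageCache`: the class name, the method names, and the default values for `pageThreshold` and `maxImages` are hypothetical, since the commit message does not state the real limits.

```javascript
// Hypothetical sketch of a document-level image cache policy.
// The thresholds and names are assumptions, not PDF.js values.
class GlobalImageCacheSketch {
  constructor({ pageThreshold = 1, maxImages = 10 } = {}) {
    this._pageThreshold = pageThreshold; // must be seen on more pages than this
    this._maxImages = maxImages; // cap on document-level entries
    this._pagesByRef = new Map(); // reference key -> Set of page indices
    this._dataByRef = new Map(); // reference key -> cached image data
  }

  // Records that `ref` was used on `pageIndex` and reports whether the image
  // qualifies for document-level caching.
  shouldCache(ref, pageIndex) {
    let pages = this._pagesByRef.get(ref);
    if (!pages) {
      pages = new Set();
      this._pagesByRef.set(ref, pages);
    }
    pages.add(pageIndex);

    if (pages.size <= this._pageThreshold) {
      return false; // still page-specific
    }
    if (!this._dataByRef.has(ref) && this._dataByRef.size >= this._maxImages) {
      return false; // cache is full: fall back to page-specific handling
    }
    return true;
  }

  getData(ref) {
    return this._dataByRef.get(ref) || null;
  }

  setData(ref, data) {
    if (!this._dataByRef.has(ref) && this._dataByRef.size >= this._maxImages) {
      return;
    }
    this._dataByRef.set(ref, data);
  }
}
```

Counting distinct pages with a `Set` means that an image repeated many times on a single page stays page-specific, which matches the commit's stated goal of only promoting resources shared across pages.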
Brendan Dahl	b1be33c96f	Add more categories of unsupported features. Fixes #11815	2020-05-04 11:02:16 -07:00
Jonas Jenwald	911c33f025	Move the `maybeValidDimensions` check, used with JPEG images, to occur earlier (PR 11523 follow-up) Given that the `NativeImageDecoder.{isSupported, isDecodable}` methods require both dictionary lookups and ColorSpace parsing, in hindsight it actually seems more reasonable to do the `JpegStream.maybeValidDimensions` checks first.	2020-04-26 12:07:46 +02:00
