Commit Graph

4045 Commits

Author SHA1 Message Date
Jonas Jenwald
835b5ffddd Only check isType3Font the first time that TranslatedFont.loadType3Data is called
If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
f3ff526019 Send/receive Type3 images the same way as other globally-cached images
There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
7c9d0d5939 Improve how Type3-fonts with dependencies are handled
While the `CharProcs` streams of Type3-fonts *usually* don't rely on dependencies, such as e.g. images, it does happen in some cases.

Currently any dependencies are simply appended to the parent operatorList, which in practice means *only* the operatorList of the *first* page where the Type3-font is being used.
However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of *all* pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously.

Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added *once* to any operatorList.)

Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.
2020-07-27 13:20:13 +02:00
Calixte Denizet
584902dbf8 Add an annotation storage in order to save annotation data in acroforms 2020-07-24 10:50:11 +02:00
Jonas Jenwald
684a7b89ac Remove unnecessary duplication in the addChildren helper function (used by the ObjectLoader)
Besides being fewer lines of code overall, this also avoids *one* `node instanceof Dict` check for both of the `Dict`/`Stream`-cases.
2020-07-17 16:32:24 +02:00
Jonas Jenwald
ea8e432c45 Add a getRawValues method, to Dict instances, to provide an easier way of getting all *raw* values
When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop).
While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the *raw* `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the *raw* `Dict` values directly.
2020-07-17 16:32:00 +02:00
Jonas Jenwald
6381b5b08f Add a size getter, to Dict instances, to provide an easier way of checking the number of entries
This removes the need to manually call `Dict.getKeys()` and check its length.
2020-07-17 16:06:11 +02:00
Tim van der Meij
e63d1ebff5
Merge pull request #12087 from Snuffleupagus/LocalGStateCache
Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)
2020-07-17 16:02:45 +02:00
Tim van der Meij
b19a1796ac
Convert RefSetCache to a proper class and to use a Map internally
Using a `Map` instead of an `Object` provides some advantages such as
cheaper ways to get the size of the cache, to find out if an entry is
contained in the cache and to iterate over the cache. Moreover, we can
clear and re-use the same `Map` object now instead of creating a new
one.
2020-07-17 13:35:29 +02:00
Tim van der Meij
a604973cc7
Merge pull request #12085 from tamuratak/fix_isnodejs
Make the detection of Node.js environments on Electron strict.
2020-07-17 13:29:59 +02:00
Jonas Jenwald
b3480842b3 Use a RefSet, rather than a plain Object, for tracking already processed nodes in PartialEvaluator.hasBlendModes 2020-07-17 09:52:36 +02:00
Jonas Jenwald
03547b5633 Change PartialEvaluator.setGState to an async method
Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object.
To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.
2020-07-15 14:27:18 +02:00
Jonas Jenwald
f20aeb9343 Slightly simplify the code in PartialEvaluator.hasBlendModes, e.g. by using for...of loops
- Replace the existing loops with `for...of` variants instead.

 - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
15fa3f8518 Remove a redundant /XObject stream dictionary objId check in PartialEvaluator.hasBlendModes (PR 6971 follow-up)
This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971).
Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
84476da26e Handle lookup errors "silently" in PartialEvaluator.hasBlendModes (PR 11680 follow-up)
Given that this method is used during what's essentially a *pre*-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there's lookup errors.

*Please note:* Any any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the *actual* OperatorList parsing.
2020-07-15 12:47:07 +02:00
Jonas Jenwald
981ff41b5f Add local caching of non-font Graphics State (ExtGState) data in PartialEvaluator.getTextContent
It turns out that `getTextContent` suffers from *similar* problems with repeated GStates as `getOperatorList`; please see the previous patch.

While only `/ExtGState` resources containing Fonts will actually be *parsed* by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though *most* of them won't affect the textContent (since they mostly contain purely graphical state).

With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
90eb579713 Add local caching of "simple" Graphics State (ExtGState) data in PartialEvaluator.getOperatorList (issue 2813)
This patch will help pathological cases the most, with issue 2813 being a particularily problematic example. While there's only *four* `/ExtGState` resources, there's a total `29062` of `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resource, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resource, we thus improve performance by:
 - Not having to fetch/validate/parse the same `/ExtGState` data over and over.
 - Handling of repeated `setGState` operators becomes *synchronous* during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

---

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is *a lot* slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a *reduction* of `~10 %` in the rendering time of the PDF document in issue 2813.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
d4d7ac1b88 Stop special-casing the (very unlikely) "no /XObject found"-scenario, when parsing OPS.paintXObject operators, in PartialEvaluator.{getOperatorList, getTextContent}
Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fallback to the existing `ignoreErrors` functionality instead.
Also, please note that the "no `/XObject` found"-scenario should be *extremely* unlikely in practice and would only occur in corrupt/broken documents.

Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data *could* contain anything and e.g. be unclonable (which would cause breaking errors).
Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definately seems like the correct approach.
2020-07-12 21:59:59 +02:00
Takashi Tamura
473ea1f1a4 Make the detection of Node.js environments on Electron strict.
The main process and its child processes should be detected as Node.js environments.
2020-07-12 10:56:17 +09:00
Tim van der Meij
7dabc5ecc8
Merge pull request #12063 from Snuffleupagus/issue-10989
Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)
2020-07-11 00:05:11 +02:00
Jonas Jenwald
d18cf47419 Remove the special handling, used when creating Indexed ColorSpaces, for the case where the lookup-data is a Stream
This special-case was added in PR 1992, however it became unnecessary with the changes in PR 4824 since all of the ColorSpace parsing is now done on the worker-thread (with only RGB-data being sent to the main-thread).
2020-07-10 17:22:55 +02:00
Jonas Jenwald
ea6a0e4435 Remove the IR (internal representation) part of the ColorSpace parsing
Originally ColorSpaces were only *partially* parsed on the worker-thread, to obtain an IR-format which was sent to the main-thread. This had the somewhat unfortunate side-effect of causing the majority of the (potentially heavy) ColorSpace parsing to happen on the main-thread.
Hence PR 4824 which, among other things, changed ColorSpaces to be *fully* parsed on the worker-thread with only RGB-data being sent to the main-thread.

While it thus originally was necessary to have `ColorSpace.{parseToIR, fromIR}` methods, to handle the worker/main-thread split, that's no longer the case and we can thus reduce all of the ColorSpace parsing to one method instead.

Currently, when parsing a ColorSpace, we call `ColorSpace.parseToIR` which parses the ColorSpace-data from the document and then creates the IR-format. We then, immediately, call `ColorSpace.fromIR` which parses the IR-format and then finally creates the actual ColorSpace.[1]
All-in-all, this leads to a fair amount of unnecessary indirection which also (in my opinion) makes the code less clear.

Obviously these changes are not really expected to have a significant effect on performance, especially with the recently added caching of ColorSpaces, however there'll now be strictly fewer function calls, less memory allocated, and overall less parsing required during ColorSpace-handling.

---
[1] For ICCBased ColorSpaces, given the validation necessary, this currently even leads to parsing an /Alternate ColorSpace *twice*.
2020-07-10 17:22:44 +02:00
Jonas Jenwald
4cc6797f17 Re-factor the idFactory functionality, used in the core/-code, and move the fontID generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.

For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.

In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 16:33:31 +02:00
Jonas Jenwald
1d66fce781 Tweak the heuristic, in src/core/jpg.js, that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10989) 2020-07-06 13:06:49 +02:00
Jonas Jenwald
c95fbb6e21 Convert the code in src/core/evaluator.js to use standard classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 16:01:04 +02:00
Jonas Jenwald
32a0b6fa73 Move some constants and helper functions out of the PartialEvaluator closure
This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope.

*Please note:* For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants where re-factored slightly.
2020-07-05 15:56:23 +02:00
Tim van der Meij
c4255fdbfd
Merge pull request #12059 from Snuffleupagus/image-class
Convert the code in `src/core/image.js` to use ES6 classes
2020-07-05 14:08:55 +02:00
Jonas Jenwald
59da1d5829 Convert the code in src/core/image.js to use ES6 classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 09:34:14 +02:00
Jonas Jenwald
85ced3fbfd Allow BaseLocalCache to, optionally, only allocate storage for caching of references (PR 12034 follow-up)
*Yet another instalment in the never-ending series of things that you think of __after__ a patch has landed.*

Since `Function`s are only cached by reference, we thus don't need to allocate storage for names in `LocalFunctionCache` instances. Obviously the effect of these changes are *really tiny*, but it seems reasonable in principle to avoid allocating data structures that are guaranteed to be unused.
2020-07-04 15:01:32 +02:00
Jonas Jenwald
ca719ecaa4 Add local caching of Functions, by reference, in the PDFFunctionFactory (issue 2541)
Note that compared other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an *entire* page.
Furthermore, similar to ColorSpaces, the parsing of individual `Function`s are generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.)

Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.)
After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its *own* `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instances could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site.

Obviously, with this patch, there's now (potentially) more `PDFFunctionFactory` instances than before when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are *not* kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible *lazily initialized*, specifically:

 - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually requires it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces).

 - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing.

To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization.
(If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would be instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.)
All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic.

With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources.

Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer.
However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction ` parsing shows that it takes *approximately* `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is *one order of magnitude faster* which does add up when the same `Function` is invoked close to 2000 times.
2020-07-04 00:55:18 +02:00
Jonas Jenwald
4a7e29865d [api-minor] Use the NodeCanvasFactory/NodeCMapReaderFactory classes as defaults in Node.js environments (issue 11900)
This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code.

As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.
2020-07-02 04:44:23 +02:00
Brendan Dahl
fe3df495cc
Merge pull request #12040 from wojtekmaj/replace-non-inclusive
Replace non-inclusive "whitelist" term with "allowlist"
2020-07-01 15:41:41 -07:00
Jonas Jenwald
fef24658e7 Adjust the heuristics used when dealing with rectangles, i.e. re operators, with zero width/height (issue 12010) 2020-07-02 00:02:49 +02:00
Wojciech Maj
78970bbbe1
Replace non-inclusive "whitelist" term with "allowlist" 2020-06-29 17:15:14 +02:00
Jonas Jenwald
28d2ada59c Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)
This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
2020-06-26 13:15:06 +02:00
Jonas Jenwald
b8e1352934 Stop passing in unnecessary parameters when parsing the Alternate entry of ICCBased ColorSpaces (PR 9659 follow-up)
With the changes made in PR 9659, `ColorSpace.fromIR` no longer takes a second `pdfFunctionFactory` parameter and there's thus one call-site that can be simplified.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
19d7976483 Improve (local) caching of parsed ColorSpaces (PR 12001 follow-up)
This patch contains the following *notable* improvements:
 - Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point).
 - Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths.
 - Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing.
 - Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).
2020-06-24 23:53:10 +02:00
Jonas Jenwald
51e87b9248 Add a proper LocalColorSpaceCache class, rather than piggybacking on the image one (PR 12001 follow-up)
This will allow caching of ColorSpaces by either `Name` *or* `Ref`, which doesn't really make sense for images, thus allowing (better) caching for ColorSpaces used with e.g. Images and Patterns.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
e22bc483a5 Re-factor ColorSpace.parse to take a parameter object, rather than a bunch of (randomly) ordered parameters
Given the number of existing parameters, this will avoid needlessly unwieldy call-sites especially with upcoming changes in later patches.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
f0708717a9 Move the fetchBuiltInCMap method to the PartialEvaluator.prototype
Defining this *inline* in the "constructor" looks slightly weird (I really don't know why I wrote it like that originally), and it can simply be changed to a regular method instead.
2020-06-24 17:29:47 +02:00
Tim van der Meij
c1cb9ee9fc
Merge pull request #12016 from Snuffleupagus/issue-8078
Tweak the `QueueOptimizer` to recognize `OPS.paintImageMaskXObject` operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
2020-06-21 19:38:27 +02:00
Jonas Jenwald
4cb0c032f3 Convert the PDFPageProxy.intentStates property from an Object to a Map
As can be seen in the code there's a handful of places where this structure needs to be iterated, something that becomes cumbersome when dealing with `Object`s. Hence, by changing this to a `Map` instead we can both simplify the code and avoid creating unnecessary closures.

Particularily the `PDFPageProxy._tryCleanup` method becomes a lot more readable, at least in my opinion.

Finally, since this property is intended to be "private" the name is adjusted to reflect that.
2020-06-21 17:02:42 +02:00
Jonas Jenwald
cabc2cc4fc Add a InternalRenderTask.completed getter and use it to simplify PDFPageProxy._destroy
This patch aims to simplify the `PDFPageProxy._destroy` method, by:
 - Replacing the unnecessary `forEach` with a "regular" `for`-loop instead.
 - Use a more appropriate variable name, since `intentState.renderTasks` contain instances of `InternalRenderTask`.
 - Move the "is rendering completed"-handling to a new `InternalRenderTask.completed` getter, to abstract away some (mostly) internal `InternalRenderTask` state.
2020-06-21 15:56:14 +02:00
Jonas Jenwald
a04a5d8325 Tweak the loop in ChunkedStreamManager.abort to clarify what's being iterated (PR 11985 follow-up)
In hindsight, using the `for (let [key, value] of myMap) { ... }`-format when we don't care about the `key` probably wasn't such a great idea. Since `Map`s have explicit support for iterating either `key`s or `value`s, we should probably use that instead here.
2020-06-21 11:29:05 +02:00
Jonas Jenwald
e18fa3fc45 Tweak the QueueOptimizer to recognize OPS.paintImageMaskXObject operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
*First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited.*
Hence I'm not sure if there's actually a very good reason for *only* considering ImageMasks where the "skew" transformation matrix elements are zero as *repeated*, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are *all identical* for the ImageMasks.
Furthermore, looking at the *group* case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements.

While this patch obviously isn't enough to *completely* fix the issue, since there should be a visible Pattern rendered as well[1], it seem (at least to me) like enough of an improvement that submitting this is justified.
With these changes the referenced PDF document will no longer hang the *entire* browser, and rendering also finishes in a *reasonable* time (< 10 seconds for me) which seem fine given the *huge* number of identical inline images present.[2]

---
[1] Temporarily changing the Pattern to a solid color *does* render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality.

[2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.
2020-06-20 12:18:48 +02:00
Tim van der Meij
8cfdfb237a
Merge pull request #12005 from Snuffleupagus/cff-class
Convert the code in `src/core/cff_parser.js` to use ES6 classes
2020-06-17 23:30:28 +02:00
Jonas Jenwald
880a0a0f59 Convert the code in src/core/cff_parser.js to use ES6 classes
This removes multiple instances of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-06-16 12:33:21 +02:00
Jonas Jenwald
fb9b574f3d Convert the code in src/core/worker.js to use ES6 classes
This removes one instance of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-06-16 11:54:59 +02:00
Jonas Jenwald
87b089ba42 Lazily initialize, and cache, the regular expression used in CFFCompiler.encodeFloat
There's no particular reason for re-creating the regular expression over and over for every `encodeFloat` invocation, as far as I can tell.
2020-06-15 13:51:28 +02:00
Jonas Jenwald
517d92a121 Simplify the "is integer" checks in CFFCompiler.encodeNumber
The `isNaN` check is obviously redundant, since `NaN` is the only value that isn't equal to itself; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/NaN#Examples

The `parseFloat`/`parseInt` comparison would make sense if the `value` ever contains a String, which however is never actually the case. Besides looking through the code, I've also run the entire test-suite locally with `assert(typeof value === "number", "encodeNumber");` added at the top of the method and there were no failures.

Hence we can simplify the "is integer" check a bit in the `CFFCompiler.encodeNumber` method.
2020-06-15 13:51:20 +02:00
Jonas Jenwald
5c39de805c Add local caching of ColorSpaces, by name, in PartialEvaluator.getOperatorList (issue 2504)
By caching parsed `ColorSpace`s, we thus don't need to re-parse the same data over and over which saves CPU cycles *and* reduces peak memory usage. (Obviously persistent memory usage *may* increase a tiny bit, but since the caching is done per `PartialEvaluator.getOperatorList` invocation and given that `ColorSpace` instances generally hold very little data this shouldn't be much of an issue.)
Furthermore, by caching `ColorSpace`s we can also lookup the already parsed ones *synchronously* during the `OperatorList` building, instead of having to defer to the event loop/microtask queue since the parsing is done asynchronously (such that error handling is easier).

Possible future improvements:
 - Cache/lookup parsed `ColorSpaces` used in `Pattern`s and `Image`s.
 - Attempt to cache *local* `ColorSpace`s by reference as well, in addition to only by name, assuming that there's documents where that would be beneficial and that it's not too difficult to implement.
 - Assuming there's documents that would benefit from it, also cache repeated `ColorSpace`s *globally* as well.

Given that we've never, until now, been doing *any* caching of parsed `ColorSpace`s and that even using a simple name-only *local* cache helps tremendously in pathological cases, I purposely decided against complicating the implementation too much initially.
Also, compared to parsing of `Image`s, simply creating a `ColorSpace` instance isn't that expensive (hence I'd be somewhat surprised if adding a *global* cache would help much).

---

This patch was tested using:
 - The default `tracemonkey` PDF file, which was included mostly to show that "normal" documents aren't negatively affected by these changes.
 - The PDF file from issue 2504, i.e. https://dl-ctlg.panasonic.com/jp/manual/sd/sd_rbm1000_0.pdf, where most pages will switch *thousands* of times between a handful of `ColorSpace`s.

with the following manifest file:
```
[
    {  "id": "tracemonkey",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 100,
       "type": "eq"
    },
    {  "id": "issue2504",
       "file": "../web/pdfs/issue2504.pdf",
       "md5": "",
       "rounds": 20,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
 - Overall
```
-- Grouped By browser, pdf, stat --
browser | pdf         | stat         | Count | Baseline(ms) | Current(ms) |  +/- |     %  | Result(P<.05)
------- | ----------- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | issue2504   | Overall      |   640 |          977 |         497 | -479 | -49.08 |        faster
firefox | issue2504   | Page Request |   640 |            3 |           4 |    1 |  59.18 |
firefox | issue2504   | Rendering    |   640 |          974 |         493 | -481 | -49.37 |        faster
firefox | tracemonkey | Overall      |  1400 |          116 |         111 |   -5 |  -4.43 |
firefox | tracemonkey | Page Request |  1400 |            2 |           2 |    0 |  -2.86 |
firefox | tracemonkey | Rendering    |  1400 |          114 |         109 |   -5 |  -4.47 |
```

 - Page-specific
```
-- Grouped By browser, pdf, page, stat --
browser | pdf         | page | stat         | Count | Baseline(ms) | Current(ms) |   +/- |      %  | Result(P<.05)
------- | ----------- | ---- | ------------ | ----- | ------------ | ----------- | ----- | ------- | -------------
firefox | issue2504   | 0    | Overall      |    20 |         2295 |        1268 | -1027 |  -44.76 |        faster
firefox | issue2504   | 0    | Page Request |    20 |            6 |           7 |     1 |   15.32 |
firefox | issue2504   | 0    | Rendering    |    20 |         2288 |        1260 | -1028 |  -44.93 |        faster
firefox | issue2504   | 1    | Overall      |    20 |         3059 |        2806 |  -252 |   -8.25 |        faster
firefox | issue2504   | 1    | Page Request |    20 |           11 |          14 |     3 |   23.25 |        slower
firefox | issue2504   | 1    | Rendering    |    20 |         3047 |        2792 |  -255 |   -8.37 |        faster
firefox | issue2504   | 2    | Overall      |    20 |          411 |         295 |  -116 |  -28.20 |        faster
firefox | issue2504   | 2    | Page Request |    20 |            2 |          42 |    40 | 1897.62 |
firefox | issue2504   | 2    | Rendering    |    20 |          409 |         253 |  -156 |  -38.09 |        faster
firefox | issue2504   | 3    | Overall      |    20 |          736 |         299 |  -437 |  -59.34 |        faster
firefox | issue2504   | 3    | Page Request |    20 |            2 |           2 |     0 |    0.00 |
firefox | issue2504   | 3    | Rendering    |    20 |          734 |         297 |  -437 |  -59.49 |        faster
firefox | issue2504   | 4    | Overall      |    20 |          356 |         458 |   102 |   28.63 |
firefox | issue2504   | 4    | Page Request |    20 |            1 |           2 |     1 |   57.14 |        slower
firefox | issue2504   | 4    | Rendering    |    20 |          354 |         455 |   101 |   28.53 |
firefox | issue2504   | 5    | Overall      |    20 |         1381 |         765 |  -616 |  -44.59 |        faster
firefox | issue2504   | 5    | Page Request |    20 |            3 |           5 |     2 |   50.00 |        slower
firefox | issue2504   | 5    | Rendering    |    20 |         1378 |         760 |  -617 |  -44.81 |        faster
firefox | issue2504   | 6    | Overall      |    20 |          757 |         299 |  -459 |  -60.57 |        faster
firefox | issue2504   | 6    | Page Request |    20 |            2 |           5 |     3 |  150.00 |        slower
firefox | issue2504   | 6    | Rendering    |    20 |          755 |         294 |  -462 |  -61.11 |        faster
firefox | issue2504   | 7    | Overall      |    20 |          394 |         302 |   -92 |  -23.39 |        faster
firefox | issue2504   | 7    | Page Request |    20 |            2 |           1 |    -1 |  -34.88 |        faster
firefox | issue2504   | 7    | Rendering    |    20 |          392 |         301 |   -91 |  -23.32 |        faster
firefox | issue2504   | 8    | Overall      |    20 |         2875 |         979 | -1896 |  -65.95 |        faster
firefox | issue2504   | 8    | Page Request |    20 |            1 |           2 |     0 |   11.11 |
firefox | issue2504   | 8    | Rendering    |    20 |         2874 |         978 | -1896 |  -65.99 |        faster
firefox | issue2504   | 9    | Overall      |    20 |          700 |         332 |  -368 |  -52.60 |        faster
firefox | issue2504   | 9    | Page Request |    20 |            3 |           2 |     0 |   -4.00 |
firefox | issue2504   | 9    | Rendering    |    20 |          698 |         329 |  -368 |  -52.78 |        faster
firefox | issue2504   | 10   | Overall      |    20 |         3296 |         926 | -2370 |  -71.91 |        faster
firefox | issue2504   | 10   | Page Request |    20 |            2 |           2 |     0 |  -18.75 |
firefox | issue2504   | 10   | Rendering    |    20 |         3293 |         924 | -2370 |  -71.96 |        faster
firefox | issue2504   | 11   | Overall      |    20 |          524 |         197 |  -327 |  -62.34 |        faster
firefox | issue2504   | 11   | Page Request |    20 |            2 |           3 |     1 |   58.54 |
firefox | issue2504   | 11   | Rendering    |    20 |          522 |         194 |  -328 |  -62.81 |        faster
firefox | issue2504   | 12   | Overall      |    20 |          752 |         369 |  -384 |  -50.98 |        faster
firefox | issue2504   | 12   | Page Request |    20 |            3 |           2 |    -1 |  -36.51 |        faster
firefox | issue2504   | 12   | Rendering    |    20 |          749 |         367 |  -382 |  -51.05 |        faster
firefox | issue2504   | 13   | Overall      |    20 |          679 |         487 |  -193 |  -28.38 |        faster
firefox | issue2504   | 13   | Page Request |    20 |            4 |           2 |    -2 |  -48.68 |        faster
firefox | issue2504   | 13   | Rendering    |    20 |          676 |         485 |  -191 |  -28.28 |        faster
firefox | issue2504   | 14   | Overall      |    20 |          474 |         283 |  -191 |  -40.26 |        faster
firefox | issue2504   | 14   | Page Request |    20 |            2 |           4 |     2 |   78.57 |
firefox | issue2504   | 14   | Rendering    |    20 |          471 |         279 |  -192 |  -40.79 |        faster
firefox | issue2504   | 15   | Overall      |    20 |          860 |         618 |  -241 |  -28.05 |        faster
firefox | issue2504   | 15   | Page Request |    20 |            2 |           3 |     0 |   10.87 |
firefox | issue2504   | 15   | Rendering    |    20 |          857 |         616 |  -241 |  -28.15 |        faster
firefox | issue2504   | 16   | Overall      |    20 |          389 |         243 |  -147 |  -37.71 |        faster
firefox | issue2504   | 16   | Page Request |    20 |            2 |           2 |     0 |    2.33 |
firefox | issue2504   | 16   | Rendering    |    20 |          387 |         240 |  -147 |  -37.94 |        faster
firefox | issue2504   | 17   | Overall      |    20 |         1484 |         672 |  -812 |  -54.70 |        faster
firefox | issue2504   | 17   | Page Request |    20 |            2 |           3 |     1 |   37.21 |
firefox | issue2504   | 17   | Rendering    |    20 |         1482 |         669 |  -812 |  -54.84 |        faster
firefox | issue2504   | 18   | Overall      |    20 |          575 |         252 |  -323 |  -56.12 |        faster
firefox | issue2504   | 18   | Page Request |    20 |            2 |           2 |     0 |  -16.22 |
firefox | issue2504   | 18   | Rendering    |    20 |          573 |         251 |  -322 |  -56.24 |        faster
firefox | issue2504   | 19   | Overall      |    20 |          517 |         227 |  -290 |  -56.08 |        faster
firefox | issue2504   | 19   | Page Request |    20 |            2 |           2 |     0 |   21.62 |
firefox | issue2504   | 19   | Rendering    |    20 |          515 |         225 |  -290 |  -56.37 |        faster
firefox | issue2504   | 20   | Overall      |    20 |          668 |         670 |     2 |    0.31 |
firefox | issue2504   | 20   | Page Request |    20 |            4 |           2 |    -1 |  -34.29 |
firefox | issue2504   | 20   | Rendering    |    20 |          664 |         667 |     3 |    0.49 |
firefox | issue2504   | 21   | Overall      |    20 |          486 |         309 |  -177 |  -36.44 |        faster
firefox | issue2504   | 21   | Page Request |    20 |            2 |           2 |     0 |   16.13 |
firefox | issue2504   | 21   | Rendering    |    20 |          484 |         307 |  -177 |  -36.60 |        faster
firefox | issue2504   | 22   | Overall      |    20 |          543 |         267 |  -276 |  -50.85 |        faster
firefox | issue2504   | 22   | Page Request |    20 |            2 |           2 |     0 |   10.26 |
firefox | issue2504   | 22   | Rendering    |    20 |          541 |         265 |  -276 |  -51.07 |        faster
firefox | issue2504   | 23   | Overall      |    20 |         3246 |         871 | -2375 |  -73.17 |        faster
firefox | issue2504   | 23   | Page Request |    20 |            2 |           3 |     1 |   37.21 |
firefox | issue2504   | 23   | Rendering    |    20 |         3243 |         868 | -2376 |  -73.25 |        faster
firefox | issue2504   | 24   | Overall      |    20 |          379 |         156 |  -223 |  -58.83 |        faster
firefox | issue2504   | 24   | Page Request |    20 |            2 |           2 |     0 |   -2.86 |
firefox | issue2504   | 24   | Rendering    |    20 |          378 |         154 |  -223 |  -59.10 |        faster
firefox | issue2504   | 25   | Overall      |    20 |          176 |         127 |   -50 |  -28.19 |        faster
firefox | issue2504   | 25   | Page Request |    20 |            2 |           1 |     0 |  -15.63 |
firefox | issue2504   | 25   | Rendering    |    20 |          175 |         125 |   -49 |  -28.31 |        faster
firefox | issue2504   | 26   | Overall      |    20 |          181 |         108 |   -74 |  -40.67 |        faster
firefox | issue2504   | 26   | Page Request |    20 |            3 |           2 |    -1 |  -39.13 |        faster
firefox | issue2504   | 26   | Rendering    |    20 |          178 |         105 |   -72 |  -40.69 |        faster
firefox | issue2504   | 27   | Overall      |    20 |          208 |         104 |  -104 |  -49.92 |        faster
firefox | issue2504   | 27   | Page Request |    20 |            2 |           2 |     1 |   48.39 |
firefox | issue2504   | 27   | Rendering    |    20 |          206 |         102 |  -104 |  -50.64 |        faster
firefox | issue2504   | 28   | Overall      |    20 |          241 |         111 |  -131 |  -54.16 |        faster
firefox | issue2504   | 28   | Page Request |    20 |            2 |           2 |    -1 |  -33.33 |
firefox | issue2504   | 28   | Rendering    |    20 |          239 |         109 |  -130 |  -54.39 |        faster
firefox | issue2504   | 29   | Overall      |    20 |          321 |         196 |  -125 |  -39.05 |        faster
firefox | issue2504   | 29   | Page Request |    20 |            1 |           2 |     0 |   17.86 |
firefox | issue2504   | 29   | Rendering    |    20 |          319 |         194 |  -126 |  -39.35 |        faster
firefox | issue2504   | 30   | Overall      |    20 |          651 |         271 |  -380 |  -58.41 |        faster
firefox | issue2504   | 30   | Page Request |    20 |            1 |           2 |     1 |   50.00 |
firefox | issue2504   | 30   | Rendering    |    20 |          649 |         269 |  -381 |  -58.60 |        faster
firefox | issue2504   | 31   | Overall      |    20 |         1635 |         647 |  -988 |  -60.42 |        faster
firefox | issue2504   | 31   | Page Request |    20 |            1 |           2 |     0 |   30.43 |
firefox | issue2504   | 31   | Rendering    |    20 |         1634 |         645 |  -988 |  -60.49 |        faster
firefox | tracemonkey | 0    | Overall      |   100 |           51 |          51 |     0 |    0.02 |
firefox | tracemonkey | 0    | Page Request |   100 |            1 |           1 |     0 |   -4.76 |
firefox | tracemonkey | 0    | Rendering    |   100 |           50 |          50 |     0 |    0.12 |
firefox | tracemonkey | 1    | Overall      |   100 |           97 |          91 |    -5 |   -5.52 |        faster
firefox | tracemonkey | 1    | Page Request |   100 |            3 |           3 |     0 |   -1.32 |
firefox | tracemonkey | 1    | Rendering    |   100 |           94 |          88 |    -5 |   -5.73 |        faster
firefox | tracemonkey | 2    | Overall      |   100 |           40 |          40 |     0 |    0.50 |
firefox | tracemonkey | 2    | Page Request |   100 |            1 |           1 |     0 |    3.16 |
firefox | tracemonkey | 2    | Rendering    |   100 |           39 |          39 |     0 |    0.54 |
firefox | tracemonkey | 3    | Overall      |   100 |           62 |          62 |    -1 |   -0.94 |
firefox | tracemonkey | 3    | Page Request |   100 |            1 |           1 |     0 |   17.05 |
firefox | tracemonkey | 3    | Rendering    |   100 |           61 |          61 |    -1 |   -1.11 |
firefox | tracemonkey | 4    | Overall      |   100 |           56 |          58 |     2 |    3.41 |
firefox | tracemonkey | 4    | Page Request |   100 |            1 |           1 |     0 |   15.31 |
firefox | tracemonkey | 4    | Rendering    |   100 |           55 |          57 |     2 |    3.23 |
firefox | tracemonkey | 5    | Overall      |   100 |           73 |          71 |    -2 |   -2.28 |
firefox | tracemonkey | 5    | Page Request |   100 |            2 |           2 |     0 |   12.20 |
firefox | tracemonkey | 5    | Rendering    |   100 |           71 |          69 |    -2 |   -2.69 |
firefox | tracemonkey | 6    | Overall      |   100 |           85 |          69 |   -16 |  -18.73 |        faster
firefox | tracemonkey | 6    | Page Request |   100 |            2 |           2 |     0 |   -9.90 |
firefox | tracemonkey | 6    | Rendering    |   100 |           83 |          67 |   -16 |  -18.97 |        faster
firefox | tracemonkey | 7    | Overall      |   100 |           65 |          64 |     0 |   -0.37 |
firefox | tracemonkey | 7    | Page Request |   100 |            1 |           1 |     0 |  -11.94 |
firefox | tracemonkey | 7    | Rendering    |   100 |           63 |          63 |     0 |   -0.05 |
firefox | tracemonkey | 8    | Overall      |   100 |           53 |          54 |     1 |    2.04 |
firefox | tracemonkey | 8    | Page Request |   100 |            1 |           1 |     0 |   17.02 |
firefox | tracemonkey | 8    | Rendering    |   100 |           52 |          53 |     1 |    1.82 |
firefox | tracemonkey | 9    | Overall      |   100 |           79 |          73 |    -6 |   -7.86 |        faster
firefox | tracemonkey | 9    | Page Request |   100 |            2 |           2 |     0 |  -15.14 |
firefox | tracemonkey | 9    | Rendering    |   100 |           77 |          71 |    -6 |   -7.86 |        faster
firefox | tracemonkey | 10   | Overall      |   100 |          545 |         519 |   -27 |   -4.86 |        faster
firefox | tracemonkey | 10   | Page Request |   100 |           14 |          13 |     0 |   -3.56 |
firefox | tracemonkey | 10   | Rendering    |   100 |          532 |         506 |   -26 |   -4.90 |        faster
firefox | tracemonkey | 11   | Overall      |   100 |           42 |          41 |    -1 |   -2.50 |
firefox | tracemonkey | 11   | Page Request |   100 |            1 |           1 |     0 |  -27.42 |        faster
firefox | tracemonkey | 11   | Rendering    |   100 |           41 |          40 |    -1 |   -1.75 |
firefox | tracemonkey | 12   | Overall      |   100 |          350 |         332 |   -18 |   -5.16 |        faster
firefox | tracemonkey | 12   | Page Request |   100 |            3 |           3 |     0 |   -5.17 |
firefox | tracemonkey | 12   | Rendering    |   100 |          347 |         329 |   -18 |   -5.15 |        faster
firefox | tracemonkey | 13   | Overall      |   100 |           31 |          31 |     0 |    0.52 |
firefox | tracemonkey | 13   | Page Request |   100 |            1 |           1 |     0 |    4.95 |
firefox | tracemonkey | 13   | Rendering    |   100 |           30 |          30 |     0 |    0.20 |
```
2020-06-14 11:51:45 +02:00
Jonas Jenwald
4b51bcc733 Ensure that PDFImage.buildImage won't accidentally swallow errors, e.g. from ColorSpace parsing (issue 6707, PR 11601 follow-up)
Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occuring during image parsing/decoding as intended.
The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.
2020-06-13 15:02:37 +02:00
Jonas Jenwald
10f31bb46d Change the dependencies property, on OperatorList instances, from an Object to a Set
Since this is completely internal functionality, and furthermore limited to the worker-thread, this change should thus not have any observable effect for e.g. an API-user.
2020-06-11 16:27:13 +02:00
Jonas Jenwald
02a1d0f6c5 Remove the unused intent/pageIndex properties from OperatorList instances (PR 11069 follow-up)
Apparently I completely overlooked the fact that with the changes in PR 11069 these properties became *completely* unused, and consequently they thus ought to be removed.
2020-06-11 16:05:38 +02:00
Jonas Jenwald
00d45fce33 Update SVGGraphics to account for globally cached images (PR 11912 follow-up)
Since there's (essentially) no tests for the SVG-backend, these changes didn't make in into PR 11912 when the code in the `src/display/canvas.js` file was modified.
2020-06-10 15:31:26 +02:00
Jonas Jenwald
88fdb482b0 Move the isEmptyObj helper function from src/shared/util.js to test/unit/test_utils.js
Since this helper function is no longer used anywhere in the main code-base, but only in a couple of unit-tests, it's thus being moved to a more appropriate spot.

Finally, the implementation of `isEmptyObj` is also tweaked slightly by removing the manual loop.
2020-06-09 17:50:16 +02:00
Jonas Jenwald
159e13c4e4 Convert the ChunkedStreamManager.promisesByRequest property to a Map
Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:50:14 +02:00
Jonas Jenwald
dda7a5d1b7 Convert the ChunkedStreamManager.requestsByChunk property to a Map
Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:50:11 +02:00
Jonas Jenwald
17e23ffb33 Convert the ChunkedStreamManager.chunksNeededByRequest property to a Map (containing Sets)
Compared to regular `Object`s, `Map`s (and `Set`s) have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:49:53 +02:00
Tim van der Meij
a4fa4554d6
Merge pull request #11977 from timvandermeij/refset
Convert the `RefSet` primitive to a proper class and use a `Set` internally
2020-06-07 23:15:35 +02:00
Tim van der Meij
4c2e056796
Convert the RefSet primitive to a proper class and use a Set internally
The `RefSet` primitive predates ES6, so that most likely explains why an
object is used internally to track the entries. However, nowadays we can
use built-in JavaScript sets for this purpose. Built-in types are often
more efficient/optimized and using it makes the code a bit more clear
since we don't have to assign `true` to keys anymore just to indicate
their presence.
2020-06-07 19:01:29 +02:00
Jonas Jenwald
466d10f6fc Remove unused methods from NetworkManager, in src/display/network.js
Both of the removed methods were added in PR 2719, however they are no longer used:
 - It appears that `hasPendingRequests` was never used at all, even from the beginning.
 - The only general PDF.js library usage of `abortAllRequests` was removed in PR 6879, which is now four years ago. (Originally the Firefox-specific network implementation, see https://searchfox.org/mozilla-central/source/browser/extensions/pdfjs/content/PdfJsNetwork.jsm, was shared with the `src/display/network.js` file and *there* this method is used. However, since all of the Firefox-specific code now lives directly in mozilla-central, that's not relevant for the removal in this patch.)
2020-06-07 16:03:32 +02:00
Tim van der Meij
c97200ff59
Merge pull request #11974 from Snuffleupagus/sendImgData
A couple of small image caching/sending improvements
2020-06-07 13:53:26 +02:00
Jonas Jenwald
df7d8c74ca Extract the actual sending of image data from the PartialEvaluator.buildPaintImageXObject method
After PRs 10727 and 11912, the code responsible for sending the decoded image data to the main-thread has now become a fair bit more involved the previously.
To reduce the amount of duplication here, the actual code responsible for sending the data is thus extracted into a new helper method instead.
2020-06-07 12:01:51 +02:00
Jonas Jenwald
aff0d56326 Remove an unnecessary RefSetCache.prototype.has() call from GlobalImageCache.getData
We can simply attempt to get the data *directly*, and instead check the result, rather than first checking if it exists.
2020-06-07 11:56:04 +02:00
Takashi Tamura
7acb112ca9 Optimization:
Avoid calling Math.pow if possible when calculating the transfer
function of the CalRGB color space since calling Math.pow is expensive.

If the value of color is larger than the threshold, 0.99554525,
the final result of the transform is larger that 254.5
since ((1 + 0.055) * 0.99554525 ** (1 / 2.4) - 0.055) * 255 === 254.50000003134699
2020-06-07 13:17:18 +09:00
Jonas Jenwald
b7272a34eb Change the loadedChunks property, on ChunkedStream instances, from an Array to a Set
In the old code the use of an Array meant that we had to *manually* track the `numChunksLoaded` property, given that simply using the Array `length` wouldn't have worked since there's no guarantee that the data is loaded in order when e.g. range requests are in use.

Tracking closely related state *separately* in this manner never seem like a good idea, and we can now instead utilize a Set to avoid that.
2020-06-05 15:03:06 +02:00
Carlos Rodríguez
802aa14a99 Jpeg encoded with RGB -instead of YCbCr- write the components index as "RGB" in ASCII to say it so
On ISO/IEC 10918-6:2013 (E), section 6.1: (http://www.itu.int/rec/T-REC-T.872-201206-I/en)

"Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment"

But common jpeg libraries consider RGB too if components index are ASCII R (0x52), G (0x47) and B (0x42): https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048

Issue #11931
2020-06-04 15:08:47 +02:00
Jonas Jenwald
64378fc366 [api-minor] Remove the deprecated PDFDocumentProxy.getOpenActionDestination method (PR 11644 follow-up)
This method has been printing a `deprecated` warning in two releases, hence it should hopefully be safe to remove now.
2020-06-02 12:28:00 +02:00
Jonas Jenwald
af815e417d Ensure that that we don't attempt to cache *inline* images in the GlobalImageCache (PR 11912 follow-up)
Since *inline* images, i.e. those defined inside of `/Contents` streams, are by their very definition page-specific it thus seem like a good idea to actually enforce that they won't accidentally end up in the `GlobalImageCache`.
2020-06-01 01:00:30 +02:00
Tim van der Meij
fe5689705d
Merge pull request #11930 from Snuffleupagus/LocalImageCache
Improve the *local* image caching in `PartialEvaluator.getOperatorList`
2020-05-28 00:12:37 +02:00
Jonas Jenwald
4d60430b1c Add comments to the export list in the src/pdf.js file (PR 11914 follow-up)
When converting this file to use standard `import`/`export` statements, I sorted the exports in the same order as the imports to simplify things.

However, looking at the list of `export`ed properties it probably doesn't hurt to add a couple of comments to clarify from where specifically the `export`s originated.
2020-05-27 13:57:25 +02:00
Jonas Jenwald
4ef547f400 Improve caching of empty /XObjects in the PartialEvaluator.getTextContent method
It turns out that `getTextContent` suffers from *similar* problems with repeated images as `getOperatorList`; please see the previous patch.

While only `/XObject` resources of the `Form`-type will actually be *parsed* in `PartialEvaluator.getTextContent`, since those are the only ones that may contain text, we're still forced to fetch repeated image resources where the name differs (but not the reference).
Obviously it's less bad in this case, since we're not actually parsing `/XObject`s of e.g. the `Image`-type. However, you still want to avoid even fetching the data whenever possible, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.

To address these issues, we can simply replace the exiting name-only caching in `PartialEvaluator.getTextContent` with a new cache backed by `LocalImageCache` instead.
2020-05-26 09:49:01 +02:00
Jonas Jenwald
d62c9181bd Improve the *local* image caching in PartialEvaluator.getOperatorList
Currently the local `imageCache`, as used in `PartialEvaluator.getOperatorList`, will miss certain cases of repeated images because the caching is *only* done by name (usually using a format such as e.g. "Im0", "Im1", ...).
However, in some PDF documents the `/XObject` dictionaries many contain hundreds (or even thousands) of distinctly named images, despite them referring to only a handful of actual image objects (via the XRef table).

With these changes we'll now cache *local* images using both name and (where applicable) reference, thus improving re-usage of images resources even further.

This patch was tested using the PDF file from [bug 857031](https://bugzilla.mozilla.org/show_bug.cgi?id=857031), i.e. https://bug857031.bmoattachments.org/attachment.cgi?id=732270, with the following manifest file:
```
[
    {  "id": "bug857031",
       "file": "../web/pdfs/bug857031.pdf",
       "md5": "",
       "rounds": 250,
       "lastPage": 1,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
firefox | 0    | Overall      |   250 |         2749 |        2656 | -93 | -3.38 |        faster
firefox | 0    | Page Request |   250 |            3 |           4 |   1 | 50.14 |        slower
firefox | 0    | Rendering    |   250 |         2746 |        2652 | -94 | -3.44 |        faster
```

While this is certainly an improvement, since we now avoid re-parsing ~1000 images on the first page, all of the image resources are small enough that the total rendering time doesn't improve that much in this particular case.

In pathological cases, such as e.g. the PDF document in issue 4958, the improvements with this patch can be very significant. Looking for example at page 2, from issue 4958, the rendering time drops from ~60 seconds with `master` to ~30 seconds with this patch (obviously still slow, but it really showcases the potential of this patch nicely).

Finally, note that there's also potential for additional improvements by re-using `LocalImageCache` instances for e.g. /XObject data of the `Form`-type. However, given that recent changes in this area I purposely didn't want to complicate *this* patch more than necessary.
2020-05-25 15:14:14 +02:00
Tim van der Meij
f14215da37
Implement fill opacity for shading patterns in the SVG back-end
In the PDF file from the issue below, the fill alpha (`ca`) is set
before drawing the circles using the `setGState` operator. Doing so
causes the global alpha to be set on the canvas' context for the canvas
back-end, but this was not handled in the SVG back-end. This patch fixes
that by taking the fill opacity into account when drawing shading
patterns in the same way as done elsewhere so it is only included if the
value is non-default.

Fixes #11812.
2020-05-24 14:25:40 +02:00
Tim van der Meij
3b615e4ca3
Merge pull request #11601 from Snuffleupagus/rm-nativeImageDecoderSupport
[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`
2020-05-23 15:33:46 +02:00
Jonas Jenwald
8af70d75aa Allow GlobalImageCache.clear to, optionally, only remove the actual data (PR 11912 follow-up)
When "Cleanup" is triggered, you obviously need to remove all globally cached data on *both* the main- and worker-threads.
However, the current the implementation of the `GlobalImageCache.clear` method also means that we lose *all* information about which images were cached and not just their data. This thus has the somewhat unfortunate side-effect of requiring images, which were previously known to be "global", to *again* having to reach `NUM_PAGES_THRESHOLD` before being cached again.

To avoid doing unnecessary parsing after "Cleanup", we can thus let `GlobalImageCache.clear` keep track of which images were cached while still removing their actual data. This should not have any significant impact on memory usage, since the only extra thing being kept is a `RefSetCache` (essentially an Object) with a couple of `Set`s containing only integers.
2020-05-23 11:30:24 +02:00
Jonas Jenwald
56ebf01ae0 Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922)
This patch was inspired by ad2b64f124/xpdf/CharCodeToUnicode.cc (L480-L484)
2020-05-22 15:23:17 +02:00
Jonas Jenwald
18e0b10d3c [api-minor] Remove the disableCreateObjectURL option from the getDocument parameters, since it's now unused in the API
With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code.

Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option.
Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
cc4cc8b11b Remove the, now unused, releaseImageResources helper function
With the changes in the previous patch, this is now dead code which should thus be removed.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seem unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification support JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser require a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
   In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, result in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-reponsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, lead to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` were unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:

 - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
   This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now looks *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
   Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
   Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
dda6626f40 Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]

Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.

In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.

Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]

*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.

---
[1] There's e.g. PDF documents that use the same image as background on all pages.

[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.

[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-21 18:13:45 +02:00
Jonas Jenwald
8d56a69e74 Reduce usage of SystemJS, in the development viewer, even further
With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687

Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded *natively* by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill.

*Please note:* As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.
2020-05-20 13:36:52 +02:00
Jonas Jenwald
e2c3312416 Convert the src/pdf.js and src/pdf.worker.js files to use standard import/export statements
As part of reducing our reliance on SystemJS in the development viewer, this patch replaces usage of `require` statements with modern standards `import`/`export` statements instead.

If we want to try and move forward with reducing usage of SystemJS, we don't have much choice but to make these kind changes (despite what prior test-results showed, however I'm no longer able to reproduce the issues locally).
2020-05-20 13:18:23 +02:00
Jonas Jenwald
d4d933538b Re-factor setPDFNetworkStreamFactory, in src/display/api.js, to also accept an asynchronous function
As part of trying to reduce the usage of SystemJS in the development viewer, this patch is a necessary step that will allow removal of some `require` statements.

Currently this uses `SystemJS.import` in non-PRODUCTION mode, but it should be possible to replace those with standard *dynamic* `import` calls in the future.
2020-05-20 13:18:18 +02:00
Jonas Jenwald
ec0ab91a2b Reduce the usage of require statements in code-paths not protected by pre-processor and/or run-time checks
This replaces some additional `require`/`exports` usage with standard `import`/`export` statements instead.
Hence another, small, part in the effort to reduce the reliance on SystemJS-specific functionality in the development viewer.
2020-05-14 15:57:49 +02:00
Jonas Jenwald
73636e052a Handle errors individually for each annotation in the _parsedAnnotations getter
While working on PR 11872, it occurred to me that it probably wouldn't be a bad idea to change the `_parsedAnnotations` getter to handle errors individually for each annotation. This way, one broken/corrupt annotation won't prevent the rest of them from being e.g. fetched through the API.
2020-05-09 12:33:39 +02:00
Jonas Jenwald
e1f340a0c2 Use the ESLint no-restricted-syntax rule to ensure that assert is always called with two arguments
Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites.
In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-05-05 13:40:05 +02:00
Tim van der Meij
491904d30a
Merge pull request #11872 from Snuffleupagus/issue-11871
Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871)
2020-05-04 22:19:27 +02:00
Brendan Dahl
b1be33c96f Add more categories of unsupported features.
Fixes #11815
2020-05-04 11:02:16 -07:00
Jonas Jenwald
4aabd063fc Gracefully handle annotation parsing errors in Page.getOperatorList (issue 11871)
This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing.
Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.
2020-05-04 17:09:48 +02:00
roccobeno
371e699905
Include the name for interactive form elements
We already rendered the name for radio buttons, but it was missing for
all other interactive form elements. This commit adds that so that
values entered in form elements can be read based on the element name.
2020-04-27 16:55:35 +02:00
Jonas Jenwald
911c33f025 Move the maybeValidDimensions check, used with JPEG images, to occur earlier (PR 11523 follow-up)
Given that the `NativeImageDecoder.{isSupported, isDecodable}` methods require both dictionary lookups *and* ColorSpace parsing, in hindsight it actually seems more reasonable to the `JpegStream.maybeValidDimensions` checks *first*.
2020-04-26 12:07:46 +02:00
Jonas Jenwald
c355f91d2e [api-minor] Immediately release the font.data property once the font been attached to the DOM (PR 11777 follow-up)
*This patch implements https://github.com/mozilla/pdf.js/pull/11777#issuecomment-609741348*

This extends the work from PR 11773 and 11777 further, by immediately releasing the `font.data` property once the font been attached to the DOM. By not unnecessarily holding onto this data on the main-thread, we'll thus reduce the memory usage of fonts even further (especially beneficial in longer documents with composite fonts).

The new behaviour is controlled by the recently added `fontExtraProperties` API option (adding a new option just for this patch didn't seem necessary), since there's one edge-case in the SVG renderer where the `font.data` property is necessary (see the `pdf2svg` example).

Note that while the default viewer does run clean-up with an idle timeout, that timeout will be reset whenever rendering occurs *or* when scrolling happens in the viewer. In practice this means that unless the user doesn't interact with the viewer in *any* way during an extended period of time, currently set to 30 seconds, the `PDFDocumentProxy.cleanup` method will never be called and font resources will thus not be cleaned-up.
2020-04-23 13:04:57 +02:00
Jonas Jenwald
cdc60402f6 [api-minor] Change PageViewport to throw when the rotation is not a multiple of 90 degrees
As evident from the code, `PageViewport` only supports[1] `rotation` values which are a multiple of 90 degrees. Besides it being somewhat difficult to imagine meaningful use-cases for a non-multiple of 90 degrees `rotation`, the code also becomes both simpler and more efficient by not having to consider arbitrary `rotation` values.

However, any invalid rotation will *silently* fallback to assume zero `rotation` which probably isn't great for e.g. `PDFPageProxy.getViewport` in the API. Hence this patch, which will now enforce that only valid `rotation` values are accepted.

---
[1] As far as I can tell, from looking through the history, nothing else has ever been supported either.
2020-04-22 15:19:13 +02:00
Jonas Jenwald
695140728a [src/core/fonts.js] Improve the validateOS2Table function
Rather than creating a new `Stream` just to validate the OS/2 TrueType table, it's simpler/better to just pass in a reference to the font data and use that instead (similar to other TrueType helper functions).
2020-04-19 11:25:25 +02:00
Jonas Jenwald
033d27fc25 [src/core/fonts.js] Replace some unnecessary Stream.getUint16() calls with Stream.skip(2) instead
There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches/computes a Uint16 value (and without using the result for anything).
2020-04-19 11:18:20 +02:00
Jonas Jenwald
4fae1ac5c4 [src/core/fonts.js] Replace some unnecessary Stream.getBytes(...) calls with Stream.skip(...) instead
There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches the bytes in question (and without using the result for anything).
2020-04-19 11:18:15 +02:00
Tim van der Meij
44da021012
Merge pull request #11814 from tamuratak/svg_text_vertical
Support the vertical writing mode with SVG backend.
2020-04-18 00:34:39 +02:00
Tim van der Meij
7b23476e61
Merge pull request #11818 from Snuffleupagus/eslint-dot-notation
Enable the `dot-notation` ESLint rule
2020-04-18 00:19:47 +02:00