Commit Graph

4206 Commits

Author SHA1 Message Date
Jonas Jenwald
6d192f987e Prevent Uncaught (in promise) AbortException when running the unit-tests
These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file.
While this patch prevents these kind of problems, and thus allows us to update Jasmine again, I cannot help but thinking that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't *think* that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).
2020-07-31 23:29:05 +02:00
Jonathan Grimes
9b16b8ef71
Allow loading pdf fonts into another document. 2020-07-31 11:41:48 -05:00
Jonas Jenwald
346afd1e1c [api-minor] Fix the AnnotationStorage usage properly in the viewer/tests (PR 12107 and 12143 follow-up)
*The [api-minor] label probably ought to have been added to the original PR, given the changes to the `createAnnotationLayerBuilder` signature (if nothing else).*

This patch fixes the following things:
 - Let the `AnnotationLayer.render` method create an `AnnotationStorage`-instance if none was provided, thus making the parameter *properly* optional. This not only fixes the reference tests, it also prevents issues when the viewer components are used.
 - Stop exporting `AnnotationStorage` in the official API, i.e. the `src/pdf.js` file, since it's no longer necessary given the change above. Generally speaking, unless absolutely necessary we probably shouldn't export unused things in the API.
 - Fix a number of JSDocs `typedef`s, in `src/display/` and `web/` code, to actually account for the new `annotationStorage` parameter.
 - Update `web/interfaces.js` to account for the changes in `createAnnotationLayerBuilder`.
 - Initialize the storage, in `AnnotationStorage`, using `Object.create(null)` rather than `{}` (which is the PDF.js default).
2020-07-31 16:32:46 +02:00
Calixte Denizet
538017f7a7 Add support for radios printing 2020-07-31 14:31:49 +02:00
Aki Sasaki
7bb65bab7f fix reftests after #12107
The f1040-annotations reftest started hanging after #12107. We traced
this to `TypeError: can't access property "getOrCreateValue", storage is
undefined`.

We essentially need to add `annotationStorage` to the parameters in
test/driver.js.
2020-07-30 12:25:27 -07:00
Linus Gasser
f1bbfdc16d Add typescript definitions
This PR adds typescript definitions from the JSDoc already present.
It adds a new gulp-target 'types' that calls 'tsc', the typescript
compiler, to create the definitions.

To use the definitions, users can simply do the following:

```
import {getDocument, GlobalWorkerOptions} from "pdfjs-dist";
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
GlobalWorkerOptions.workerSrc = pdfjsWorker;

const pdf = await getDocument("file:///some.pdf").promise;
```

Co-authored-by: @oBusk
Co-authored-by: @tamuratak
2020-07-30 11:10:37 +02:00
Tim van der Meij
eb4d6a0652
Merge pull request #12107 from calixteman/checkbox
Add support for checkboxes printing
2020-07-30 00:11:41 +02:00
Calixte Denizet
cb60523a15 Add support for checkboxes printing 2020-07-29 16:42:57 +02:00
Tim van der Meij
6537e64cb8
Merge pull request #12136 from Snuffleupagus/PDFFetchStreamRangeReader-AbortError
Ignore `fetch()` errors, in `PDFFetchStreamRangeReader`, once the request has been aborted
2020-07-29 00:09:51 +02:00
Jonas Jenwald
1b720a4b23 Ignore fetch() errors, in PDFFetchStreamRangeReader, once the request has been aborted
Besides making general sense, as far as I can tell, this patch should also prevent *one* source of `Uncaught (in promise) ...` exceptions.
Unfortunately `reason instanceof AbortError` doesn't work here, since `AbortError` isn't actually defined in browsers; note how even the DOM specification contains an example using the `name` property: https://dom.spec.whatwg.org/#aborting-ongoing-activities

This patch prevents the following errors from being logged in the console, when the unit-tests are running:
 - Firefox: `Uncaught (in promise) DOMException: The operation was aborted.`
 - Chrome: `Uncaught (in promise) DOMException: The user aborted a request.`
2020-07-28 17:18:49 +02:00
Jonas Jenwald
fbe90b63ec [src/core/worker.js] Remove a useless Promise handler from the pdfManagerReady function
Looking carefully at this code, you'll notice that the `loadDocument` function has no less than *three* Promise handling functions. This obviously makes no sense, since a Promise can only have one resolve and one reject handler.

Hence the final `onFailure`-case is unreachable, which only serves to add confusion when reading the code. Note that this code has been re-factored more than once over the years, but it seems as if this may even have been incorrect already in PR 3310 (and no-one have noticed for seven years :-).
2020-07-28 14:51:50 +02:00
Jonas Jenwald
835b5ffddd Only check isType3Font the first time that TranslatedFont.loadType3Data is called
If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
f3ff526019 Send/receive Type3 images the same way as other globally-cached images
There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
7c9d0d5939 Improve how Type3-fonts with dependencies are handled
While the `CharProcs` streams of Type3-fonts *usually* don't rely on dependencies, such as e.g. images, it does happen in some cases.

Currently any dependencies are simply appended to the parent operatorList, which in practice means *only* the operatorList of the *first* page where the Type3-font is being used.
However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of *all* pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously.

Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added *once* to any operatorList.)

Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.
2020-07-27 13:20:13 +02:00
Calixte Denizet
584902dbf8 Add an annotation storage in order to save annotation data in acroforms 2020-07-24 10:50:11 +02:00
Jonas Jenwald
684a7b89ac Remove unnecessary duplication in the addChildren helper function (used by the ObjectLoader)
Besides being fewer lines of code overall, this also avoids *one* `node instanceof Dict` check for both of the `Dict`/`Stream`-cases.
2020-07-17 16:32:24 +02:00
Jonas Jenwald
ea8e432c45 Add a getRawValues method, to Dict instances, to provide an easier way of getting all *raw* values
When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop).
While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the *raw* `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the *raw* `Dict` values directly.
2020-07-17 16:32:00 +02:00
Jonas Jenwald
6381b5b08f Add a size getter, to Dict instances, to provide an easier way of checking the number of entries
This removes the need to manually call `Dict.getKeys()` and check its length.
2020-07-17 16:06:11 +02:00
Tim van der Meij
e63d1ebff5
Merge pull request #12087 from Snuffleupagus/LocalGStateCache
Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)
2020-07-17 16:02:45 +02:00
Tim van der Meij
b19a1796ac
Convert RefSetCache to a proper class and to use a Map internally
Using a `Map` instead of an `Object` provides some advantages such as
cheaper ways to get the size of the cache, to find out if an entry is
contained in the cache and to iterate over the cache. Moreover, we can
clear and re-use the same `Map` object now instead of creating a new
one.
2020-07-17 13:35:29 +02:00
Tim van der Meij
a604973cc7
Merge pull request #12085 from tamuratak/fix_isnodejs
Make the detection of Node.js environments on Electron strict.
2020-07-17 13:29:59 +02:00
Jonas Jenwald
b3480842b3 Use a RefSet, rather than a plain Object, for tracking already processed nodes in PartialEvaluator.hasBlendModes 2020-07-17 09:52:36 +02:00
Jonas Jenwald
03547b5633 Change PartialEvaluator.setGState to an async method
Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object.
To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.
2020-07-15 14:27:18 +02:00
Jonas Jenwald
f20aeb9343 Slightly simplify the code in PartialEvaluator.hasBlendModes, e.g. by using for...of loops
- Replace the existing loops with `for...of` variants instead.

 - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
15fa3f8518 Remove a redundant /XObject stream dictionary objId check in PartialEvaluator.hasBlendModes (PR 6971 follow-up)
This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971).
Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
84476da26e Handle lookup errors "silently" in PartialEvaluator.hasBlendModes (PR 11680 follow-up)
Given that this method is used during what's essentially a *pre*-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there's lookup errors.

*Please note:* Any any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the *actual* OperatorList parsing.
2020-07-15 12:47:07 +02:00
Jonas Jenwald
981ff41b5f Add local caching of non-font Graphics State (ExtGState) data in PartialEvaluator.getTextContent
It turns out that `getTextContent` suffers from *similar* problems with repeated GStates as `getOperatorList`; please see the previous patch.

While only `/ExtGState` resources containing Fonts will actually be *parsed* by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though *most* of them won't affect the textContent (since they mostly contain purely graphical state).

With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
90eb579713 Add local caching of "simple" Graphics State (ExtGState) data in PartialEvaluator.getOperatorList (issue 2813)
This patch will help pathological cases the most, with issue 2813 being a particularily problematic example. While there's only *four* `/ExtGState` resources, there's a total `29062` of `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resource, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resource, we thus improve performance by:
 - Not having to fetch/validate/parse the same `/ExtGState` data over and over.
 - Handling of repeated `setGState` operators becomes *synchronous* during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

---

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is *a lot* slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a *reduction* of `~10 %` in the rendering time of the PDF document in issue 2813.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
d4d7ac1b88 Stop special-casing the (very unlikely) "no /XObject found"-scenario, when parsing OPS.paintXObject operators, in PartialEvaluator.{getOperatorList, getTextContent}
Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fallback to the existing `ignoreErrors` functionality instead.
Also, please note that the "no `/XObject` found"-scenario should be *extremely* unlikely in practice and would only occur in corrupt/broken documents.

Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data *could* contain anything and e.g. be unclonable (which would cause breaking errors).
Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definately seems like the correct approach.
2020-07-12 21:59:59 +02:00
Takashi Tamura
473ea1f1a4 Make the detection of Node.js environments on Electron strict.
The main process and its child processes should be detected as Node.js environments.
2020-07-12 10:56:17 +09:00
Tim van der Meij
7dabc5ecc8
Merge pull request #12063 from Snuffleupagus/issue-10989
Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)
2020-07-11 00:05:11 +02:00
Jonas Jenwald
d18cf47419 Remove the special handling, used when creating Indexed ColorSpaces, for the case where the lookup-data is a Stream
This special-case was added in PR 1992, however it became unnecessary with the changes in PR 4824 since all of the ColorSpace parsing is now done on the worker-thread (with only RGB-data being sent to the main-thread).
2020-07-10 17:22:55 +02:00
Jonas Jenwald
ea6a0e4435 Remove the IR (internal representation) part of the ColorSpace parsing
Originally ColorSpaces were only *partially* parsed on the worker-thread, to obtain an IR-format which was sent to the main-thread. This had the somewhat unfortunate side-effect of causing the majority of the (potentially heavy) ColorSpace parsing to happen on the main-thread.
Hence PR 4824 which, among other things, changed ColorSpaces to be *fully* parsed on the worker-thread with only RGB-data being sent to the main-thread.

While it thus originally was necessary to have `ColorSpace.{parseToIR, fromIR}` methods, to handle the worker/main-thread split, that's no longer the case and we can thus reduce all of the ColorSpace parsing to one method instead.

Currently, when parsing a ColorSpace, we call `ColorSpace.parseToIR` which parses the ColorSpace-data from the document and then creates the IR-format. We then, immediately, call `ColorSpace.fromIR` which parses the IR-format and then finally creates the actual ColorSpace.[1]
All-in-all, this leads to a fair amount of unnecessary indirection which also (in my opinion) makes the code less clear.

Obviously these changes are not really expected to have a significant effect on performance, especially with the recently added caching of ColorSpaces, however there'll now be strictly fewer function calls, less memory allocated, and overall less parsing required during ColorSpace-handling.

---
[1] For ICCBased ColorSpaces, given the validation necessary, this currently even leads to parsing an /Alternate ColorSpace *twice*.
2020-07-10 17:22:44 +02:00
Jonas Jenwald
4cc6797f17 Re-factor the idFactory functionality, used in the core/-code, and move the fontID generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.

For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.

In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 16:33:31 +02:00
Jonas Jenwald
1d66fce781 Tweak the heuristic, in src/core/jpg.js, that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10989) 2020-07-06 13:06:49 +02:00
Jonas Jenwald
c95fbb6e21 Convert the code in src/core/evaluator.js to use standard classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 16:01:04 +02:00
Jonas Jenwald
32a0b6fa73 Move some constants and helper functions out of the PartialEvaluator closure
This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope.

*Please note:* For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants where re-factored slightly.
2020-07-05 15:56:23 +02:00
Tim van der Meij
c4255fdbfd
Merge pull request #12059 from Snuffleupagus/image-class
Convert the code in `src/core/image.js` to use ES6 classes
2020-07-05 14:08:55 +02:00
Jonas Jenwald
59da1d5829 Convert the code in src/core/image.js to use ES6 classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 09:34:14 +02:00
Jonas Jenwald
85ced3fbfd Allow BaseLocalCache to, optionally, only allocate storage for caching of references (PR 12034 follow-up)
*Yet another instalment in the never-ending series of things that you think of __after__ a patch has landed.*

Since `Function`s are only cached by reference, we thus don't need to allocate storage for names in `LocalFunctionCache` instances. Obviously the effect of these changes are *really tiny*, but it seems reasonable in principle to avoid allocating data structures that are guaranteed to be unused.
2020-07-04 15:01:32 +02:00
Jonas Jenwald
ca719ecaa4 Add local caching of Functions, by reference, in the PDFFunctionFactory (issue 2541)
Note that compared other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an *entire* page.
Furthermore, similar to ColorSpaces, the parsing of individual `Function`s are generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.)

Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.)
After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its *own* `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instances could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site.

Obviously, with this patch, there's now (potentially) more `PDFFunctionFactory` instances than before when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are *not* kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible *lazily initialized*, specifically:

 - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually requires it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces).

 - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing.

To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization.
(If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would be instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.)
All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic.

With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources.

Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer.
However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction ` parsing shows that it takes *approximately* `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is *one order of magnitude faster* which does add up when the same `Function` is invoked close to 2000 times.
2020-07-04 00:55:18 +02:00
Jonas Jenwald
4a7e29865d [api-minor] Use the NodeCanvasFactory/NodeCMapReaderFactory classes as defaults in Node.js environments (issue 11900)
This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code.

As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.
2020-07-02 04:44:23 +02:00
Brendan Dahl
fe3df495cc
Merge pull request #12040 from wojtekmaj/replace-non-inclusive
Replace non-inclusive "whitelist" term with "allowlist"
2020-07-01 15:41:41 -07:00
Jonas Jenwald
fef24658e7 Adjust the heuristics used when dealing with rectangles, i.e. re operators, with zero width/height (issue 12010) 2020-07-02 00:02:49 +02:00
Wojciech Maj
78970bbbe1
Replace non-inclusive "whitelist" term with "allowlist" 2020-06-29 17:15:14 +02:00
Jonas Jenwald
28d2ada59c Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)
This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
2020-06-26 13:15:06 +02:00
Jonas Jenwald
b8e1352934 Stop passing in unnecessary parameters when parsing the Alternate entry of ICCBased ColorSpaces (PR 9659 follow-up)
With the changes made in PR 9659, `ColorSpace.fromIR` no longer takes a second `pdfFunctionFactory` parameter and there's thus one call-site that can be simplified.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
19d7976483 Improve (local) caching of parsed ColorSpaces (PR 12001 follow-up)
This patch contains the following *notable* improvements:
 - Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point).
 - Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths.
 - Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing.
 - Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).
2020-06-24 23:53:10 +02:00
Jonas Jenwald
51e87b9248 Add a proper LocalColorSpaceCache class, rather than piggybacking on the image one (PR 12001 follow-up)
This will allow caching of ColorSpaces by either `Name` *or* `Ref`, which doesn't really make sense for images, thus allowing (better) caching for ColorSpaces used with e.g. Images and Patterns.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
e22bc483a5 Re-factor ColorSpace.parse to take a parameter object, rather than a bunch of (randomly) ordered parameters
Given the number of existing parameters, this will avoid needlessly unwieldy call-sites especially with upcoming changes in later patches.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
f0708717a9 Move the fetchBuiltInCMap method to the PartialEvaluator.prototype
Defining this *inline* in the "constructor" looks slightly weird (I really don't know why I wrote it like that originally), and it can simply be changed to a regular method instead.
2020-06-24 17:29:47 +02:00
Tim van der Meij
c1cb9ee9fc
Merge pull request #12016 from Snuffleupagus/issue-8078
Tweak the `QueueOptimizer` to recognize `OPS.paintImageMaskXObject` operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
2020-06-21 19:38:27 +02:00
Jonas Jenwald
4cb0c032f3 Convert the PDFPageProxy.intentStates property from an Object to a Map
As can be seen in the code there's a handful of places where this structure needs to be iterated, something that becomes cumbersome when dealing with `Object`s. Hence, by changing this to a `Map` instead we can both simplify the code and avoid creating unnecessary closures.

Particularily the `PDFPageProxy._tryCleanup` method becomes a lot more readable, at least in my opinion.

Finally, since this property is intended to be "private" the name is adjusted to reflect that.
2020-06-21 17:02:42 +02:00
Jonas Jenwald
cabc2cc4fc Add a InternalRenderTask.completed getter and use it to simplify PDFPageProxy._destroy
This patch aims to simplify the `PDFPageProxy._destroy` method, by:
 - Replacing the unnecessary `forEach` with a "regular" `for`-loop instead.
 - Use a more appropriate variable name, since `intentState.renderTasks` contain instances of `InternalRenderTask`.
 - Move the "is rendering completed"-handling to a new `InternalRenderTask.completed` getter, to abstract away some (mostly) internal `InternalRenderTask` state.
2020-06-21 15:56:14 +02:00
Jonas Jenwald
a04a5d8325 Tweak the loop in ChunkedStreamManager.abort to clarify what's being iterated (PR 11985 follow-up)
In hindsight, using the `for (let [key, value] of myMap) { ... }`-format when we don't care about the `key` probably wasn't such a great idea. Since `Map`s have explicit support for iterating either `key`s or `value`s, we should probably use that instead here.
2020-06-21 11:29:05 +02:00
Jonas Jenwald
e18fa3fc45 Tweak the QueueOptimizer to recognize OPS.paintImageMaskXObject operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
*First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited.*
Hence I'm not sure if there's actually a very good reason for *only* considering ImageMasks where the "skew" transformation matrix elements are zero as *repeated*, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are *all identical* for the ImageMasks.
Furthermore, looking at the *group* case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements.

While this patch obviously isn't enough to *completely* fix the issue, since there should be a visible Pattern rendered as well[1], it seem (at least to me) like enough of an improvement that submitting this is justified.
With these changes the referenced PDF document will no longer hang the *entire* browser, and rendering also finishes in a *reasonable* time (< 10 seconds for me) which seem fine given the *huge* number of identical inline images present.[2]

---
[1] Temporarily changing the Pattern to a solid color *does* render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality.

[2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.
2020-06-20 12:18:48 +02:00
Tim van der Meij
8cfdfb237a
Merge pull request #12005 from Snuffleupagus/cff-class
Convert the code in `src/core/cff_parser.js` to use ES6 classes
2020-06-17 23:30:28 +02:00
Jonas Jenwald
880a0a0f59 Convert the code in src/core/cff_parser.js to use ES6 classes
This removes multiple instances of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-06-16 12:33:21 +02:00
Jonas Jenwald
fb9b574f3d Convert the code in src/core/worker.js to use ES6 classes
This removes one instance of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-06-16 11:54:59 +02:00
Jonas Jenwald
87b089ba42 Lazily initialize, and cache, the regular expression used in CFFCompiler.encodeFloat
There's no particular reason for re-creating the regular expression over and over for every `encodeFloat` invocation, as far as I can tell.
2020-06-15 13:51:28 +02:00
Jonas Jenwald
517d92a121 Simplify the "is integer" checks in CFFCompiler.encodeNumber
The `isNaN` check is obviously redundant, since `NaN` is the only value that isn't equal to itself; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/NaN#Examples

The `parseFloat`/`parseInt` comparison would make sense if the `value` ever contains a String, which however is never actually the case. Besides looking through the code, I've also run the entire test-suite locally with `assert(typeof value === "number", "encodeNumber");` added at the top of the method and there were no failures.

Hence we can simplify the "is integer" check a bit in the `CFFCompiler.encodeNumber` method.
2020-06-15 13:51:20 +02:00
Jonas Jenwald
5c39de805c Add local caching of ColorSpaces, by name, in PartialEvaluator.getOperatorList (issue 2504)
By caching parsed `ColorSpace`s, we thus don't need to re-parse the same data over and over which saves CPU cycles *and* reduces peak memory usage. (Obviously persistent memory usage *may* increase a tiny bit, but since the caching is done per `PartialEvaluator.getOperatorList` invocation and given that `ColorSpace` instances generally hold very little data this shouldn't be much of an issue.)
Furthermore, by caching `ColorSpace`s we can also lookup the already parsed ones *synchronously* during the `OperatorList` building, instead of having to defer to the event loop/microtask queue since the parsing is done asynchronously (such that error handling is easier).

Possible future improvements:
 - Cache/lookup parsed `ColorSpaces` used in `Pattern`s and `Image`s.
 - Attempt to cache *local* `ColorSpace`s by reference as well, in addition to only by name, assuming that there's documents where that would be beneficial and that it's not too difficult to implement.
 - Assuming there's documents that would benefit from it, also cache repeated `ColorSpace`s *globally* as well.

Given that we've never, until now, been doing *any* caching of parsed `ColorSpace`s and that even using a simple name-only *local* cache helps tremendously in pathological cases, I purposely decided against complicating the implementation too much initially.
Also, compared to parsing of `Image`s, simply creating a `ColorSpace` instance isn't that expensive (hence I'd be somewhat surprised if adding a *global* cache would help much).

---

This patch was tested using:
 - The default `tracemonkey` PDF file, which was included mostly to show that "normal" documents aren't negatively affected by these changes.
 - The PDF file from issue 2504, i.e. https://dl-ctlg.panasonic.com/jp/manual/sd/sd_rbm1000_0.pdf, where most pages will switch *thousands* of times between a handful of `ColorSpace`s.

with the following manifest file:
```
[
    {  "id": "tracemonkey",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 100,
       "type": "eq"
    },
    {  "id": "issue2504",
       "file": "../web/pdfs/issue2504.pdf",
       "md5": "",
       "rounds": 20,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
 - Overall
```
-- Grouped By browser, pdf, stat --
browser | pdf         | stat         | Count | Baseline(ms) | Current(ms) |  +/- |     %  | Result(P<.05)
------- | ----------- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | issue2504   | Overall      |   640 |          977 |         497 | -479 | -49.08 |        faster
firefox | issue2504   | Page Request |   640 |            3 |           4 |    1 |  59.18 |
firefox | issue2504   | Rendering    |   640 |          974 |         493 | -481 | -49.37 |        faster
firefox | tracemonkey | Overall      |  1400 |          116 |         111 |   -5 |  -4.43 |
firefox | tracemonkey | Page Request |  1400 |            2 |           2 |    0 |  -2.86 |
firefox | tracemonkey | Rendering    |  1400 |          114 |         109 |   -5 |  -4.47 |
```

 - Page-specific
```
-- Grouped By browser, pdf, page, stat --
browser | pdf         | page | stat         | Count | Baseline(ms) | Current(ms) |   +/- |      %  | Result(P<.05)
------- | ----------- | ---- | ------------ | ----- | ------------ | ----------- | ----- | ------- | -------------
firefox | issue2504   | 0    | Overall      |    20 |         2295 |        1268 | -1027 |  -44.76 |        faster
firefox | issue2504   | 0    | Page Request |    20 |            6 |           7 |     1 |   15.32 |
firefox | issue2504   | 0    | Rendering    |    20 |         2288 |        1260 | -1028 |  -44.93 |        faster
firefox | issue2504   | 1    | Overall      |    20 |         3059 |        2806 |  -252 |   -8.25 |        faster
firefox | issue2504   | 1    | Page Request |    20 |           11 |          14 |     3 |   23.25 |        slower
firefox | issue2504   | 1    | Rendering    |    20 |         3047 |        2792 |  -255 |   -8.37 |        faster
firefox | issue2504   | 2    | Overall      |    20 |          411 |         295 |  -116 |  -28.20 |        faster
firefox | issue2504   | 2    | Page Request |    20 |            2 |          42 |    40 | 1897.62 |
firefox | issue2504   | 2    | Rendering    |    20 |          409 |         253 |  -156 |  -38.09 |        faster
firefox | issue2504   | 3    | Overall      |    20 |          736 |         299 |  -437 |  -59.34 |        faster
firefox | issue2504   | 3    | Page Request |    20 |            2 |           2 |     0 |    0.00 |
firefox | issue2504   | 3    | Rendering    |    20 |          734 |         297 |  -437 |  -59.49 |        faster
firefox | issue2504   | 4    | Overall      |    20 |          356 |         458 |   102 |   28.63 |
firefox | issue2504   | 4    | Page Request |    20 |            1 |           2 |     1 |   57.14 |        slower
firefox | issue2504   | 4    | Rendering    |    20 |          354 |         455 |   101 |   28.53 |
firefox | issue2504   | 5    | Overall      |    20 |         1381 |         765 |  -616 |  -44.59 |        faster
firefox | issue2504   | 5    | Page Request |    20 |            3 |           5 |     2 |   50.00 |        slower
firefox | issue2504   | 5    | Rendering    |    20 |         1378 |         760 |  -617 |  -44.81 |        faster
firefox | issue2504   | 6    | Overall      |    20 |          757 |         299 |  -459 |  -60.57 |        faster
firefox | issue2504   | 6    | Page Request |    20 |            2 |           5 |     3 |  150.00 |        slower
firefox | issue2504   | 6    | Rendering    |    20 |          755 |         294 |  -462 |  -61.11 |        faster
firefox | issue2504   | 7    | Overall      |    20 |          394 |         302 |   -92 |  -23.39 |        faster
firefox | issue2504   | 7    | Page Request |    20 |            2 |           1 |    -1 |  -34.88 |        faster
firefox | issue2504   | 7    | Rendering    |    20 |          392 |         301 |   -91 |  -23.32 |        faster
firefox | issue2504   | 8    | Overall      |    20 |         2875 |         979 | -1896 |  -65.95 |        faster
firefox | issue2504   | 8    | Page Request |    20 |            1 |           2 |     0 |   11.11 |
firefox | issue2504   | 8    | Rendering    |    20 |         2874 |         978 | -1896 |  -65.99 |        faster
firefox | issue2504   | 9    | Overall      |    20 |          700 |         332 |  -368 |  -52.60 |        faster
firefox | issue2504   | 9    | Page Request |    20 |            3 |           2 |     0 |   -4.00 |
firefox | issue2504   | 9    | Rendering    |    20 |          698 |         329 |  -368 |  -52.78 |        faster
firefox | issue2504   | 10   | Overall      |    20 |         3296 |         926 | -2370 |  -71.91 |        faster
firefox | issue2504   | 10   | Page Request |    20 |            2 |           2 |     0 |  -18.75 |
firefox | issue2504   | 10   | Rendering    |    20 |         3293 |         924 | -2370 |  -71.96 |        faster
firefox | issue2504   | 11   | Overall      |    20 |          524 |         197 |  -327 |  -62.34 |        faster
firefox | issue2504   | 11   | Page Request |    20 |            2 |           3 |     1 |   58.54 |
firefox | issue2504   | 11   | Rendering    |    20 |          522 |         194 |  -328 |  -62.81 |        faster
firefox | issue2504   | 12   | Overall      |    20 |          752 |         369 |  -384 |  -50.98 |        faster
firefox | issue2504   | 12   | Page Request |    20 |            3 |           2 |    -1 |  -36.51 |        faster
firefox | issue2504   | 12   | Rendering    |    20 |          749 |         367 |  -382 |  -51.05 |        faster
firefox | issue2504   | 13   | Overall      |    20 |          679 |         487 |  -193 |  -28.38 |        faster
firefox | issue2504   | 13   | Page Request |    20 |            4 |           2 |    -2 |  -48.68 |        faster
firefox | issue2504   | 13   | Rendering    |    20 |          676 |         485 |  -191 |  -28.28 |        faster
firefox | issue2504   | 14   | Overall      |    20 |          474 |         283 |  -191 |  -40.26 |        faster
firefox | issue2504   | 14   | Page Request |    20 |            2 |           4 |     2 |   78.57 |
firefox | issue2504   | 14   | Rendering    |    20 |          471 |         279 |  -192 |  -40.79 |        faster
firefox | issue2504   | 15   | Overall      |    20 |          860 |         618 |  -241 |  -28.05 |        faster
firefox | issue2504   | 15   | Page Request |    20 |            2 |           3 |     0 |   10.87 |
firefox | issue2504   | 15   | Rendering    |    20 |          857 |         616 |  -241 |  -28.15 |        faster
firefox | issue2504   | 16   | Overall      |    20 |          389 |         243 |  -147 |  -37.71 |        faster
firefox | issue2504   | 16   | Page Request |    20 |            2 |           2 |     0 |    2.33 |
firefox | issue2504   | 16   | Rendering    |    20 |          387 |         240 |  -147 |  -37.94 |        faster
firefox | issue2504   | 17   | Overall      |    20 |         1484 |         672 |  -812 |  -54.70 |        faster
firefox | issue2504   | 17   | Page Request |    20 |            2 |           3 |     1 |   37.21 |
firefox | issue2504   | 17   | Rendering    |    20 |         1482 |         669 |  -812 |  -54.84 |        faster
firefox | issue2504   | 18   | Overall      |    20 |          575 |         252 |  -323 |  -56.12 |        faster
firefox | issue2504   | 18   | Page Request |    20 |            2 |           2 |     0 |  -16.22 |
firefox | issue2504   | 18   | Rendering    |    20 |          573 |         251 |  -322 |  -56.24 |        faster
firefox | issue2504   | 19   | Overall      |    20 |          517 |         227 |  -290 |  -56.08 |        faster
firefox | issue2504   | 19   | Page Request |    20 |            2 |           2 |     0 |   21.62 |
firefox | issue2504   | 19   | Rendering    |    20 |          515 |         225 |  -290 |  -56.37 |        faster
firefox | issue2504   | 20   | Overall      |    20 |          668 |         670 |     2 |    0.31 |
firefox | issue2504   | 20   | Page Request |    20 |            4 |           2 |    -1 |  -34.29 |
firefox | issue2504   | 20   | Rendering    |    20 |          664 |         667 |     3 |    0.49 |
firefox | issue2504   | 21   | Overall      |    20 |          486 |         309 |  -177 |  -36.44 |        faster
firefox | issue2504   | 21   | Page Request |    20 |            2 |           2 |     0 |   16.13 |
firefox | issue2504   | 21   | Rendering    |    20 |          484 |         307 |  -177 |  -36.60 |        faster
firefox | issue2504   | 22   | Overall      |    20 |          543 |         267 |  -276 |  -50.85 |        faster
firefox | issue2504   | 22   | Page Request |    20 |            2 |           2 |     0 |   10.26 |
firefox | issue2504   | 22   | Rendering    |    20 |          541 |         265 |  -276 |  -51.07 |        faster
firefox | issue2504   | 23   | Overall      |    20 |         3246 |         871 | -2375 |  -73.17 |        faster
firefox | issue2504   | 23   | Page Request |    20 |            2 |           3 |     1 |   37.21 |
firefox | issue2504   | 23   | Rendering    |    20 |         3243 |         868 | -2376 |  -73.25 |        faster
firefox | issue2504   | 24   | Overall      |    20 |          379 |         156 |  -223 |  -58.83 |        faster
firefox | issue2504   | 24   | Page Request |    20 |            2 |           2 |     0 |   -2.86 |
firefox | issue2504   | 24   | Rendering    |    20 |          378 |         154 |  -223 |  -59.10 |        faster
firefox | issue2504   | 25   | Overall      |    20 |          176 |         127 |   -50 |  -28.19 |        faster
firefox | issue2504   | 25   | Page Request |    20 |            2 |           1 |     0 |  -15.63 |
firefox | issue2504   | 25   | Rendering    |    20 |          175 |         125 |   -49 |  -28.31 |        faster
firefox | issue2504   | 26   | Overall      |    20 |          181 |         108 |   -74 |  -40.67 |        faster
firefox | issue2504   | 26   | Page Request |    20 |            3 |           2 |    -1 |  -39.13 |        faster
firefox | issue2504   | 26   | Rendering    |    20 |          178 |         105 |   -72 |  -40.69 |        faster
firefox | issue2504   | 27   | Overall      |    20 |          208 |         104 |  -104 |  -49.92 |        faster
firefox | issue2504   | 27   | Page Request |    20 |            2 |           2 |     1 |   48.39 |
firefox | issue2504   | 27   | Rendering    |    20 |          206 |         102 |  -104 |  -50.64 |        faster
firefox | issue2504   | 28   | Overall      |    20 |          241 |         111 |  -131 |  -54.16 |        faster
firefox | issue2504   | 28   | Page Request |    20 |            2 |           2 |    -1 |  -33.33 |
firefox | issue2504   | 28   | Rendering    |    20 |          239 |         109 |  -130 |  -54.39 |        faster
firefox | issue2504   | 29   | Overall      |    20 |          321 |         196 |  -125 |  -39.05 |        faster
firefox | issue2504   | 29   | Page Request |    20 |            1 |           2 |     0 |   17.86 |
firefox | issue2504   | 29   | Rendering    |    20 |          319 |         194 |  -126 |  -39.35 |        faster
firefox | issue2504   | 30   | Overall      |    20 |          651 |         271 |  -380 |  -58.41 |        faster
firefox | issue2504   | 30   | Page Request |    20 |            1 |           2 |     1 |   50.00 |
firefox | issue2504   | 30   | Rendering    |    20 |          649 |         269 |  -381 |  -58.60 |        faster
firefox | issue2504   | 31   | Overall      |    20 |         1635 |         647 |  -988 |  -60.42 |        faster
firefox | issue2504   | 31   | Page Request |    20 |            1 |           2 |     0 |   30.43 |
firefox | issue2504   | 31   | Rendering    |    20 |         1634 |         645 |  -988 |  -60.49 |        faster
firefox | tracemonkey | 0    | Overall      |   100 |           51 |          51 |     0 |    0.02 |
firefox | tracemonkey | 0    | Page Request |   100 |            1 |           1 |     0 |   -4.76 |
firefox | tracemonkey | 0    | Rendering    |   100 |           50 |          50 |     0 |    0.12 |
firefox | tracemonkey | 1    | Overall      |   100 |           97 |          91 |    -5 |   -5.52 |        faster
firefox | tracemonkey | 1    | Page Request |   100 |            3 |           3 |     0 |   -1.32 |
firefox | tracemonkey | 1    | Rendering    |   100 |           94 |          88 |    -5 |   -5.73 |        faster
firefox | tracemonkey | 2    | Overall      |   100 |           40 |          40 |     0 |    0.50 |
firefox | tracemonkey | 2    | Page Request |   100 |            1 |           1 |     0 |    3.16 |
firefox | tracemonkey | 2    | Rendering    |   100 |           39 |          39 |     0 |    0.54 |
firefox | tracemonkey | 3    | Overall      |   100 |           62 |          62 |    -1 |   -0.94 |
firefox | tracemonkey | 3    | Page Request |   100 |            1 |           1 |     0 |   17.05 |
firefox | tracemonkey | 3    | Rendering    |   100 |           61 |          61 |    -1 |   -1.11 |
firefox | tracemonkey | 4    | Overall      |   100 |           56 |          58 |     2 |    3.41 |
firefox | tracemonkey | 4    | Page Request |   100 |            1 |           1 |     0 |   15.31 |
firefox | tracemonkey | 4    | Rendering    |   100 |           55 |          57 |     2 |    3.23 |
firefox | tracemonkey | 5    | Overall      |   100 |           73 |          71 |    -2 |   -2.28 |
firefox | tracemonkey | 5    | Page Request |   100 |            2 |           2 |     0 |   12.20 |
firefox | tracemonkey | 5    | Rendering    |   100 |           71 |          69 |    -2 |   -2.69 |
firefox | tracemonkey | 6    | Overall      |   100 |           85 |          69 |   -16 |  -18.73 |        faster
firefox | tracemonkey | 6    | Page Request |   100 |            2 |           2 |     0 |   -9.90 |
firefox | tracemonkey | 6    | Rendering    |   100 |           83 |          67 |   -16 |  -18.97 |        faster
firefox | tracemonkey | 7    | Overall      |   100 |           65 |          64 |     0 |   -0.37 |
firefox | tracemonkey | 7    | Page Request |   100 |            1 |           1 |     0 |  -11.94 |
firefox | tracemonkey | 7    | Rendering    |   100 |           63 |          63 |     0 |   -0.05 |
firefox | tracemonkey | 8    | Overall      |   100 |           53 |          54 |     1 |    2.04 |
firefox | tracemonkey | 8    | Page Request |   100 |            1 |           1 |     0 |   17.02 |
firefox | tracemonkey | 8    | Rendering    |   100 |           52 |          53 |     1 |    1.82 |
firefox | tracemonkey | 9    | Overall      |   100 |           79 |          73 |    -6 |   -7.86 |        faster
firefox | tracemonkey | 9    | Page Request |   100 |            2 |           2 |     0 |  -15.14 |
firefox | tracemonkey | 9    | Rendering    |   100 |           77 |          71 |    -6 |   -7.86 |        faster
firefox | tracemonkey | 10   | Overall      |   100 |          545 |         519 |   -27 |   -4.86 |        faster
firefox | tracemonkey | 10   | Page Request |   100 |           14 |          13 |     0 |   -3.56 |
firefox | tracemonkey | 10   | Rendering    |   100 |          532 |         506 |   -26 |   -4.90 |        faster
firefox | tracemonkey | 11   | Overall      |   100 |           42 |          41 |    -1 |   -2.50 |
firefox | tracemonkey | 11   | Page Request |   100 |            1 |           1 |     0 |  -27.42 |        faster
firefox | tracemonkey | 11   | Rendering    |   100 |           41 |          40 |    -1 |   -1.75 |
firefox | tracemonkey | 12   | Overall      |   100 |          350 |         332 |   -18 |   -5.16 |        faster
firefox | tracemonkey | 12   | Page Request |   100 |            3 |           3 |     0 |   -5.17 |
firefox | tracemonkey | 12   | Rendering    |   100 |          347 |         329 |   -18 |   -5.15 |        faster
firefox | tracemonkey | 13   | Overall      |   100 |           31 |          31 |     0 |    0.52 |
firefox | tracemonkey | 13   | Page Request |   100 |            1 |           1 |     0 |    4.95 |
firefox | tracemonkey | 13   | Rendering    |   100 |           30 |          30 |     0 |    0.20 |
```
2020-06-14 11:51:45 +02:00
Jonas Jenwald
4b51bcc733 Ensure that PDFImage.buildImage won't accidentally swallow errors, e.g. from ColorSpace parsing (issue 6707, PR 11601 follow-up)
Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occuring during image parsing/decoding as intended.
The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.
2020-06-13 15:02:37 +02:00
Jonas Jenwald
10f31bb46d Change the dependencies property, on OperatorList instances, from an Object to a Set
Since this is completely internal functionality, and furthermore limited to the worker-thread, this change should thus not have any observable effect for e.g. an API-user.
2020-06-11 16:27:13 +02:00
Jonas Jenwald
02a1d0f6c5 Remove the unused intent/pageIndex properties from OperatorList instances (PR 11069 follow-up)
Apparently I completely overlooked the fact that with the changes in PR 11069 these properties became *completely* unused, and consequently they thus ought to be removed.
2020-06-11 16:05:38 +02:00
Jonas Jenwald
00d45fce33 Update SVGGraphics to account for globally cached images (PR 11912 follow-up)
Since there's (essentially) no tests for the SVG-backend, these changes didn't make in into PR 11912 when the code in the `src/display/canvas.js` file was modified.
2020-06-10 15:31:26 +02:00
Jonas Jenwald
88fdb482b0 Move the isEmptyObj helper function from src/shared/util.js to test/unit/test_utils.js
Since this helper function is no longer used anywhere in the main code-base, but only in a couple of unit-tests, it's thus being moved to a more appropriate spot.

Finally, the implementation of `isEmptyObj` is also tweaked slightly by removing the manual loop.
2020-06-09 17:50:16 +02:00
Jonas Jenwald
159e13c4e4 Convert the ChunkedStreamManager.promisesByRequest property to a Map
Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:50:14 +02:00
Jonas Jenwald
dda7a5d1b7 Convert the ChunkedStreamManager.requestsByChunk property to a Map
Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:50:11 +02:00
Jonas Jenwald
17e23ffb33 Convert the ChunkedStreamManager.chunksNeededByRequest property to a Map (containing Sets)
Compared to regular `Object`s, `Map`s (and `Set`s) have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:49:53 +02:00
Tim van der Meij
a4fa4554d6
Merge pull request #11977 from timvandermeij/refset
Convert the `RefSet` primitive to a proper class and use a `Set` internally
2020-06-07 23:15:35 +02:00
Tim van der Meij
4c2e056796
Convert the RefSet primitive to a proper class and use a Set internally
The `RefSet` primitive predates ES6, so that most likely explains why an
object is used internally to track the entries. However, nowadays we can
use built-in JavaScript sets for this purpose. Built-in types are often
more efficient/optimized and using it makes the code a bit more clear
since we don't have to assign `true` to keys anymore just to indicate
their presence.
2020-06-07 19:01:29 +02:00
Jonas Jenwald
466d10f6fc Remove unused methods from NetworkManager, in src/display/network.js
Both of the removed methods were added in PR 2719, however they are no longer used:
 - It appears that `hasPendingRequests` was never used at all, even from the beginning.
 - The only general PDF.js library usage of `abortAllRequests` was removed in PR 6879, which is now four years ago. (Originally the Firefox-specific network implementation, see https://searchfox.org/mozilla-central/source/browser/extensions/pdfjs/content/PdfJsNetwork.jsm, was shared with the `src/display/network.js` file and *there* this method is used. However, since all of the Firefox-specific code now lives directly in mozilla-central, that's not relevant for the removal in this patch.)
2020-06-07 16:03:32 +02:00
Tim van der Meij
c97200ff59
Merge pull request #11974 from Snuffleupagus/sendImgData
A couple of small image caching/sending improvements
2020-06-07 13:53:26 +02:00
Jonas Jenwald
df7d8c74ca Extract the actual sending of image data from the PartialEvaluator.buildPaintImageXObject method
After PRs 10727 and 11912, the code responsible for sending the decoded image data to the main-thread has now become a fair bit more involved the previously.
To reduce the amount of duplication here, the actual code responsible for sending the data is thus extracted into a new helper method instead.
2020-06-07 12:01:51 +02:00
Jonas Jenwald
aff0d56326 Remove an unnecessary RefSetCache.prototype.has() call from GlobalImageCache.getData
We can simply attempt to get the data *directly*, and instead check the result, rather than first checking if it exists.
2020-06-07 11:56:04 +02:00
Takashi Tamura
7acb112ca9 Optimization:
Avoid calling Math.pow if possible when calculating the transfer
function of the CalRGB color space since calling Math.pow is expensive.

If the value of color is larger than the threshold, 0.99554525,
the final result of the transform is larger that 254.5
since ((1 + 0.055) * 0.99554525 ** (1 / 2.4) - 0.055) * 255 === 254.50000003134699
2020-06-07 13:17:18 +09:00
Jonas Jenwald
b7272a34eb Change the loadedChunks property, on ChunkedStream instances, from an Array to a Set
In the old code the use of an Array meant that we had to *manually* track the `numChunksLoaded` property, given that simply using the Array `length` wouldn't have worked since there's no guarantee that the data is loaded in order when e.g. range requests are in use.

Tracking closely related state *separately* in this manner never seem like a good idea, and we can now instead utilize a Set to avoid that.
2020-06-05 15:03:06 +02:00
Carlos Rodríguez
802aa14a99 Jpeg encoded with RGB -instead of YCbCr- write the components index as "RGB" in ASCII to say it so
On ISO/IEC 10918-6:2013 (E), section 6.1: (http://www.itu.int/rec/T-REC-T.872-201206-I/en)

"Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment"

But common jpeg libraries consider RGB too if components index are ASCII R (0x52), G (0x47) and B (0x42): https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048

Issue #11931
2020-06-04 15:08:47 +02:00
Jonas Jenwald
64378fc366 [api-minor] Remove the deprecated PDFDocumentProxy.getOpenActionDestination method (PR 11644 follow-up)
This method has been printing a `deprecated` warning in two releases, hence it should hopefully be safe to remove now.
2020-06-02 12:28:00 +02:00
Jonas Jenwald
af815e417d Ensure that that we don't attempt to cache *inline* images in the GlobalImageCache (PR 11912 follow-up)
Since *inline* images, i.e. those defined inside of `/Contents` streams, are by their very definition page-specific it thus seem like a good idea to actually enforce that they won't accidentally end up in the `GlobalImageCache`.
2020-06-01 01:00:30 +02:00
Tim van der Meij
fe5689705d
Merge pull request #11930 from Snuffleupagus/LocalImageCache
Improve the *local* image caching in `PartialEvaluator.getOperatorList`
2020-05-28 00:12:37 +02:00
Jonas Jenwald
4d60430b1c Add comments to the export list in the src/pdf.js file (PR 11914 follow-up)
When converting this file to use standard `import`/`export` statements, I sorted the exports in the same order as the imports to simplify things.

However, looking at the list of `export`ed properties it probably doesn't hurt to add a couple of comments to clarify from where specifically the `export`s originated.
2020-05-27 13:57:25 +02:00
Jonas Jenwald
4ef547f400 Improve caching of empty /XObjects in the PartialEvaluator.getTextContent method
It turns out that `getTextContent` suffers from *similar* problems with repeated images as `getOperatorList`; please see the previous patch.

While only `/XObject` resources of the `Form`-type will actually be *parsed* in `PartialEvaluator.getTextContent`, since those are the only ones that may contain text, we're still forced to fetch repeated image resources where the name differs (but not the reference).
Obviously it's less bad in this case, since we're not actually parsing `/XObject`s of e.g. the `Image`-type. However, you still want to avoid even fetching the data whenever possible, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.

To address these issues, we can simply replace the exiting name-only caching in `PartialEvaluator.getTextContent` with a new cache backed by `LocalImageCache` instead.
2020-05-26 09:49:01 +02:00
Jonas Jenwald
d62c9181bd Improve the *local* image caching in PartialEvaluator.getOperatorList
Currently the local `imageCache`, as used in `PartialEvaluator.getOperatorList`, will miss certain cases of repeated images because the caching is *only* done by name (usually using a format such as e.g. "Im0", "Im1", ...).
However, in some PDF documents the `/XObject` dictionaries many contain hundreds (or even thousands) of distinctly named images, despite them referring to only a handful of actual image objects (via the XRef table).

With these changes we'll now cache *local* images using both name and (where applicable) reference, thus improving re-usage of images resources even further.

This patch was tested using the PDF file from [bug 857031](https://bugzilla.mozilla.org/show_bug.cgi?id=857031), i.e. https://bug857031.bmoattachments.org/attachment.cgi?id=732270, with the following manifest file:
```
[
    {  "id": "bug857031",
       "file": "../web/pdfs/bug857031.pdf",
       "md5": "",
       "rounds": 250,
       "lastPage": 1,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
firefox | 0    | Overall      |   250 |         2749 |        2656 | -93 | -3.38 |        faster
firefox | 0    | Page Request |   250 |            3 |           4 |   1 | 50.14 |        slower
firefox | 0    | Rendering    |   250 |         2746 |        2652 | -94 | -3.44 |        faster
```

While this is certainly an improvement, since we now avoid re-parsing ~1000 images on the first page, all of the image resources are small enough that the total rendering time doesn't improve that much in this particular case.

In pathological cases, such as e.g. the PDF document in issue 4958, the improvements with this patch can be very significant. Looking for example at page 2, from issue 4958, the rendering time drops from ~60 seconds with `master` to ~30 seconds with this patch (obviously still slow, but it really showcases the potential of this patch nicely).

Finally, note that there's also potential for additional improvements by re-using `LocalImageCache` instances for e.g. /XObject data of the `Form`-type. However, given that recent changes in this area I purposely didn't want to complicate *this* patch more than necessary.
2020-05-25 15:14:14 +02:00
Tim van der Meij
f14215da37
Implement fill opacity for shading patterns in the SVG back-end
In the PDF file from the issue below, the fill alpha (`ca`) is set
before drawing the circles using the `setGState` operator. Doing so
causes the global alpha to be set on the canvas' context for the canvas
back-end, but this was not handled in the SVG back-end. This patch fixes
that by taking the fill opacity into account when drawing shading
patterns in the same way as done elsewhere so it is only included if the
value is non-default.

Fixes #11812.
2020-05-24 14:25:40 +02:00
Tim van der Meij
3b615e4ca3
Merge pull request #11601 from Snuffleupagus/rm-nativeImageDecoderSupport
[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`
2020-05-23 15:33:46 +02:00
Jonas Jenwald
8af70d75aa Allow GlobalImageCache.clear to, optionally, only remove the actual data (PR 11912 follow-up)
When "Cleanup" is triggered, you obviously need to remove all globally cached data on *both* the main- and worker-threads.
However, the current the implementation of the `GlobalImageCache.clear` method also means that we lose *all* information about which images were cached and not just their data. This thus has the somewhat unfortunate side-effect of requiring images, which were previously known to be "global", to *again* having to reach `NUM_PAGES_THRESHOLD` before being cached again.

To avoid doing unnecessary parsing after "Cleanup", we can thus let `GlobalImageCache.clear` keep track of which images were cached while still removing their actual data. This should not have any significant impact on memory usage, since the only extra thing being kept is a `RefSetCache` (essentially an Object) with a couple of `Set`s containing only integers.
2020-05-23 11:30:24 +02:00
Jonas Jenwald
56ebf01ae0 Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922)
This patch was inspired by ad2b64f124/xpdf/CharCodeToUnicode.cc (L480-L484)
2020-05-22 15:23:17 +02:00
Jonas Jenwald
18e0b10d3c [api-minor] Remove the disableCreateObjectURL option from the getDocument parameters, since it's now unused in the API
With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code.

Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option.
Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
cc4cc8b11b Remove the, now unused, releaseImageResources helper function
With the changes in the previous patch, this is now dead code which should thus be removed.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seem unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification support JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser require a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
   In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, result in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-reponsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, lead to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` were unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:

 - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
   This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now looks *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
   Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
   Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
dda6626f40 Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]

Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.

In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.

Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]

*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.

---
[1] There's e.g. PDF documents that use the same image as background on all pages.

[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.

[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-21 18:13:45 +02:00
Jonas Jenwald
8d56a69e74 Reduce usage of SystemJS, in the development viewer, even further
With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687

Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded *natively* by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill.

*Please note:* As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.
2020-05-20 13:36:52 +02:00
Jonas Jenwald
e2c3312416 Convert the src/pdf.js and src/pdf.worker.js files to use standard import/export statements
As part of reducing our reliance on SystemJS in the development viewer, this patch replaces usage of `require` statements with modern standards `import`/`export` statements instead.

If we want to try and move forward with reducing usage of SystemJS, we don't have much choice but to make these kind changes (despite what prior test-results showed, however I'm no longer able to reproduce the issues locally).
2020-05-20 13:18:23 +02:00
Jonas Jenwald
d4d933538b Re-factor setPDFNetworkStreamFactory, in src/display/api.js, to also accept an asynchronous function
As part of trying to reduce the usage of SystemJS in the development viewer, this patch is a necessary step that will allow removal of some `require` statements.

Currently this uses `SystemJS.import` in non-PRODUCTION mode, but it should be possible to replace those with standard *dynamic* `import` calls in the future.
2020-05-20 13:18:18 +02:00
Jonas Jenwald
ec0ab91a2b Reduce the usage of require statements in code-paths not protected by pre-processor and/or run-time checks
This replaces some additional `require`/`exports` usage with standard `import`/`export` statements instead.
Hence another, small, part in the effort to reduce the reliance on SystemJS-specific functionality in the development viewer.
2020-05-14 15:57:49 +02:00
Jonas Jenwald
73636e052a Handle errors individually for each annotation in the _parsedAnnotations getter
While working on PR 11872, it occurred to me that it probably wouldn't be a bad idea to change the `_parsedAnnotations` getter to handle errors individually for each annotation. This way, one broken/corrupt annotation won't prevent the rest of them from being e.g. fetched through the API.
2020-05-09 12:33:39 +02:00
Jonas Jenwald
e1f340a0c2 Use the ESLint no-restricted-syntax rule to ensure that assert is always called with two arguments
Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites.
In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-05-05 13:40:05 +02:00
Tim van der Meij
491904d30a
Merge pull request #11872 from Snuffleupagus/issue-11871
Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871)
2020-05-04 22:19:27 +02:00
Brendan Dahl
b1be33c96f Add more categories of unsupported features.
Fixes #11815
2020-05-04 11:02:16 -07:00
Jonas Jenwald
4aabd063fc Gracefully handle annotation parsing errors in Page.getOperatorList (issue 11871)
This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing.
Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.
2020-05-04 17:09:48 +02:00
roccobeno
371e699905
Include the name for interactive form elements
We already rendered the name for radio buttons, but it was missing for
all other interactive form elements. This commit adds that so that
values entered in form elements can be read based on the element name.
2020-04-27 16:55:35 +02:00
Jonas Jenwald
911c33f025 Move the maybeValidDimensions check, used with JPEG images, to occur earlier (PR 11523 follow-up)
Given that the `NativeImageDecoder.{isSupported, isDecodable}` methods require both dictionary lookups *and* ColorSpace parsing, in hindsight it actually seems more reasonable to the `JpegStream.maybeValidDimensions` checks *first*.
2020-04-26 12:07:46 +02:00
Jonas Jenwald
c355f91d2e [api-minor] Immediately release the font.data property once the font been attached to the DOM (PR 11777 follow-up)
*This patch implements https://github.com/mozilla/pdf.js/pull/11777#issuecomment-609741348*

This extends the work from PR 11773 and 11777 further, by immediately releasing the `font.data` property once the font been attached to the DOM. By not unnecessarily holding onto this data on the main-thread, we'll thus reduce the memory usage of fonts even further (especially beneficial in longer documents with composite fonts).

The new behaviour is controlled by the recently added `fontExtraProperties` API option (adding a new option just for this patch didn't seem necessary), since there's one edge-case in the SVG renderer where the `font.data` property is necessary (see the `pdf2svg` example).

Note that while the default viewer does run clean-up with an idle timeout, that timeout will be reset whenever rendering occurs *or* when scrolling happens in the viewer. In practice this means that unless the user doesn't interact with the viewer in *any* way during an extended period of time, currently set to 30 seconds, the `PDFDocumentProxy.cleanup` method will never be called and font resources will thus not be cleaned-up.
2020-04-23 13:04:57 +02:00
Jonas Jenwald
cdc60402f6 [api-minor] Change PageViewport to throw when the rotation is not a multiple of 90 degrees
As evident from the code, `PageViewport` only supports[1] `rotation` values which are a multiple of 90 degrees. Besides it being somewhat difficult to imagine meaningful use-cases for a non-multiple of 90 degrees `rotation`, the code also becomes both simpler and more efficient by not having to consider arbitrary `rotation` values.

However, any invalid rotation will *silently* fallback to assume zero `rotation` which probably isn't great for e.g. `PDFPageProxy.getViewport` in the API. Hence this patch, which will now enforce that only valid `rotation` values are accepted.

---
[1] As far as I can tell, from looking through the history, nothing else has ever been supported either.
2020-04-22 15:19:13 +02:00
Jonas Jenwald
695140728a [src/core/fonts.js] Improve the validateOS2Table function
Rather than creating a new `Stream` just to validate the OS/2 TrueType table, it's simpler/better to just pass in a reference to the font data and use that instead (similar to other TrueType helper functions).
2020-04-19 11:25:25 +02:00
Jonas Jenwald
033d27fc25 [src/core/fonts.js] Replace some unnecessary Stream.getUint16() calls with Stream.skip(2) instead
There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches/computes a Uint16 value (and without using the result for anything).
2020-04-19 11:18:20 +02:00
Jonas Jenwald
4fae1ac5c4 [src/core/fonts.js] Replace some unnecessary Stream.getBytes(...) calls with Stream.skip(...) instead
There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches the bytes in question (and without using the result for anything).
2020-04-19 11:18:15 +02:00
Tim van der Meij
44da021012
Merge pull request #11814 from tamuratak/svg_text_vertical
Support the vertical writing mode with SVG backend.
2020-04-18 00:34:39 +02:00
Tim van der Meij
7b23476e61
Merge pull request #11818 from Snuffleupagus/eslint-dot-notation
Enable the `dot-notation` ESLint rule
2020-04-18 00:19:47 +02:00
Jonas Jenwald
518d26dfb4 [src/core/jpg.js] Remove redundant marker validation at the end of the decodeScan function (PR 11805 follow-up)
With the MCU parsing changes made in PR 11805, the final marker validation is no longer necessary before the `decodeScan` function returns.
2020-04-17 15:40:02 +02:00
Jonas Jenwald
1cc3dbb694 Enable the dot-notation ESLint rule
*Please note:* These changes were done automatically, using the `gulp lint --fix` command.

This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104

The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the *built* files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a *reduction* of `23 178` bytes (or ~0.7%) for a completely mechanical change.

A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't *think* that the new formatting this patch introduces should undo any of the improvements from PR 6915.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation
2020-04-17 12:24:46 +02:00
Takashi Tamura
32f9cabf82 Support the vertical writing mode with SVG backend. 2020-04-17 09:01:51 +09:00
Tim van der Meij
96923eb2a6
Merge pull request #11805 from Snuffleupagus/issue-11794
Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)
2020-04-16 00:08:58 +02:00
Tim van der Meij
a7def05aa1
Merge pull request #11810 from Snuffleupagus/fromCodePoint-followup
A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up)
2020-04-16 00:08:16 +02:00
Jonas Jenwald
44b4a74f48 A couple of small String.fromCodePoint improvements (PR 11698 and 11769 follow-up)
- Add a reduced test-case for issue 11768, to prevent future regressions.
   (Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.)

 - Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.
2020-04-15 13:45:08 +02:00
Jonas Jenwald
06f6f8719f Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794) 2020-04-14 23:27:08 +02:00
Jonas Jenwald
26cffd03b0 [src/core/jpg.js] Remove some redundant marker validation during the MCU parsing in the decodeScan function
Some of the code in `src/core/jpg.js` is fairly old, and has with time become unnecessary when the surrounding code has been updated to handle various types of JPEG corruption.
In particular the `if (!marker || marker <= 0xff00) { ... }` branch is now dead code, since:

 - The `!marker` case can no longer happen, since we would already have broken out of the loop thanks to the `!fileMarker` branch a handful of lines above.

 - The `marker <= 0xff00` case can also no longer happen, since the `findNextFileMarker` function validate markers much more thoroughly (by checking `marker >= 0xffc0 && marker <= 0xfffe`). Hence we'd again have broken out of the loop via the `!fileMarker` branch above when no valid marker was found.
2020-04-14 23:27:08 +02:00
Jonas Jenwald
746eaf3154 [api-minor] Fix the return value of PDFDocumentProxy.getViewerPreferences when no viewer preferences are present (PR 10738 follow-up)
This patch fixes yet another instalment in the never-ending series of "what the *bleep* was I thinking", by changing the `PDFDocumentProxy.getViewerPreferences` method to return `null` by default.
Not only is this method now consistent with many other API methods, for the data not present case, but it also avoids having to e.g. loop through an object to check if it's actually empty (note the old unit-test).
2020-04-14 23:25:50 +02:00
Jonas Jenwald
426945b480 Update Prettier to version 2.0
Please note that these changes were done automatically, using `gulp lint --fix`.

Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html
In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).
2020-04-14 12:28:14 +02:00
Jonas Jenwald
ecbcde7ff3 Fail early, in modern GENERIC builds, if certain required browser functionality is missing (PR 11771 follow-up)
With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.

In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds additional checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for `Promise.allSettled`.[1] Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.

*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).

---
[1] This was a fairly recent addition to the web platform, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled#Browser_compatibility
2020-04-11 13:42:03 +02:00
Jonas Jenwald
91efde5246 Add a heuristic to scale even single-char text, when the horizontal/vertical scaling differs significantly (issue 11713)
At this point in time, compared to when the "ignore single-char" code was added, we *should* generally be doing a much better job of combining text into as few chunks as possible.
However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm *not* proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise.

Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.
2020-04-07 00:36:23 +02:00
Tim van der Meij
70c54ab9d9
Merge pull request #11746 from Snuffleupagus/issue-11740
Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
2020-04-07 00:10:12 +02:00
Jonas Jenwald
2d46230d23 [api-minor] Change Font.exportData to, by default, stop exporting properties which are completely unused on the main-thread and/or in the API (PR 11773 follow-up)
For years now, the `Font.exportData` method has (because of its previous implementation) been exporting many properties despite them being completely unused on the main-thread and/or in the API.
This is unfortunate, since among those properties there's a number of potentially very large data-structures, containing e.g. Arrays and Objects, which thus have to be first structured cloned and then stored on the main-thread.

With the changes in this patch, we'll thus by default save memory for *every* `Font` instance created (there can be a lot in longer documents). The memory savings obviously depends a lot on the actual font data, but some approximate figures are: For non-embedded fonts it can save a couple of kilobytes, for simple embedded fonts a handful of kilobytes, and for composite fonts the size of this auxiliary can even be larger than the actual font program itself.

All-in-all, there's no good reason to keep exporting these properties by default when they're unused. However, since we cannot be sure that every property is unused in custom implementations of the PDF.js library, this patch adds a new `getDocument` option (named `fontExtraProperties`) that still allows access to the following properties:

 - "cMap": An internal data structure, only used with composite fonts and never really intended to be exposed on the main-thread and/or in the API.
   Note also that the `CMap`/`IdentityCMap` classes are a lot more complex than simple Objects, but only their "internal" properties survive the structured cloning used to send data to the main-thread. Given that CMaps can often be *very* large, not exporting them can also save a fair bit of memory.

 - "defaultEncoding": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.

 - "differences": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.

 - "isSymbolicFont": An internal property, used during font parsing and building of the glyph mapping on the worker-thread.

  - "seacMap": An internal map, only potentially used with *some* Type1/CFF fonts and never intended to be exposed in the API. The existing `Font.{charToGlyph, charToGlyphs}` functionality already takes this data into account when handling text.

 - "toFontChar": The glyph map, necessary for mapping characters to glyphs in the font, which is built upon the various encoding information contained in the font dictionary and/or font program. This is not directly used on the main-thread and/or in the API.

 - "toUnicode": The unicode map, necessary for text-extraction to work correctly, which is built upon the ToUnicode/CMap information contained in the font dictionary, but not directly used on the main-thread and/or in the API.

 - "vmetrics": An array of width data used with fonts which are composite *and* vertical, but not directly used on the main-thread and/or in the API.

 - "widths": An array of width data used with most fonts, but not directly used on the main-thread and/or in the API.
2020-04-06 11:47:09 +02:00
Jonas Jenwald
8770ca3014 Make the decryptAscii helper function, in src/core/type1_parser.js, slightly more efficient
By slicing the Uint8Array directly, rather than using the prototype and a `call` invocation, the runtime of `decryptAscii` is decreased slightly (~30% based on quick logging).
The `decryptAscii` function is still less efficient than `decrypt`, however ASCII encoded Type1 font programs are sufficiently rare that it probably doesn't matter much (we've only seen *two* examples, issue 4630 and 11740).
2020-04-06 11:21:02 +02:00
Jonas Jenwald
938d519192 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.
2020-04-06 11:21:02 +02:00
Jonas Jenwald
6a8c591301 Improve detection of binary/ASCII eexec encrypted Type1 font programs in Type1Parser (issue 11740)
The PDF document, in the referenced issue, actually contains ASCII-encoded Type1 data which we currently *incorrectly* identify as binary.

According to the specification, see https://www-cdf.fnal.gov/offline/PostScript/T1_SPEC.PDF#[{%22num%22%3A203%2C%22gen%22%3A0}%2C{%22name%22%3A%22XYZ%22}%2C87%2C452%2Cnull], the current checks are insufficient to decide between binary/ASCII encoded Type1 font programs.
2020-04-06 11:21:02 +02:00
Jonas Jenwald
2619272d73 Change the signature of TranslatedFont, and convert it to a proper class
In preparation for the next patch, this changes the signature of `TranslatedFont` to take an object rather than individual parameters. This also, in my opinion, makes the call-sites easier to read since it essentially provides a small bit of documentation of the arguments.

Finally, since it was necessary to touch `TranslatedFont` anyway it seemed like a good idea to also convert it to a proper `class`.
2020-04-05 20:53:48 +02:00
Tim van der Meij
0400109b87
Merge pull request #11773 from Snuffleupagus/Font-exportData-1
[api-minor] Change `Font.exportData` to use an explicit white-list of exportable properties, and stop exporting internal/unused properties
2020-04-05 20:50:33 +02:00
Jonas Jenwald
59f54b946d Ensure that all Font instances have the vertical property set to a boolean
Given that the `vertical` property is always accessed on the main-thread, ensuring that the property is explicitly defined seems like the correct thing to do since it also avoids boolean casting elsewhere in the code-base.
2020-04-05 16:27:50 +02:00
Jonas Jenwald
c5e1fd3fde Use "standard" shadowing in the Font.spaceWidth method
With `Font.exportData` now only exporting white-listed properties, there should no longer be any reason to not use standard shadowing in the `Font.spaceWidth` method.
Furthermore, considering the amount of other changes to the code-base over the years it's not even clear to me that the special-case was necessary any more (regardless of the preceding patches).
2020-04-05 16:27:50 +02:00
Jonas Jenwald
a5e4cccf13 [api-minor] Prevent Font.exportData from exporting internal/unused properties
A number of *internal* font properties, which only make sense on the worker-thread, were previously exported. Some of these properties could also contain potentially large Arrays/Objects, which thus unnecessarily increases memory usage since we're forced to copy these to the main-thread and also store them there.

This patch stops exporting the following font properties:

 - "_shadowWidth": An internal property, which was never intended to be exported.

 - "charsCache": An internal cache, which was never intended to be exported and doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually `undefined` or a mostly empty Object as well.

 - "cidEncoding": An internal property used with (some) composite fonts.
   As can be seen in the `PartialEvaluator.translateFont` method, `cidEncoding` will only be assigned a value when the font dictionary has an "Encoding" entry which is a `Name` (and not in the `Stream` case, since those obviously cannot be cloned).
   All-in-all this property doesn't really make sense on the main-thread and/or in the API, and note also that the resulting `cMap` property is (partially) available already.

 - "fallbackToUnicode": An internal map, part of the heuristics used to improve text-selection in (some) badly generated PDF documents with simple fonts. This was never intended to be exposed on the main-thread and/or in the API.

 - "glyphCache": An internal cache, which was never intended to be exported and which doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually a mostly empty Object as well.

 - "isOpenType": An internal property, used only during font parsing on the worker-thread. In the *very* unlikely event that an API consumer actually needs that information, then `fontType` should be a (generally) much better property to use.

Finally, in the (hopefully) unlikely event that any of these properties become necessary on the main-thread, re-adding them to the white-list is easy to do.
2020-04-05 16:27:50 +02:00
Jonas Jenwald
664f7de540 Change Font.exportData to use an explicit white-list of exportable properties
This patch addresses an existing, and very long standing, TODO in the code such that it's no longer possible to send arbitrary/unnecessary font properties to the main-thread.
Furthermore, by having a white-list it's also very easy to see *exactly* which font properties are being exported.

Please note that in its current form, the list of exported properties contains *every* possible enumerable property that may exist in a `Font` instance.
In practice no single font will contain *all* of these properties, and e.g. embedded/non-embedded/Type3 fonts will all differ slightly with respect to what properties are being defined. Hence why only explicitly set properties are included in the exported data, to avoid half of them being `undefined`, which however should not be a problem for any existing consumer (since they'd already need to handle those cases).

Since a fair number of these font properties are completely *internal* functionality, and doesn't make any sense to expose on the main-thread and/or in the API, follow-up patch(es) will be required to trim down the list. (I purposely included all properties here for brevity and future documentation purposes.)
2020-04-05 16:27:48 +02:00
Jonas Jenwald
87142a635e Ensure that Font.charToGlyph won't fail because String.fromCodePoint is given an invalid code point (issue 11768)
*Please note:* This patch on its own is *not* sufficient to address the underlying problem in the referenced issue, hence why no test-case is included since the *actual* bug still needs to be fixed.

As can be seen in the specification, https://tc39.es/ecma262/#sec-string.fromcodepoint, `String.fromCodePoint` will throw a RangeError for invalid code points.

In the event that a CMap, in a composite font, contains invalid data and/or we fail to parse it correctly, it's thus possible that the glyph mapping that we build end up with entires that cause `String.fromCodePoint` to throw and thus `Font.charToGlyph` to break.
If that happens, as is the case in issue 11768, significant portions of a page/document may fail to render which seems very unfortunate.

While this patch doesn't fix the underlying problem, it's hopefully deemed useful not only for the referenced issue but also to prevent similar bugs in the future.
2020-04-03 09:49:50 +02:00
Jonas Jenwald
710704508c Fail early, in modern GENERIC builds, if certain required browser functionality is missing (issue 11762)
With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.

In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.

To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build.
(Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.)
As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code.

*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).
2020-04-01 19:42:48 +02:00
Jonas Jenwald
14c999e3ee Remove the unused sizes and encoding properties on Font instances
The `sizes` property doesn't appear to have been used ever since the code was first split into main/worker-threads, which is so many years ago that I wasn't able to easily find exactly in which PR/commit it became unused.

The `encoding` property is always assigned the `properties.baseEncoding` value, however the `PartialEvaluator` doesn't actually compute/set that value any more. Again it was difficult to determine when it became unused, but it's been that way for years.
2020-03-27 10:12:01 +01:00
Jonas Jenwald
dcb16af968 Whitelist closure related cases to address the remaining no-shadow linting errors
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled.

Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
2020-03-25 11:57:12 +01:00
Jonas Jenwald
1d2f787d6a Enable the ESLint no-shadow rule
This rule is *not* currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239

Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis.
Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow

---
[1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.
2020-03-25 11:56:05 +01:00
Tim van der Meij
475fa1f97f
Merge pull request #11744 from janpe2/cff-glyph-zero
The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
2020-03-24 23:52:21 +01:00
Tim van der Meij
292b77fe7b
Merge pull request #11707 from Snuffleupagus/issue-11694
Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)
2020-03-24 23:51:31 +01:00
Tim van der Meij
f85105379e
Merge pull request #11738 from Snuffleupagus/no-shadow-src-core
Remove variable shadowing from the JavaScript files in the `src/core/` folder
2020-03-24 23:10:37 +01:00
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
2020-03-24 15:56:50 +02:00
Jonas Jenwald
216cbca16c Remove variable shadowing from the JavaScript files in the src/core/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-23 18:28:30 +01:00
Jonas Jenwald
5eabe08c74 Exclude the setPDFNetworkStreamFactory function from the built API docs
Please note that the `setPDFNetworkStreamFactory` functionality isn't exposed in the public API, i.e. not listed among the exports in the `src/pdf.js` file, and that even if it were it wouldn't really be useful considering that none of the `PDFNetworkStream`/`PDFFetchStream`/`PDFNodeStream` classes are exported either.
2020-03-23 16:41:06 +01:00
Jonas Jenwald
c3c197d87a Remove old API methods which were previously converted to throwing (PR 11219 follow-up)
These methods were deprecated already in PDF.js version `2.1.266`, see PRs 10246 and 10369, and were converted to throw `Error`s upon invocation in PDF.js version `2.4.456`, see PR 11219.
Hence it ought to be possible to remove these methods now.
2020-03-23 16:41:03 +01:00
Jonas Jenwald
3539a17d2a Remove variable shadowing from the JavaScript files in the src/display/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-20 23:09:41 +01:00
Tim van der Meij
6ecc9fae1c
Merge pull request #11720 from Snuffleupagus/eslint-no-unsanitized
Update the `eslint-plugin-no-unsanitized` package to the latest version
2020-03-20 21:04:24 +01:00
Jonas Jenwald
62a9c26cda Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)
When JPEG images are decoded by the browser, on the main-thread, there's a handful of short-lived copies of the image data; see c3f4690bde/src/display/api.js (L2364-L2408)
That code thus becomes quite problematic for very big JPEG images, since it increases peak memory usage a lot during decoding. In the referenced issue there's a couple of JPEG images whose dimensions are `10006 x 7088` (i.e. ~68 mega-pixels), which causes the *peak* memory usage to increase by close to `1 GB` (i.e. one giga-byte) in my testing.

By letting the PDF.js JPEG decoder, rather than the browser, handle very large images the *peak* memory usage is considerably reduced and the allocated memory also seem to be reclaimed faster.

*Please note:* This will lead to movement in some existing `eq` tests.
2020-03-20 16:37:19 +01:00
Jonas Jenwald
b02be3b268 Update the eslint-plugin-no-unsanitized package to the latest version 2020-03-20 11:25:39 +01:00
Jonas Jenwald
1cd9d5a8fd Remove the unused wideChars property on Font instances
This property was added in PR 1599 (almost eight years ago), but has been unused ever since PR 3674 (six and a half years ago).
2020-03-20 10:37:32 +01:00
Tim van der Meij
228a591ca3
Merge pull request #11716 from Snuffleupagus/private-PDFPageProxy-pageIndex
[api-minor] Change the pageIndex, on `PDFPageProxy` instances, to a private property
2020-03-19 22:35:29 +01:00
Jonas Jenwald
ae2900e510 [api-minor] Change the pageIndex, on PDFPageProxy instances, to a private property
This property has never been documented and/or *intentionally* exposed through the API, instead the `PDFPageProxy.pageNumber` property is the documented/intended API to use here.
Hence pageIndex is changed to a "private" property on `PDFPageProxy` instances, and internal API functionality is also updated to *consistently* use `this._pageIndex` rather than a mix of formats.
2020-03-19 15:47:11 +01:00
Jonas Jenwald
e011be037e Enable the prefer-exponentiation-operator ESLint rule
Please see https://eslint.org/docs/rules/prefer-exponentiation-operator for additional information.
2020-03-19 12:41:25 +01:00
Tim van der Meij
1bc5cef2b5
Merge pull request #11698 from Snuffleupagus/issue-11697
Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)
2020-03-15 13:36:09 +01:00
Tim van der Meij
4dc1058ceb
Merge pull request #11553 from tamuratak/svg_texthscale
Fix the horizontal scaling of texts with SVG backend. #10988
2020-03-15 13:25:08 +01:00
Tim van der Meij
aa3e5a2b8f
Merge pull request #11644 from Snuffleupagus/openAction
[api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
2020-03-15 13:16:37 +01:00
Jonas Jenwald
15e8692eff Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in PartialEvaluator._buildSimpleFontToUnicode (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.

To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
2020-03-13 23:35:47 +01:00
Jonas Jenwald
c5f67300e9 Rename the isSpace helper function to isWhiteSpace
Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`.
Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.
2020-03-12 11:36:59 +01:00
Jonas Jenwald
e4758beaaa Move IsLittleEndianCached and IsEvalSupportedCached to src/shared/util.js
Rather than duplicating the lookup and caching in multiple files, it seems easier to simply move all of this functionality into `src/shared/util.js` instead.
This will also help avoid a bunch of ESLint errors once the `no-shadow` rule is eventually enabled.
2020-03-12 11:36:26 +01:00
Jonas Jenwald
3adbba55b2 Limit the number of warning messages printed by any one Lexer.getHexString invocation
*This patch fixes something that's annoyed me every now and then over the years, when debugging/fixing corrupt PDF documents.*

For corrupt PDF documents where `Lexer.getHexString` encounters invalid characters, there's very rarely just a handful of them. In practice it's not uncommon for there to be many hundreds, or even many thousands, invalid hex characters found.
Not only is the resulting console warning spam utterly useless in these cases, there's often enough of it that performance may even suffer; hence this patch which limits the amount of messages that any one `Lexer.getHexString` invocation may print.
2020-03-09 13:34:53 +01:00
Jonas Jenwald
65e6ea2cb2 Prevent lookup errors in PartialEvaluator.hasBlendModes from breaking all parsing/rendering of a page (issue 11678)
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
2020-03-09 12:00:12 +01:00
Tim van der Meij
1a97c142b3
Merge pull request #11523 from Snuffleupagus/issue-10880
Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)
2020-03-06 23:03:09 +01:00
Tim van der Meij
1bb25f5cb8
Merge pull request #11673 from Snuffleupagus/FontLoader-bind-more-await
Update the `FontLoader.bind` method to avoid explicitly returning `undefined`
2020-03-06 22:51:39 +01:00
Jonas Jenwald
7d4be08dad Update the FontLoader.bind method to avoid explicitly returning undefined
The only reason for the `return undefined;` lines was to appease the ESLint `consistent-return` rule, but that's not actually necessary if you make use of the fact that the method is `async` and that we can thus await the Promise rather than returning it.
2020-03-06 17:45:24 +01:00
Jonas Jenwald
160cfc4084 Slightly simplify the lookup of data in Dict.{get, getAsync, has}
Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch utilizes an *inline check* and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`).

For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2838 |        2820 | -18 | -0.65 |        faster
Firefox | Page Request |   250 |            1 |           2 |   0 | 11.92 |        slower
Firefox | Rendering    |   250 |         2837 |        2818 | -19 | -0.65 |        faster
```
2020-03-06 14:12:14 +01:00
Jonas Jenwald
01fb309a2a [api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.)

By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.
2020-03-06 13:03:00 +01:00
Tim van der Meij
c95b9b1e17
Merge pull request #11653 from Snuffleupagus/ensureStateFont
Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
2020-03-03 23:33:13 +01:00
Jani Pehkonen
71e7686950 Fix Type1 font parsing when .notdef is not at index zero
Fixes #11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.

Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
2020-03-03 21:55:51 +02:00
Jonas Jenwald
65e514e063 Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
2020-03-03 10:05:18 +01:00
Jonas Jenwald
1ad65cf405 Simplify the BaseFontLoader.isFontLoadingAPISupported getter
It's no longer necessary to special-case this getter in the `GenericFontLoader` case, since the GENERIC build hasn't been using `mozPrintCallback` for years now (furthermore Firefox 63 is really old as well).
2020-03-02 23:14:48 +01:00
Takashi Tamura
d6b67cd28a Fix the horizontal scaling of texts with SVG backend. #10988 2020-03-02 14:54:41 +09:00
Takashi Tamura
d8c9f119b0 Fix the vertical writing mode with horizontal scaling. #11555.
It is not valid to multiply textHScale when the writing mode is vertical.

See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-29 07:48:29 +09:00
Tim van der Meij
e1586016c5
Merge pull request #11577 from Snuffleupagus/Pages-tree-refs
Prevent circular references in the /Pages tree
2020-02-27 23:36:11 +01:00
Tim van der Meij
965ebe63fd
Merge pull request #11540 from tamuratak/charspacing
Fix text spacing with vertical fonts. #7687 and #11526.
2020-02-26 22:26:27 +01:00
Jonas Jenwald
c55d30a715 Use the same non-embedded Wingdings fallback for fonts named "Wingdings-Regular" too (PR 5463 follow-up, issue 11451)
This patch extends the existing heuristics, which are really the best that we can do in general for these kinds of non-embedded *and* non-standard fonts.

Furthermore, this patch also tries to improve the copy-and-paste behaviour for non-embedded Wingdings fonts by also using the `ZapfDingbatsEncoding` in this case.

*Note:* I'm not sure that adding additional tests for Wingdings fonts matters that much, given how limited our "support" for them really is.
2020-02-24 17:40:06 +01:00
Jonas Jenwald
bf09d79eea Use the ESLint no-restricted-syntax rule to prevent direct usage of new Cmd()/new Name()/new Ref()
Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects *a lot* during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base.
Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-02-22 21:15:00 +01:00
Jonas Jenwald
c3c3b8cd81 Add a heuristic, in src/core/jpg.js, to handle JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10880)
*This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else.*

To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data *and* the "actual" `scanLines` value is at least one order of magnitude smaller than expected.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
5494f7d5bc Add basic validation of the scanLines parameter in JPEG images, before delegating decoding to the browser
In some cases PDF documents can contain JPEG images that the native browser decoder cannot handle, e.g. images with DNL (Define Number of Lines) markers or images where the SOF (Start of Frame) marker contains a wildly incorrect `scanLines` parameter.
Currently, for "simple" JPEG images, we're relying on native image decoding to *fail* before falling back to the implementation in `src/core/jpg.js`. In some cases, note e.g. issue 10880, the native image decoder doesn't outright fail and thus some images may not render.

In an attempt to improve the current situation, this patch adds additional validation of the JPEG image SOF data to force the use of `src/core/jpg.js` directly in cases where the native JPEG decoder cannot be trusted to do the right thing.
The only way to implement this is unfortunately to parse the *beginning* of the JPEG image data, looking for a SOF marker. To limit the impact of this extra parsing, the result is cached on the `JpegStream` instance and this code is only run for images which passed all of the pre-existing "can the JPEG image be natively rendered and/or decoded" checks.

---

*Slightly off-topic:* Working on this *really* makes me start questioning if native rendering/decoding of JPEG images is actually a good idea.
There's certain kinds of JPEG images not supported natively, and all of the validation which is now necessary isn't "free". At this point, in the `NativeImageDecoder`, we're having to check for certain properties in the image dictionary, parse the `ColorSpace`, and finally read the actual image data to find the SOF marker.
Furthermore, we cannot just send the image to the main-thread and be done in the "JpegStream" case, but we also need to wait for rendering to complete (or fail) before continuing with other parsing.
In the "JpegDecode" case we're even having to parse part of the image on the main-thread, which seems completely at odds with the principle of doing all heavy parsing in the Worker, and there's also a couple of potentially large (temporary) allocations/copies of TypedArray data involved as well.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
6b44ae2170 Remove the unused thisArg from RefSetCache.forEach
Given that this is completely unused, and that a "normal" function call may be a *tiny* bit more efficient, there's no good reason as far as I can tell to keep it.
2020-02-21 14:23:05 +01:00
Jonas Jenwald
3c7b7be100 Prevent circular references in the /Pages tree 2020-02-19 01:49:39 +01:00
Tim van der Meij
4092aa9fbd
Merge pull request #11604 from Snuffleupagus/eslint-prefer-starts-ends-with
Enable the `unicorn/prefer-starts-ends-with` ESLint plugin rule
2020-02-16 13:17:27 +01:00
Jonas Jenwald
bc31a4be5d Enable the unicorn/prefer-starts-ends-with ESLint plugin rule
This complements the existing `mozilla/use-includes-instead-of-indexOf` plugin rule, by also disallowing unnecessary regular expressions when comparing strings.

Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/master/docs/rules/prefer-starts-ends-with.md for additional information.
2020-02-16 12:41:53 +01:00
Jonas Jenwald
6ebd851d27 Enable the no-buffer-constructor ESLint rule
According to https://nodejs.org/api/buffer.html#buffer_class_buffer: `new Buffer(...)` is deprecated in up-to-date versions of Node.js, hence you want to prevent it from being accidentally used.

Please see https://eslint.org/docs/rules/no-buffer-constructor for additional information.
2020-02-16 12:21:40 +01:00
Tim van der Meij
f6ffc2bf37
Merge pull request #11598 from Snuffleupagus/polyfill-Map-Set-iteration
Add polyfills to support iteration of `Map` and `Set`
2020-02-14 23:24:20 +01:00
Jonas Jenwald
c97c778f8f [api-minor] Produce non-translated/non-polyfilled builds by default 2020-02-14 18:12:07 +01:00
Jonas Jenwald
4a76ab352c Add polyfills to support iteration of Map and Set
Without this, things such as e.g. `Metadata.getAll` is broken in IE11 (see PR 11596).

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map#Browser_compatibility

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set#Browser_compatibility
2020-02-14 15:53:02 +01:00
Tim van der Meij
cd3f2d49e6
Merge pull request #11596 from Snuffleupagus/metadata-map
Re-factor how `Metadata` class instances store its data internally
2020-02-13 23:01:51 +01:00
Jonas Jenwald
5cdfff4a47 Re-factor how Metadata class instances store its data internally
Please note that these changes do *not* affect the *public* interface of the `Metadata` class, but only touches internal structures.[1]

These changes were prompted by looking at the `getAll` method, which simply returns the "private" metadata object to the consumer. This seems wrong conceptually, since it allows way too easy/accidental changes to the internal parsed metadata.
As part of fixing this, the internal metadata was changed to use a `Map` rather than a plain Object.

---
[1] Basically, we shouldn't need to worry about someone depending on internal implementation details.
2020-02-13 18:23:15 +01:00
Jonas Jenwald
3f1568b51a A couple of small improvements of the Metadata._repair method
- Remove the "capturing group" in the regular expression that removes leading "junk" from the raw metadata, since it's not necessary here (it's simply a case of too much copy-pasting in a prior patch).
   According to [MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet#Groups_and_ranges) you want to, for performance reasons, avoid "capturing groups" unless actually needed.

 - Add inline comments to document a bunch of magic values in the code.
2020-02-13 17:20:52 +01:00
Jonas Jenwald
a5db4e985a Remove LoopbackPort.postMessage special-case for polyfilled TypedArrays
Given that all `TypedArray` polyfills were removed in PDF.js version `2.0`, since native support is now required, this branch has been dead code for awhile.
2020-02-13 12:50:41 +01:00
Jonas Jenwald
7b0836ca75 [TextLayer] Immediately set the padding, rather than checking if it's empty, in expandTextDivs
In practice it's extremely rare[1] for the padding to be zero in *all* components, hence it seems better to just set it directly rather than creating a temporary variable and checking for the "no padding"-case.

---
[1] In the `tracemonkey.pdf` file that only happens with `0.08%` of all text elements.
2020-02-11 15:52:36 +01:00
Takashi Tamura
512dbe3060 Fix text spacing with vertical fonts. #7687 and #11526.
When the writing mode is vertical, we have to reverse
the sign of spacing since we are subtracting it from
current.y. We have to add it to current.y.
See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-11 08:49:23 +09:00
Jonas Jenwald
ae5a34c520 [api-minor] Ensure that the Array.prototype doesn't contain any enumerable properties
Over the years there's been a fair number of issues/PRs opened, where people have wanted to add `hasOwnProperty` checks in (hot) loops in the font parsing code. This has always been rejected, since we don't want to risk reducing performance in the Firefox PDF viewer simply because some users of the general PDF.js library are *incorrectly* extending the `Array.prototype` with enumerable properties.

With this patch the general PDF.js library will now fail immediately with a hopefully useful Error message, rather than having (some) fonts fail to render, when the `Array.prototype` is incorrectly extended.

Note that I did consider making this a warning, but ultimately decided against it since it's first of all possible to disable those (with the `verbosity` parameter). Secondly, even when printed, warnings can be easy to overlook and finally a warning may also *seem* OK to ignore (as opposed to an actual Error).
2020-02-10 14:17:27 +01:00
Tim van der Meij
dced0a3821
Merge pull request #11579 from Snuffleupagus/issue-11578
Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)
2020-02-09 17:33:09 +01:00
Tim van der Meij
61056a9238
Merge pull request #11551 from Snuffleupagus/issue-11549
Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
2020-02-09 17:32:35 +01:00
Tim van der Meij
2fb4076e05 Merge pull request #11568 from Snuffleupagus/PDF-header-validation
Ensure that the PDF header contains an actual number (PR 11463 follow-up)
2020-02-09 17:16:25 +01:00
Tim van der Meij
102af0f915 Merge pull request #11547 from Snuffleupagus/convertCmykToRgb-scale
Use fewer multiplications in `JpegImage._convertCmykToRgb`
2020-02-09 17:06:23 +01:00
Tim van der Meij
f178805412 Merge pull request #11557 from Snuffleupagus/_getLinearizedBlockData-xScaleBlockOffset
Avoid re-calculating the `xScaleBlockOffset` when not necessary in `JpegImage._getLinearizedBlockData`
2020-02-09 16:54:28 +01:00
Tim van der Meij
7948faf675 Merge pull request #11573 from Snuffleupagus/api-cleanup-returns
[api-minor] Change `PDFDocumentProxy.cleanup`/`PDFPageProxy.cleanup` to return data
2020-02-08 20:42:28 +01:00
Tim van der Meij
a73a38029c Merge pull request #11569 from Snuffleupagus/rm-most-setAttribute
Replace most remaining `Element.setAttribute("style", ...)` usage with `Element.style = ...` instead
2020-02-08 20:13:56 +01:00
Jonas Jenwald
7937165537 Ignore spaces when normalizing the font name in Font.fallbackToSystemFont (issue 11578) 2020-02-08 19:59:04 +01:00
Jonas Jenwald
7117ee03d6 [api-minor] Change PDFDocumentProxy.cleanup/PDFPageProxy.cleanup to return data
This patch makes the following changes, to improve these API methods:

 - Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
   Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.

 - Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.

 - Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
   Add a note in the JSDoc comment about not calling this method when rendering is ongoing.

 - Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
   Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
   All of the caches that's being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
  On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.

 - Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
2020-02-07 17:00:29 +01:00
Jonas Jenwald
88c35d872f Ensure that the PDF header contains an actual number (PR 11463 follow-up)
While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change.
However, it does seem like a good idea to at least *validate* the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.
2020-02-07 12:25:07 +01:00
Tim van der Meij
e12e83702d
Merge pull request #11559 from bhasto/curveto2-fix
Fix how curveTo2 (v operator) is translated to SVG
2020-02-06 23:10:41 +01:00
Brendan Dahl
09a6e17d22
Merge pull request #11528 from janpe2/type1-nonemb-notdef
Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
2020-02-06 13:30:07 -08:00
Jonas Jenwald
5cbd44b628 Replace most remaining Element.setAttribute("style", ...) usage with Element.style = ... instead
This should hopefully be useful in environments where restrictive CSPs are in effect.
In most cases the replacement is entirely straighforward, and there's only a couple of special cases:
 - For the `src/display/font_loader.js` and `web/pdf_outline_viewer.js `cases, since the elements aren't appended to the document yet, it shouldn't matter if the style properties are set one-by-one rather than all at once.
 - For the `web/debugger.js` case, there's really no need to set the `padding` inline at all and the definition was simply moved to `web/viewer.css` instead.

*Please note:* There's still *a single* case left, in `web/toolbar.js` for setting the width of the zoom dropdown, which is left intact for now.
The reasons are that this particular case shouldn't matter for users of the general PDF.js library, and that it'd make a lot more sense to just try and re-factor that very old code anyway (thus fixing the `setAttribute` usage in the process).
2020-02-05 22:26:47 +01:00
Branislav Hašto
393aed9978 Fix how curveTo2 (v operator) is translated to SVG
Based on the PDF spec, with `v` operator, current point should be used as the first control point of the curve.

Do not overwrite current point before an SVG curve is built, so it can b actually used as first control point.
2020-02-02 17:03:29 +01:00
Jonas Jenwald
a4440a1c6b Avoid re-calculating the xScaleBlockOffset when not necessary in JpegImage._getLinearizedBlockData
As can be seen in the code, the `xScaleBlockOffset` typed array doesn't depend on the actual image data but only on the width and x-scale. The width is obviously consistent for an image, and it turns out that in practice the `componentScaleX` is quite often identical between two (or more) adjacent image components.
All-in-all it's thus not necessary to *unconditionally* re-compute the `xScaleBlockOffset` when getting the JPEG image data.

While avoiding, in many cases, one or more loops can never be a bad thing these changes are unfortunately completely dominated by the rest of the JpegImage code and consequently doesn't really show up in benchmark results. *Hence I'd understand if this patch is ultimately deemed not necessary.*
2020-02-01 11:58:50 +01:00
Jonas Jenwald
4c54395ff6 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.
2020-01-30 13:19:05 +01:00
Jonas Jenwald
ce4f41d06a Use fewer multiplications in JpegImage._convertCmykToRgb
*Note:* This is inspired by PR 5473, which made similar changes for another kind of JPEG data.

Since the implementation in `src/core/jpg.js` only supports 8-bit data, as opposed to similar code in `src/core/colorspace.js`, the computations can be further simplified since the `scale` is always constant.
By updating the coefficients, effectively inlining the `scale`, we'll thus avoid *four* multiplications for each loop iteration.

Unfortunately I wasn't able, based on a quick look through the test-files, to find a sufficiently *large* CMYK JPEG image in order for these changes to really show up in benchmark results. However, when testing the `cmykjpeg.pdf` manually there's a total of `120 000` fewer multiplication with this patch.
2020-01-29 18:34:58 +01:00
Takashi Tamura
0b701e7950 Fix the indices of arguments for RadialAxial. It is related to #10646. 2020-01-29 19:18:50 +09:00
Tim van der Meij
7ae504222f
Merge pull request #11544 from Snuffleupagus/decodeHuffman
Make the `decodeHuffman` function, in `src/core/jpg.js`, slightly more efficient
2020-01-28 22:54:46 +01:00
Tim van der Meij
e9dc179673
Merge pull request #11537 from Snuffleupagus/setupFakeWorker-configure
Send the `verbosity` level when setting up fake workers (issue 11536)
2020-01-28 22:50:30 +01:00
Jonas Jenwald
f5a617a334 Make the decodeHuffman function, in src/core/jpg.js, slightly more efficient
Rather than repeating the `typeof node` check twice, we can use a `switch` statement instead.

This patch was tested using the PDF file from issue 3809, i.e. https://web.archive.org/web/20140801150504/http://vs.twonky.dk/invitation.pdf, with the following manifest file:
```
[
    {  "id": "issue3809",
       "file": "../web/pdfs/issue3809.pdf",
       "md5": "",
       "rounds": 50,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |    50 |        12537 |       12451 | -86 | -0.69 |        faster
Firefox | Page Request |    50 |            5 |           5 |   0 |  0.77 |
Firefox | Rendering    |    50 |        12532 |       12446 | -86 | -0.69 |        faster
```
2020-01-28 14:23:58 +01:00
Tim van der Meij
474fe1757e
Merge pull request #11508 from Snuffleupagus/jpg-default-marker
Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js`
2020-01-26 21:32:13 +01:00
Jonas Jenwald
62b2b984cc Render Popup annotations last, once all other annotations have been rendered (issue 11362)
In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored).
Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.
2020-01-26 15:49:55 +01:00
Jonas Jenwald
427df2dfd7 Send the verbosity level when setting up fake workers (issue 11536)
Interestingly the viewer already seem to work correctly as-is, with workers disabled and a non-standard `verbosity` level.
Hence this is possibly Node.js specific, but given that the issue is lacking *both* the PDF file in question and a runnable test-case, so this patch is essentially a best-effort guess at what the problem could be.
2020-01-26 12:37:45 +01:00
Jonas Jenwald
13930e5202 Simplify the handling of unsupported/incorrect markers in src/core/jpg.js
- Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument).

 - Tweak a condition, to make it easier to see that the end of the data has been reached.

 - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.
2020-01-25 22:52:24 +01:00
Tim van der Meij
3775b711ed
Merge pull request #11482 from Snuffleupagus/more-core-utils
Convert `src/core/jpg.js` to use the `readUint16` helper function in `src/core/core_utils.js`, rather than re-implementing it twice
2020-01-25 21:38:34 +01:00
Tim van der Meij
cbbda9d883
Merge pull request #11515 from Snuffleupagus/cache-fallback-font
Cache the fallback font dictionary on the `PartialEvaluator` (PR 11218 follow-up)
2020-01-25 21:32:28 +01:00
Jonas Jenwald
188b320e18 Convert src/core/jpg.js to use the readUint16 helper function in src/core/core_utils.js, rather than re-implementing it twice
The other image decoders, i.e. the JBIG2 and JPEG 2000 ones, are using the common helper function `readUint16`. Most likely, the only reason that the JPEG decoder is doing it this way is because it originated *outside* of the PDF.js library.
Hence we can simply re-factor `src/core/jpg.js` to use the common `readUint16` helper function, which is especially nice given that the functionality was essentially *duplicated* in the code.
2020-01-25 00:35:10 +01:00
Jonas Jenwald
3f031f69c2 Move additional worker-thread only functions from src/shared/util.js and into a src/core/core_utils.js instead
This moves the `log2`, `readInt8`, `readUint16`, `readUint32`, and `isSpace` functions since they are only used in the worker-thread.
2020-01-25 00:33:52 +01:00
Jonas Jenwald
83bdb525a4 Fix remaining linting errors, from enabling the prefer-const ESLint rule globally
This covers cases that the `--fix` command couldn't deal with, and in a few cases (notably `src/core/jbig2.js`) the code was changed to use block-scoped variables instead.
2020-01-25 00:20:23 +01:00
Jonas Jenwald
9e262ae7fa Enable the ESLint prefer-const rule globally (PR 11450 follow-up)
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const

With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.

Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
2020-01-25 00:20:22 +01:00
Tim van der Meij
d2d9441373
Merge pull request #11489 from Snuffleupagus/rm-FIREFOX-define
Remove the `FIREFOX` build flag, since it's completely unused and simplify a couple of `PDFJSDev` checks
2020-01-24 23:59:13 +01:00
Tim van der Meij
668a29aa45
Merge pull request #11497 from Snuffleupagus/Promise-allSettled
Add support for `Promise.allSettled`
2020-01-22 23:06:54 +01:00
Tim van der Meij
a88dec197f
Merge pull request #11511 from Snuffleupagus/eslint-no-nested-ternary
Enable the `no-nested-ternary` ESLint rule (PR 11488 follow-up)
2020-01-22 22:52:59 +01:00
Jonas Jenwald
3b78f4e8f8 Fix a couple of cases where Prettier broke existing formatting (PR 11446 follow-up)
These two cases should have been whitelisted prior to re-formatting respectively had the comments fixed afterwards, however I unfortunately missed them because of the massive size of the diff.
2020-01-22 09:12:12 +01:00
Jani Pehkonen
809b96b40c Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
Fixes #11403
The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this.

In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`.

The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.nodef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.
2020-01-21 21:35:25 +02:00
Jonas Jenwald
a39943554a Simplify, and tweak, a couple of PDFJSDev checks
This removes a couple of, thanks to preceeding code, unnecessary `typeof PDFJSDev` checks, and also fixes a couple of incorrectly implemented (my fault) checks intended for `TESTING` builds.
2020-01-21 00:06:15 +01:00
Jonas Jenwald
7322a24ce4 Remove the FIREFOX build flag, since it's completely unused
After PR 9566, which removed all of the old Firefox extension code, the `FIREFOX` build flag is no longer used for anything.
It thus seems to me that it should be removed, for a couple of reasons:
 - It's simply dead code now, which only serves to add confusion when looking at the `PDFJSDev` calls.
 - It used to be that `MOZCENTRAL` and `FIREFOX` was *almost* always used together. However, ever since PR 9566 there's obviously been no effort put into keeping the `FIREFOX` build flags up to date.
 - In the event that a new, Webextension based, Firefox addon is created in the future you'd still need to audit all `MOZCENTRAL` (and possibly `CHROME`) build flags to see what'd make sense for the addon.
2020-01-21 00:06:15 +01:00
Tim van der Meij
ccf327538b
Merge pull request #11519 from tamuratak/enable_eslint_import_extensions
Enable import/extensions of ESlint plugin to enforce all `import` have a `.js` file extension.
2020-01-19 17:37:19 +01:00
Jonas Jenwald
ee87e898db Update the GlobalWorkerOptions.workerSrc JSDoc comment
This particular JSDoc comment is fairly old and it also contains some now unrelated/confusing information.
The only way to *guarantee* that the PDF.js library works as expected is to correctly set the global `workerSrc`[1], hence giving the impression that the option isn't strictly necessary is thus incorrect.

---
[1] Since advertising the fallbackWorkerSrc functionality definitely seems like the *wrong* thing to do.
2020-01-19 12:44:42 +01:00
Takashi Tamura
00ce7898a2 Enable import/extensions of ESlint plugin to enforce all import have a .js file extension.
Related to #11465.

- https://github.com/benmosher/eslint-plugin-import/blob/master/docs/rules/extensions.md
2020-01-18 10:53:01 +09:00
Jonas Jenwald
9ab7c280aa Cache the fallback font dictionary on the PartialEvaluator (PR 11218 follow-up)
This way we'll benefit from the existing font caching, and can thus avoid re-creating a fallback font over and over again during parsing.
(Thece changes necessitated the previous patch, since otherwise breakage could occur e.g. with fake workers.)
2020-01-16 15:12:05 +01:00
Jonas Jenwald
090ff116d4 Ensure that full clean-up is always run when handling the "Terminate" message in src/core/worker.js
This is beneficial in situations where the Worker is being re-used, for example with fake workers, since it ensures that things like font resources are actually released.
2020-01-16 15:11:56 +01:00
Jonas Jenwald
c591826f3b Enable the no-nested-ternary ESLint rule (PR 11488 follow-up)
This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210

With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed.
This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories:
 - Cases where a value is only clamped to a certain range (the easiest ones to update).
 - Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the *default* value and only update it when necessary by using `if`/`else if` statements.
 - Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary
2020-01-14 17:49:39 +01:00
Jonas Jenwald
78917bab91 Update src/display/{annotation_layer.js, svg.js} to determine the fontWeight in the same way as with canvas (PR 6091 and 7839 follow-up) 2020-01-14 15:29:59 +01:00
Jonas Jenwald
6590cc32f2 Extract the subroutine bias computation into a helper function in src/core/font_renderer.js 2020-01-14 15:29:53 +01:00
Jonas Jenwald
2942233c9c Add support for Promise.allSettled
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled
2020-01-10 14:35:12 +01:00
Tim van der Meij
93aa613db7
Merge pull request #11465 from Snuffleupagus/import-file-extension
Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension
2020-01-06 23:24:43 +01:00
Jonas Jenwald
94f084958a Update the year in the license_header files 2020-01-05 12:14:03 +01:00
Jonas Jenwald
36881e3770 Ensure that all import and require statements, in the entire code-base, have a .js file extension
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
2020-01-04 13:01:43 +01:00
Jonas Jenwald
f8ab8c4d3a Move the SegoeUISymbol font to the getNonStdFontMap (PR 8698 follow-up)
For reasons that I now cannot even begin to understand, the non-standard SegoeUISymbol font was placed in the `getStdFontMap`. That honestly makes no sense, hence this patch which does what I *should* have done from the start.
2019-12-28 11:02:49 +01:00
Jonas Jenwald
a63f7ad486 Fix the linting errors, from the Prettier auto-formatting, that ESLint --fix couldn't handle
This patch makes the follow changes:
 - Remove no longer necessary inline `// eslint-disable-...` comments.
 - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
 - Concatenate strings which now fit on just one line.
 - Fix comments that are now too long.
 - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.
2019-12-26 12:35:12 +01:00
Jonas Jenwald
de36b2aaba Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).

Prettier is being used for a couple of reasons:

 - To be consistent with `mozilla-central`, where Prettier is already in use across the tree.

 - To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.

Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.

*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.

(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-26 12:34:24 +01:00
Jonas Jenwald
8ec1dfde49 Add // prettier-ignore comments to prevent re-formatting of certain data structures
There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments.

*Please note:* It may be a good idea to look through these cases individually, and possibly re-write some of the them (especially the `String` ones) to reduce the need for all of these ignore commands.
2019-12-26 00:14:03 +01:00
Jonas Jenwald
70e3345cb4 Support OpenAction dictionaries without Type entries when parsing Print actions (issue 11442)
The PDF generator didn't bother including the `Type` entry in the OpenAction dictionary, hence we skipped parsing the `Print` action.
2019-12-24 10:41:33 +01:00
Wojciech Maj
d40d33682b
Extract & use createHeaders helper in src/display/fetch_stream.js 2019-12-23 08:08:17 +01:00