Commit Graph

4114 Commits

Author SHA1 Message Date
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
2020-03-24 15:56:50 +02:00
Jonas Jenwald
216cbca16c Remove variable shadowing from the JavaScript files in the src/core/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-23 18:28:30 +01:00
Jonas Jenwald
5eabe08c74 Exclude the setPDFNetworkStreamFactory function from the built API docs
Please note that the `setPDFNetworkStreamFactory` functionality isn't exposed in the public API, i.e. not listed among the exports in the `src/pdf.js` file, and that even if it were it wouldn't really be useful considering that none of the `PDFNetworkStream`/`PDFFetchStream`/`PDFNodeStream` classes are exported either.
2020-03-23 16:41:06 +01:00
Jonas Jenwald
c3c197d87a Remove old API methods which were previously converted to throwing (PR 11219 follow-up)
These methods were deprecated already in PDF.js version `2.1.266`, see PRs 10246 and 10369, and were converted to throw `Error`s upon invocation in PDF.js version `2.4.456`, see PR 11219.
Hence it ought to be possible to remove these methods now.
2020-03-23 16:41:03 +01:00
Jonas Jenwald
3539a17d2a Remove variable shadowing from the JavaScript files in the src/display/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-20 23:09:41 +01:00
Tim van der Meij
6ecc9fae1c
Merge pull request #11720 from Snuffleupagus/eslint-no-unsanitized
Update the `eslint-plugin-no-unsanitized` package to the latest version
2020-03-20 21:04:24 +01:00
Jonas Jenwald
62a9c26cda Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)
When JPEG images are decoded by the browser, on the main-thread, there's a handful of short-lived copies of the image data; see c3f4690bde/src/display/api.js (L2364-L2408)
That code thus becomes quite problematic for very big JPEG images, since it increases peak memory usage a lot during decoding. In the referenced issue there's a couple of JPEG images whose dimensions are `10006 x 7088` (i.e. ~68 mega-pixels), which causes the *peak* memory usage to increase by close to `1 GB` (i.e. one giga-byte) in my testing.

By letting the PDF.js JPEG decoder, rather than the browser, handle very large images the *peak* memory usage is considerably reduced and the allocated memory also seem to be reclaimed faster.

*Please note:* This will lead to movement in some existing `eq` tests.
2020-03-20 16:37:19 +01:00
Jonas Jenwald
b02be3b268 Update the eslint-plugin-no-unsanitized package to the latest version 2020-03-20 11:25:39 +01:00
Jonas Jenwald
1cd9d5a8fd Remove the unused wideChars property on Font instances
This property was added in PR 1599 (almost eight years ago), but has been unused ever since PR 3674 (six and a half years ago).
2020-03-20 10:37:32 +01:00
Tim van der Meij
228a591ca3
Merge pull request #11716 from Snuffleupagus/private-PDFPageProxy-pageIndex
[api-minor] Change the pageIndex, on `PDFPageProxy` instances, to a private property
2020-03-19 22:35:29 +01:00
Jonas Jenwald
ae2900e510 [api-minor] Change the pageIndex, on PDFPageProxy instances, to a private property
This property has never been documented and/or *intentionally* exposed through the API, instead the `PDFPageProxy.pageNumber` property is the documented/intended API to use here.
Hence pageIndex is changed to a "private" property on `PDFPageProxy` instances, and internal API functionality is also updated to *consistently* use `this._pageIndex` rather than a mix of formats.
2020-03-19 15:47:11 +01:00
Jonas Jenwald
e011be037e Enable the prefer-exponentiation-operator ESLint rule
Please see https://eslint.org/docs/rules/prefer-exponentiation-operator for additional information.
2020-03-19 12:41:25 +01:00
Tim van der Meij
1bc5cef2b5
Merge pull request #11698 from Snuffleupagus/issue-11697
Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)
2020-03-15 13:36:09 +01:00
Tim van der Meij
4dc1058ceb
Merge pull request #11553 from tamuratak/svg_texthscale
Fix the horizontal scaling of texts with SVG backend. #10988
2020-03-15 13:25:08 +01:00
Tim van der Meij
aa3e5a2b8f
Merge pull request #11644 from Snuffleupagus/openAction
[api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
2020-03-15 13:16:37 +01:00
Jonas Jenwald
15e8692eff Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in PartialEvaluator._buildSimpleFontToUnicode (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.

To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
2020-03-13 23:35:47 +01:00
Jonas Jenwald
c5f67300e9 Rename the isSpace helper function to isWhiteSpace
Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`.
Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.
2020-03-12 11:36:59 +01:00
Jonas Jenwald
e4758beaaa Move IsLittleEndianCached and IsEvalSupportedCached to src/shared/util.js
Rather than duplicating the lookup and caching in multiple files, it seems easier to simply move all of this functionality into `src/shared/util.js` instead.
This will also help avoid a bunch of ESLint errors once the `no-shadow` rule is eventually enabled.
2020-03-12 11:36:26 +01:00
Jonas Jenwald
3adbba55b2 Limit the number of warning messages printed by any one Lexer.getHexString invocation
*This patch fixes something that's annoyed me every now and then over the years, when debugging/fixing corrupt PDF documents.*

For corrupt PDF documents where `Lexer.getHexString` encounters invalid characters, there's very rarely just a handful of them. In practice it's not uncommon for there to be many hundreds, or even many thousands, invalid hex characters found.
Not only is the resulting console warning spam utterly useless in these cases, there's often enough of it that performance may even suffer; hence this patch which limits the amount of messages that any one `Lexer.getHexString` invocation may print.
2020-03-09 13:34:53 +01:00
Jonas Jenwald
65e6ea2cb2 Prevent lookup errors in PartialEvaluator.hasBlendModes from breaking all parsing/rendering of a page (issue 11678)
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
2020-03-09 12:00:12 +01:00
Tim van der Meij
1a97c142b3
Merge pull request #11523 from Snuffleupagus/issue-10880
Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)
2020-03-06 23:03:09 +01:00
Tim van der Meij
1bb25f5cb8
Merge pull request #11673 from Snuffleupagus/FontLoader-bind-more-await
Update the `FontLoader.bind` method to avoid explicitly returning `undefined`
2020-03-06 22:51:39 +01:00
Jonas Jenwald
7d4be08dad Update the FontLoader.bind method to avoid explicitly returning undefined
The only reason for the `return undefined;` lines was to appease the ESLint `consistent-return` rule, but that's not actually necessary if you make use of the fact that the method is `async` and that we can thus await the Promise rather than returning it.
2020-03-06 17:45:24 +01:00
Jonas Jenwald
160cfc4084 Slightly simplify the lookup of data in Dict.{get, getAsync, has}
Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch utilizes an *inline check* and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`).

For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2838 |        2820 | -18 | -0.65 |        faster
Firefox | Page Request |   250 |            1 |           2 |   0 | 11.92 |        slower
Firefox | Rendering    |   250 |         2837 |        2818 | -19 | -0.65 |        faster
```
2020-03-06 14:12:14 +01:00
Jonas Jenwald
01fb309a2a [api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.)

By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.
2020-03-06 13:03:00 +01:00
Tim van der Meij
c95b9b1e17
Merge pull request #11653 from Snuffleupagus/ensureStateFont
Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
2020-03-03 23:33:13 +01:00
Jani Pehkonen
71e7686950 Fix Type1 font parsing when .notdef is not at index zero
Fixes #11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.

Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
2020-03-03 21:55:51 +02:00
Jonas Jenwald
65e514e063 Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
2020-03-03 10:05:18 +01:00
Jonas Jenwald
1ad65cf405 Simplify the BaseFontLoader.isFontLoadingAPISupported getter
It's no longer necessary to special-case this getter in the `GenericFontLoader` case, since the GENERIC build hasn't been using `mozPrintCallback` for years now (furthermore Firefox 63 is really old as well).
2020-03-02 23:14:48 +01:00
Takashi Tamura
d6b67cd28a Fix the horizontal scaling of texts with SVG backend. #10988 2020-03-02 14:54:41 +09:00
Takashi Tamura
d8c9f119b0 Fix the vertical writing mode with horizontal scaling. #11555.
It is not valid to multiply textHScale when the writing mode is vertical.

See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-29 07:48:29 +09:00
Tim van der Meij
e1586016c5
Merge pull request #11577 from Snuffleupagus/Pages-tree-refs
Prevent circular references in the /Pages tree
2020-02-27 23:36:11 +01:00
Tim van der Meij
965ebe63fd
Merge pull request #11540 from tamuratak/charspacing
Fix text spacing with vertical fonts. #7687 and #11526.
2020-02-26 22:26:27 +01:00
Jonas Jenwald
c55d30a715 Use the same non-embedded Wingdings fallback for fonts named "Wingdings-Regular" too (PR 5463 follow-up, issue 11451)
This patch extends the existing heuristics, which are really the best that we can do in general for these kinds of non-embedded *and* non-standard fonts.

Furthermore, this patch also tries to improve the copy-and-paste behaviour for non-embedded Wingdings fonts by also using the `ZapfDingbatsEncoding` in this case.

*Note:* I'm not sure that adding additional tests for Wingdings fonts matters that much, given how limited our "support" for them really is.
2020-02-24 17:40:06 +01:00
Jonas Jenwald
bf09d79eea Use the ESLint no-restricted-syntax rule to prevent direct usage of new Cmd()/new Name()/new Ref()
Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects *a lot* during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base.
Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-02-22 21:15:00 +01:00
Jonas Jenwald
c3c3b8cd81 Add a heuristic, in src/core/jpg.js, to handle JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10880)
*This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else.*

To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data *and* the "actual" `scanLines` value is at least one order of magnitude smaller than expected.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
5494f7d5bc Add basic validation of the scanLines parameter in JPEG images, before delegating decoding to the browser
In some cases PDF documents can contain JPEG images that the native browser decoder cannot handle, e.g. images with DNL (Define Number of Lines) markers or images where the SOF (Start of Frame) marker contains a wildly incorrect `scanLines` parameter.
Currently, for "simple" JPEG images, we're relying on native image decoding to *fail* before falling back to the implementation in `src/core/jpg.js`. In some cases, note e.g. issue 10880, the native image decoder doesn't outright fail and thus some images may not render.

In an attempt to improve the current situation, this patch adds additional validation of the JPEG image SOF data to force the use of `src/core/jpg.js` directly in cases where the native JPEG decoder cannot be trusted to do the right thing.
The only way to implement this is unfortunately to parse the *beginning* of the JPEG image data, looking for a SOF marker. To limit the impact of this extra parsing, the result is cached on the `JpegStream` instance and this code is only run for images which passed all of the pre-existing "can the JPEG image be natively rendered and/or decoded" checks.

---

*Slightly off-topic:* Working on this *really* makes me start questioning if native rendering/decoding of JPEG images is actually a good idea.
There's certain kinds of JPEG images not supported natively, and all of the validation which is now necessary isn't "free". At this point, in the `NativeImageDecoder`, we're having to check for certain properties in the image dictionary, parse the `ColorSpace`, and finally read the actual image data to find the SOF marker.
Furthermore, we cannot just send the image to the main-thread and be done in the "JpegStream" case, but we also need to wait for rendering to complete (or fail) before continuing with other parsing.
In the "JpegDecode" case we're even having to parse part of the image on the main-thread, which seems completely at odds with the principle of doing all heavy parsing in the Worker, and there's also a couple of potentially large (temporary) allocations/copies of TypedArray data involved as well.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
6b44ae2170 Remove the unused thisArg from RefSetCache.forEach
Given that this is completely unused, and that a "normal" function call may be a *tiny* bit more efficient, there's no good reason as far as I can tell to keep it.
2020-02-21 14:23:05 +01:00
Jonas Jenwald
3c7b7be100 Prevent circular references in the /Pages tree 2020-02-19 01:49:39 +01:00
Tim van der Meij
4092aa9fbd
Merge pull request #11604 from Snuffleupagus/eslint-prefer-starts-ends-with
Enable the `unicorn/prefer-starts-ends-with` ESLint plugin rule
2020-02-16 13:17:27 +01:00
Jonas Jenwald
bc31a4be5d Enable the unicorn/prefer-starts-ends-with ESLint plugin rule
This complements the existing `mozilla/use-includes-instead-of-indexOf` plugin rule, by also disallowing unnecessary regular expressions when comparing strings.

Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/master/docs/rules/prefer-starts-ends-with.md for additional information.
2020-02-16 12:41:53 +01:00
Jonas Jenwald
6ebd851d27 Enable the no-buffer-constructor ESLint rule
According to https://nodejs.org/api/buffer.html#buffer_class_buffer: `new Buffer(...)` is deprecated in up-to-date versions of Node.js, hence you want to prevent it from being accidentally used.

Please see https://eslint.org/docs/rules/no-buffer-constructor for additional information.
2020-02-16 12:21:40 +01:00
Tim van der Meij
f6ffc2bf37
Merge pull request #11598 from Snuffleupagus/polyfill-Map-Set-iteration
Add polyfills to support iteration of `Map` and `Set`
2020-02-14 23:24:20 +01:00
Jonas Jenwald
c97c778f8f [api-minor] Produce non-translated/non-polyfilled builds by default 2020-02-14 18:12:07 +01:00
Jonas Jenwald
4a76ab352c Add polyfills to support iteration of Map and Set
Without this, things such as e.g. `Metadata.getAll` is broken in IE11 (see PR 11596).

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map#Browser_compatibility

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set#Browser_compatibility
2020-02-14 15:53:02 +01:00
Tim van der Meij
cd3f2d49e6
Merge pull request #11596 from Snuffleupagus/metadata-map
Re-factor how `Metadata` class instances store its data internally
2020-02-13 23:01:51 +01:00
Jonas Jenwald
5cdfff4a47 Re-factor how Metadata class instances store its data internally
Please note that these changes do *not* affect the *public* interface of the `Metadata` class, but only touches internal structures.[1]

These changes were prompted by looking at the `getAll` method, which simply returns the "private" metadata object to the consumer. This seems wrong conceptually, since it allows way too easy/accidental changes to the internal parsed metadata.
As part of fixing this, the internal metadata was changed to use a `Map` rather than a plain Object.

---
[1] Basically, we shouldn't need to worry about someone depending on internal implementation details.
2020-02-13 18:23:15 +01:00
Jonas Jenwald
3f1568b51a A couple of small improvements of the Metadata._repair method
- Remove the "capturing group" in the regular expression that removes leading "junk" from the raw metadata, since it's not necessary here (it's simply a case of too much copy-pasting in a prior patch).
   According to [MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet#Groups_and_ranges) you want to, for performance reasons, avoid "capturing groups" unless actually needed.

 - Add inline comments to document a bunch of magic values in the code.
2020-02-13 17:20:52 +01:00
Jonas Jenwald
a5db4e985a Remove LoopbackPort.postMessage special-case for polyfilled TypedArrays
Given that all `TypedArray` polyfills were removed in PDF.js version `2.0`, since native support is now required, this branch has been dead code for awhile.
2020-02-13 12:50:41 +01:00
Jonas Jenwald
7b0836ca75 [TextLayer] Immediately set the padding, rather than checking if it's empty, in expandTextDivs
In practice it's extremely rare[1] for the padding to be zero in *all* components, hence it seems better to just set it directly rather than creating a temporary variable and checking for the "no padding"-case.

---
[1] In the `tracemonkey.pdf` file that only happens with `0.08%` of all text elements.
2020-02-11 15:52:36 +01:00
Takashi Tamura
512dbe3060 Fix text spacing with vertical fonts. #7687 and #11526.
When the writing mode is vertical, we have to reverse
the sign of spacing since we are subtracting it from
current.y. We have to add it to current.y.
See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-11 08:49:23 +09:00
Jonas Jenwald
ae5a34c520 [api-minor] Ensure that the Array.prototype doesn't contain any enumerable properties
Over the years there's been a fair number of issues/PRs opened, where people have wanted to add `hasOwnProperty` checks in (hot) loops in the font parsing code. This has always been rejected, since we don't want to risk reducing performance in the Firefox PDF viewer simply because some users of the general PDF.js library are *incorrectly* extending the `Array.prototype` with enumerable properties.

With this patch the general PDF.js library will now fail immediately with a hopefully useful Error message, rather than having (some) fonts fail to render, when the `Array.prototype` is incorrectly extended.

Note that I did consider making this a warning, but ultimately decided against it since it's first of all possible to disable those (with the `verbosity` parameter). Secondly, even when printed, warnings can be easy to overlook and finally a warning may also *seem* OK to ignore (as opposed to an actual Error).
2020-02-10 14:17:27 +01:00
Tim van der Meij
dced0a3821
Merge pull request #11579 from Snuffleupagus/issue-11578
Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)
2020-02-09 17:33:09 +01:00
Tim van der Meij
61056a9238
Merge pull request #11551 from Snuffleupagus/issue-11549
Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
2020-02-09 17:32:35 +01:00
Tim van der Meij
2fb4076e05 Merge pull request #11568 from Snuffleupagus/PDF-header-validation
Ensure that the PDF header contains an actual number (PR 11463 follow-up)
2020-02-09 17:16:25 +01:00
Tim van der Meij
102af0f915 Merge pull request #11547 from Snuffleupagus/convertCmykToRgb-scale
Use fewer multiplications in `JpegImage._convertCmykToRgb`
2020-02-09 17:06:23 +01:00
Tim van der Meij
f178805412 Merge pull request #11557 from Snuffleupagus/_getLinearizedBlockData-xScaleBlockOffset
Avoid re-calculating the `xScaleBlockOffset` when not necessary in `JpegImage._getLinearizedBlockData`
2020-02-09 16:54:28 +01:00
Tim van der Meij
7948faf675 Merge pull request #11573 from Snuffleupagus/api-cleanup-returns
[api-minor] Change `PDFDocumentProxy.cleanup`/`PDFPageProxy.cleanup` to return data
2020-02-08 20:42:28 +01:00
Tim van der Meij
a73a38029c Merge pull request #11569 from Snuffleupagus/rm-most-setAttribute
Replace most remaining `Element.setAttribute("style", ...)` usage with `Element.style = ...` instead
2020-02-08 20:13:56 +01:00
Jonas Jenwald
7937165537 Ignore spaces when normalizing the font name in Font.fallbackToSystemFont (issue 11578) 2020-02-08 19:59:04 +01:00
Jonas Jenwald
7117ee03d6 [api-minor] Change PDFDocumentProxy.cleanup/PDFPageProxy.cleanup to return data
This patch makes the following changes, to improve these API methods:

 - Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
   Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.

 - Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.

 - Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
   Add a note in the JSDoc comment about not calling this method when rendering is ongoing.

 - Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
   Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
   All of the caches that's being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
  On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.

 - Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
2020-02-07 17:00:29 +01:00
Jonas Jenwald
88c35d872f Ensure that the PDF header contains an actual number (PR 11463 follow-up)
While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change.
However, it does seem like a good idea to at least *validate* the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.
2020-02-07 12:25:07 +01:00
Tim van der Meij
e12e83702d
Merge pull request #11559 from bhasto/curveto2-fix
Fix how curveTo2 (v operator) is translated to SVG
2020-02-06 23:10:41 +01:00
Brendan Dahl
09a6e17d22
Merge pull request #11528 from janpe2/type1-nonemb-notdef
Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
2020-02-06 13:30:07 -08:00
Jonas Jenwald
5cbd44b628 Replace most remaining Element.setAttribute("style", ...) usage with Element.style = ... instead
This should hopefully be useful in environments where restrictive CSPs are in effect.
In most cases the replacement is entirely straighforward, and there's only a couple of special cases:
 - For the `src/display/font_loader.js` and `web/pdf_outline_viewer.js `cases, since the elements aren't appended to the document yet, it shouldn't matter if the style properties are set one-by-one rather than all at once.
 - For the `web/debugger.js` case, there's really no need to set the `padding` inline at all and the definition was simply moved to `web/viewer.css` instead.

*Please note:* There's still *a single* case left, in `web/toolbar.js` for setting the width of the zoom dropdown, which is left intact for now.
The reasons are that this particular case shouldn't matter for users of the general PDF.js library, and that it'd make a lot more sense to just try and re-factor that very old code anyway (thus fixing the `setAttribute` usage in the process).
2020-02-05 22:26:47 +01:00
Branislav Hašto
393aed9978 Fix how curveTo2 (v operator) is translated to SVG
Based on the PDF spec, with `v` operator, current point should be used as the first control point of the curve.

Do not overwrite current point before an SVG curve is built, so it can b actually used as first control point.
2020-02-02 17:03:29 +01:00
Jonas Jenwald
a4440a1c6b Avoid re-calculating the xScaleBlockOffset when not necessary in JpegImage._getLinearizedBlockData
As can be seen in the code, the `xScaleBlockOffset` typed array doesn't depend on the actual image data but only on the width and x-scale. The width is obviously consistent for an image, and it turns out that in practice the `componentScaleX` is quite often identical between two (or more) adjacent image components.
All-in-all it's thus not necessary to *unconditionally* re-compute the `xScaleBlockOffset` when getting the JPEG image data.

While avoiding, in many cases, one or more loops can never be a bad thing these changes are unfortunately completely dominated by the rest of the JpegImage code and consequently doesn't really show up in benchmark results. *Hence I'd understand if this patch is ultimately deemed not necessary.*
2020-02-01 11:58:50 +01:00
Jonas Jenwald
4c54395ff6 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.
2020-01-30 13:19:05 +01:00
Jonas Jenwald
ce4f41d06a Use fewer multiplications in JpegImage._convertCmykToRgb
*Note:* This is inspired by PR 5473, which made similar changes for another kind of JPEG data.

Since the implementation in `src/core/jpg.js` only supports 8-bit data, as opposed to similar code in `src/core/colorspace.js`, the computations can be further simplified since the `scale` is always constant.
By updating the coefficients, effectively inlining the `scale`, we'll thus avoid *four* multiplications for each loop iteration.

Unfortunately I wasn't able, based on a quick look through the test-files, to find a sufficiently *large* CMYK JPEG image in order for these changes to really show up in benchmark results. However, when testing the `cmykjpeg.pdf` manually there's a total of `120 000` fewer multiplication with this patch.
2020-01-29 18:34:58 +01:00
Takashi Tamura
0b701e7950 Fix the indices of arguments for RadialAxial. It is related to #10646. 2020-01-29 19:18:50 +09:00
Tim van der Meij
7ae504222f
Merge pull request #11544 from Snuffleupagus/decodeHuffman
Make the `decodeHuffman` function, in `src/core/jpg.js`, slightly more efficient
2020-01-28 22:54:46 +01:00
Tim van der Meij
e9dc179673
Merge pull request #11537 from Snuffleupagus/setupFakeWorker-configure
Send the `verbosity` level when setting up fake workers (issue 11536)
2020-01-28 22:50:30 +01:00
Jonas Jenwald
f5a617a334 Make the decodeHuffman function, in src/core/jpg.js, slightly more efficient
Rather than repeating the `typeof node` check twice, we can use a `switch` statement instead.

This patch was tested using the PDF file from issue 3809, i.e. https://web.archive.org/web/20140801150504/http://vs.twonky.dk/invitation.pdf, with the following manifest file:
```
[
    {  "id": "issue3809",
       "file": "../web/pdfs/issue3809.pdf",
       "md5": "",
       "rounds": 50,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |    50 |        12537 |       12451 | -86 | -0.69 |        faster
Firefox | Page Request |    50 |            5 |           5 |   0 |  0.77 |
Firefox | Rendering    |    50 |        12532 |       12446 | -86 | -0.69 |        faster
```
2020-01-28 14:23:58 +01:00
Tim van der Meij
474fe1757e
Merge pull request #11508 from Snuffleupagus/jpg-default-marker
Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js`
2020-01-26 21:32:13 +01:00
Jonas Jenwald
62b2b984cc Render Popup annotations last, once all other annotations have been rendered (issue 11362)
In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored).
Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.
2020-01-26 15:49:55 +01:00
Jonas Jenwald
427df2dfd7 Send the verbosity level when setting up fake workers (issue 11536)
Interestingly the viewer already seem to work correctly as-is, with workers disabled and a non-standard `verbosity` level.
Hence this is possibly Node.js specific, but given that the issue is lacking *both* the PDF file in question and a runnable test-case, so this patch is essentially a best-effort guess at what the problem could be.
2020-01-26 12:37:45 +01:00
Jonas Jenwald
13930e5202 Simplify the handling of unsupported/incorrect markers in src/core/jpg.js
- Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument).

 - Tweak a condition, to make it easier to see that the end of the data has been reached.

 - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.
2020-01-25 22:52:24 +01:00
Tim van der Meij
3775b711ed
Merge pull request #11482 from Snuffleupagus/more-core-utils
Convert `src/core/jpg.js` to use the `readUint16` helper function in `src/core/core_utils.js`, rather than re-implementing it twice
2020-01-25 21:38:34 +01:00
Tim van der Meij
cbbda9d883
Merge pull request #11515 from Snuffleupagus/cache-fallback-font
Cache the fallback font dictionary on the `PartialEvaluator` (PR 11218 follow-up)
2020-01-25 21:32:28 +01:00
Jonas Jenwald
188b320e18 Convert src/core/jpg.js to use the readUint16 helper function in src/core/core_utils.js, rather than re-implementing it twice
The other image decoders, i.e. the JBIG2 and JPEG 2000 ones, are using the common helper function `readUint16`. Most likely, the only reason that the JPEG decoder is doing it this way is because it originated *outside* of the PDF.js library.
Hence we can simply re-factor `src/core/jpg.js` to use the common `readUint16` helper function, which is especially nice given that the functionality was essentially *duplicated* in the code.
2020-01-25 00:35:10 +01:00
Jonas Jenwald
3f031f69c2 Move additional worker-thread only functions from src/shared/util.js and into a src/core/core_utils.js instead
This moves the `log2`, `readInt8`, `readUint16`, `readUint32`, and `isSpace` functions since they are only used in the worker-thread.
2020-01-25 00:33:52 +01:00
Jonas Jenwald
83bdb525a4 Fix remaining linting errors, from enabling the prefer-const ESLint rule globally
This covers cases that the `--fix` command couldn't deal with, and in a few cases (notably `src/core/jbig2.js`) the code was changed to use block-scoped variables instead.
2020-01-25 00:20:23 +01:00
Jonas Jenwald
9e262ae7fa Enable the ESLint prefer-const rule globally (PR 11450 follow-up)
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const

With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.

Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
2020-01-25 00:20:22 +01:00
Tim van der Meij
d2d9441373
Merge pull request #11489 from Snuffleupagus/rm-FIREFOX-define
Remove the `FIREFOX` build flag, since it's completely unused and simplify a couple of `PDFJSDev` checks
2020-01-24 23:59:13 +01:00
Tim van der Meij
668a29aa45
Merge pull request #11497 from Snuffleupagus/Promise-allSettled
Add support for `Promise.allSettled`
2020-01-22 23:06:54 +01:00
Tim van der Meij
a88dec197f
Merge pull request #11511 from Snuffleupagus/eslint-no-nested-ternary
Enable the `no-nested-ternary` ESLint rule (PR 11488 follow-up)
2020-01-22 22:52:59 +01:00
Jonas Jenwald
3b78f4e8f8 Fix a couple of cases where Prettier broke existing formatting (PR 11446 follow-up)
These two cases should have been whitelisted prior to re-formatting respectively had the comments fixed afterwards, however I unfortunately missed them because of the massive size of the diff.
2020-01-22 09:12:12 +01:00
Jani Pehkonen
809b96b40c Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
Fixes #11403
The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this.

In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`.

The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.nodef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.
2020-01-21 21:35:25 +02:00
Jonas Jenwald
a39943554a Simplify, and tweak, a couple of PDFJSDev checks
This removes a couple of, thanks to preceeding code, unnecessary `typeof PDFJSDev` checks, and also fixes a couple of incorrectly implemented (my fault) checks intended for `TESTING` builds.
2020-01-21 00:06:15 +01:00
Jonas Jenwald
7322a24ce4 Remove the FIREFOX build flag, since it's completely unused
After PR 9566, which removed all of the old Firefox extension code, the `FIREFOX` build flag is no longer used for anything.
It thus seems to me that it should be removed, for a couple of reasons:
 - It's simply dead code now, which only serves to add confusion when looking at the `PDFJSDev` calls.
 - It used to be that `MOZCENTRAL` and `FIREFOX` was *almost* always used together. However, ever since PR 9566 there's obviously been no effort put into keeping the `FIREFOX` build flags up to date.
 - In the event that a new, Webextension based, Firefox addon is created in the future you'd still need to audit all `MOZCENTRAL` (and possibly `CHROME`) build flags to see what'd make sense for the addon.
2020-01-21 00:06:15 +01:00
Tim van der Meij
ccf327538b
Merge pull request #11519 from tamuratak/enable_eslint_import_extensions
Enable import/extensions of ESlint plugin to enforce all `import` have a `.js` file extension.
2020-01-19 17:37:19 +01:00
Jonas Jenwald
ee87e898db Update the GlobalWorkerOptions.workerSrc JSDoc comment
This particular JSDoc comment is fairly old and it also contains some now unrelated/confusing information.
The only way to *guarantee* that the PDF.js library works as expected is to correctly set the global `workerSrc`[1], hence giving the impression that the option isn't strictly necessary is thus incorrect.

---
[1] Since advertising the fallbackWorkerSrc functionality definitely seems like the *wrong* thing to do.
2020-01-19 12:44:42 +01:00
Takashi Tamura
00ce7898a2 Enable import/extensions of ESlint plugin to enforce all import have a .js file extension.
Related to #11465.

- https://github.com/benmosher/eslint-plugin-import/blob/master/docs/rules/extensions.md
2020-01-18 10:53:01 +09:00
Jonas Jenwald
9ab7c280aa Cache the fallback font dictionary on the PartialEvaluator (PR 11218 follow-up)
This way we'll benefit from the existing font caching, and can thus avoid re-creating a fallback font over and over again during parsing.
(Thece changes necessitated the previous patch, since otherwise breakage could occur e.g. with fake workers.)
2020-01-16 15:12:05 +01:00
Jonas Jenwald
090ff116d4 Ensure that full clean-up is always run when handling the "Terminate" message in src/core/worker.js
This is beneficial in situations where the Worker is being re-used, for example with fake workers, since it ensures that things like font resources are actually released.
2020-01-16 15:11:56 +01:00
Jonas Jenwald
c591826f3b Enable the no-nested-ternary ESLint rule (PR 11488 follow-up)
This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210

With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed.
This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories:
 - Cases where a value is only clamped to a certain range (the easiest ones to update).
 - Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the *default* value and only update it when necessary by using `if`/`else if` statements.
 - Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary
2020-01-14 17:49:39 +01:00
Jonas Jenwald
78917bab91 Update src/display/{annotation_layer.js, svg.js} to determine the fontWeight in the same way as with canvas (PR 6091 and 7839 follow-up) 2020-01-14 15:29:59 +01:00
Jonas Jenwald
6590cc32f2 Extract the subroutine bias computation into a helper function in src/core/font_renderer.js 2020-01-14 15:29:53 +01:00
Jonas Jenwald
2942233c9c Add support for Promise.allSettled
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled
2020-01-10 14:35:12 +01:00
Tim van der Meij
93aa613db7
Merge pull request #11465 from Snuffleupagus/import-file-extension
Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension
2020-01-06 23:24:43 +01:00
Jonas Jenwald
94f084958a Update the year in the license_header files 2020-01-05 12:14:03 +01:00
Jonas Jenwald
36881e3770 Ensure that all import and require statements, in the entire code-base, have a .js file extension
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
2020-01-04 13:01:43 +01:00
Jonas Jenwald
f8ab8c4d3a Move the SegoeUISymbol font to the getNonStdFontMap (PR 8698 follow-up)
For reasons that I now cannot even begin to understand, the non-standard SegoeUISymbol font was placed in the `getStdFontMap`. That honestly makes no sense, hence this patch which does what I *should* have done from the start.
2019-12-28 11:02:49 +01:00
Jonas Jenwald
a63f7ad486 Fix the linting errors, from the Prettier auto-formatting, that ESLint --fix couldn't handle
This patch makes the follow changes:
 - Remove no longer necessary inline `// eslint-disable-...` comments.
 - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
 - Concatenate strings which now fit on just one line.
 - Fix comments that are now too long.
 - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.
2019-12-26 12:35:12 +01:00
Jonas Jenwald
de36b2aaba Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).

Prettier is being used for a couple of reasons:

 - To be consistent with `mozilla-central`, where Prettier is already in use across the tree.

 - To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.

Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.

*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.

(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-26 12:34:24 +01:00
Jonas Jenwald
8ec1dfde49 Add // prettier-ignore comments to prevent re-formatting of certain data structures
There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments.

*Please note:* It may be a good idea to look through these cases individually, and possibly re-write some of the them (especially the `String` ones) to reduce the need for all of these ignore commands.
2019-12-26 00:14:03 +01:00
Jonas Jenwald
70e3345cb4 Support OpenAction dictionaries without Type entries when parsing Print actions (issue 11442)
The PDF generator didn't bother including the `Type` entry in the OpenAction dictionary, hence we skipped parsing the `Print` action.
2019-12-24 10:41:33 +01:00
Wojciech Maj
d40d33682b
Extract & use createHeaders helper in src/display/fetch_stream.js 2019-12-23 08:08:17 +01:00
Jonas Jenwald
d370037618 [api-minor] Tweak the Node.js fake worker loader to prevent Critical dependency: ... warnings from Webpack
Since bundlers, such as Webpack, cannot be told to leave `require` statements alone we are thus forced to jump through hoops in order to prevent these warnings in third-party deployments of the PDF.js library; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).

*Please note:* This is based on the assumption that code running in Node.js won't ever be affected by e.g. Content Security Policies that prevent use of `eval`. If that ever occurs, we should revert to a normal `require` statement and simply document the Webpack warnings instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
8519f87efb Re-factor the setupFakeWorkerGlobal function (in src/display/api.js), and the loadFakeWorker function (in web/app.js)
This patch reduces some duplication, by moving *all* fake worker loader code into the `setupFakeWorkerGlobal` function. Furthermore, the functions are simplified further by using `async`/`await` where appropriate.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
a5485e1ef7 [api-minor] Support loading the fake worker from GlobalWorkerOptions.workerSrc in Node.js
There's no particularily good reason, as far as I can tell, to not support a custom worker path in Node.js environments (even if workers aren't supported). This patch thus make the Node.js fake worker loader code-path consistent with the fallback code-path used with *browser* fake worker loader.

Finally, this patch also deprecates[1] the `fallbackWorkerSrc` functionality, except in Node.js, since the user should *always* provide correct worker options since the fallback is nothing more than a best-effort solution.

---
[1] Although it probably shouldn't be removed until the next major version.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
591e754831 Move the fake worker loader code into the PDFWorkerClosure
Given that this code isn't needed "globally" in the file, it seems reasonable to move it to where it's actually used instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
aab0f91740 [api-minor] Simplify the *fallback* fake worker loader code in src/display/api.js
For performance reasons, and to avoid hanging the browser UI, the PDF.js library should *always* be used with web workers enabled.
At this point in time all of the supported browsers should have proper worker support, and Node.js is thus the only environment where workers aren't supported. Hence it no longer seems relevant/necessary to provide, by default, fake worker loaders for various JS builders/bundlers/frameworks in the PDF.js code itself.[1]

In order to simplify things, the fake worker loader code is thus simplified to now *only* support Node.js usage respectively "normal" browser usage out-of-the-box.[2]

*Please note:* The officially intended way of using the PDF.js library is with workers enabled, which can be done by setting `GlobalWorkerOptions.workerSrc`, `GlobalWorkerOptions.workerPort`, or manually providing a `PDFWorker` instance when calling `getDocument`.

---
[1] Note that it's still possible to *manually* disable workers, simply my manually loading the built `pdf.worker.js` file into the (current) global scope, however this's mostly intended for testing/debugging purposes.

[2] Unfortunately some bundlers such as Webpack, when used with third-party deployments of the PDF.js library, will start to print `Critical dependency: ...` warnings when run against the built `pdf.js` file from this patch. The reason is that despite the `require` calls being protected by *runtime* `isNodeJS` checks, it's not possible to simply tell Webpack to just ignore the `require`; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).
2019-12-20 17:36:08 +01:00
Jonas Jenwald
dbb82f05fc Re-factor the find helper function, in src/core/document.js, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.

The main benefits here are:
 - No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
 - In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-14 13:43:26 +01:00
Jonas Jenwald
e24050fa13 [api-minor] Move the ReadableStream polyfill to the global scope
Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility

By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead.

The only change here which *could* possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded).
However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it *could* just as well have been a globally registered polyfill from an application embedding the PDF.js library.
2019-12-11 19:02:37 +01:00
Jonas Jenwald
b00835f589 Attempt to improve the PDFDocument error message for empty files (issue 5887)
Given that the error in question is surfaced on the API-side, this patch makes the following changes:
 - Updates the wording such that it'll hopefully be slightly easier for users to understand.
 - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling.
 - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).
2019-12-09 15:45:50 +01:00
Tim van der Meij
a6db045789
Merge pull request #11387 from Snuffleupagus/issue-11385
Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
2019-12-08 20:27:46 +01:00
Tim van der Meij
16778118f6
Merge pull request #11391 from Snuffleupagus/globalThis
Replace `globalScope` with the standard `globalThis` property instead
2019-12-08 20:23:19 +01:00
Jonas Jenwald
71d61e4c6f Re-factor getMainThreadWorkerMessageHandler to support arbitrary global scopes, rather than only window 2019-12-08 20:19:04 +01:00
Jonas Jenwald
a8fc306b6e Replace globalScope with the standard globalThis property instead
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility

Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues.

Finally, for older browsers a polyfill was added in `ssrc/shared/compatibility.js`.
2019-12-08 20:19:02 +01:00
Jonas Jenwald
a02122e984 Ensure that PDFDocument.checkFirstPage waits for cleanup to complete (PR 10392 follow-up)
Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here.

(The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)
2019-12-07 12:31:41 +01:00
Jonas Jenwald
5c0336872e Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.
2019-12-05 15:53:18 +01:00
Jonas Jenwald
c3b1c8f857 Slightly simplify the XRef cache lookup in XRef.fetch
Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups.

Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called *a lot* in larger documents and this patch thus cannot hurt.
2019-11-30 22:41:53 +01:00
Jonas Jenwald
168c6aecae Stop caching Streams in XRef.fetchCompressed
I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1]

Note that in the `XRef.fetchUncompressed` method we're *not* caching Streams, and that for very good reasons too.

 - Streams, especially the `DecodeStream` ones, can become *very* large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so.

 - Attempting to read from the *same* Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position.

 - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Streams isn't going to work in the general case.

All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method.

---
[1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.
2019-11-30 10:21:08 +01:00
Jonas Jenwald
06412a557b Slighthly re-factor XRef.fetchCompressed
- Change all occurences of `var` to `let`/`const`.

 - Initialize the (temporary) Arrays with the correct sizes upfront.

 - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.
2019-11-30 09:49:51 +01:00
Jonas Jenwald
725566cfea Remove the Number.isInteger checks from XRef.fetchUncompressed (PR 8857 follow-up)
Having ran the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857.
Hence everything points to this being completely unnecessary now, and by removing this code there's thus fewer function calls being made in `XRef.fetchUncompressed`.
2019-11-28 23:25:39 +01:00
Jonas Jenwald
cc76132c24 Remove outdated, and misleading, JSDoc comment from the PDFDocument class
The contents of this comment hasn't been correct for *years*, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.
2019-11-25 11:36:29 +01:00
Jonas Jenwald
a965662184 Enable the getter-return, no-dupe-else-if, and no-setter-return ESLint rules
All of these rules can help catch errors during development. Please note that only `getter-return` required a few changes, which was limited to disabling the rule in a couple of spots; please find additional details about these rules at:
 - https://eslint.org/docs/rules/getter-return
 - https://eslint.org/docs/rules/no-dupe-else-if
 - https://eslint.org/docs/rules/no-setter-return
2019-11-23 11:40:30 +01:00
Tim van der Meij
be02e67972
Merge pull request #11335 from Snuffleupagus/issue-11330
Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330)
2019-11-16 13:56:01 +01:00
Jonas Jenwald
9199b02a42 Subtract stream.start when getting the startXRef property for documents with a Linearization dictionary (issue 11330)
For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`).
Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.
2019-11-16 09:29:10 +01:00
Jonas Jenwald
688d15526e Use getBytes, rather than looping over getByte, in FlateStream.prototype.readBlock
*Please note:* A a similar change was attempted in PR 5005, but it was subsequently backed out (in PR 5069) since other parts of the patch caused issues.

With these changes, it's possible to replace repeated function calls within a loop with just a single function call and subsequent assignment instead.
2019-11-15 15:45:31 +01:00
Jonas Jenwald
878432784c [PDFHistory] Move the IE11 pushState/replaceState work-around to src/shared/compatibility.js (PR 10461 follow-up)
I've always disliked the solution in PR 10461, since it required changes to the `PDFHistory` code itself to deal with a bug in IE11.
Now that IE11 support is limited, it seems reasonable to remove these `pushState`/`replaceState` hacks from the main code-base and simply use polyfills instead.
2019-11-11 17:48:04 +01:00
Jonas Jenwald
74e00ed93c Change isNodeJS from a function to a constant
Given that this shouldn't change after the `pdf.js`/`pdf.worker.js` files have been loaded, it doesn't seems necessary to keep this as a function.
2019-11-10 16:44:29 +01:00
Jonas Jenwald
2817121bc1 Convert globalScope and isNodeJS to proper modules
Slightly unrelated to the rest of the patch, but this also removes an out-of-place `globals` definition from the `web/viewer.js` file.
2019-11-10 16:44:29 +01:00
Tim van der Meij
6763e16804
Merge pull request #11313 from Snuffleupagus/issue-11122
Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)
2019-11-10 13:31:51 +01:00
Jonas Jenwald
0233fc07b6
Revert "Convert Catalog.getPageDict to an async method" 2019-11-09 22:36:23 +01:00
Jonas Jenwald
536a52e981 Ensure that Popup annotations, where the parent annotation is a polyline, will always be possible to open/close (issue 11122)
For Popup annotation trigger elements consisting of an arbitrary polyline, you need to ensure that the 'stroke-width' is always non-zero since otherwise it's impossible to actually open/close the popup.

Unfortunately I don't believe that any of the test-suites can be used to test this, hence why no tests are included in the patch.
2019-11-09 13:35:59 +01:00
Jonas Jenwald
79d7c002de Inline a couple of isRef/isDict checks in the getPageDict method
As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).

With these changes we also avoid potentially two back-to-back `isDict` checks when evaluating possible Page nodes, and can also no longer accidentally pick a dictionary with an incorrect /Type.
2019-11-08 17:53:00 +01:00
Jonas Jenwald
0d89006bf1 Convert Catalog.getPageDict to an async method
This makes it possible to remove the internal `next` helper function, and also gets rid of the need to manually resolve/reject a `PromiseCapability`.
2019-11-08 17:45:28 +01:00
Jonas Jenwald
98f570c103 Prevent browser exceptions from incorrectly triggering the assert in PDFPageProxy._abortOperatorList (PR 11069 follow-up)
For certain canvas-related errors (and probably others), the browser rendering exceptions may be propagated "as-is" to the PDF.js code. In this case, the exceptions are of the somewhat cryptic `NS_ERROR_FAILURE` type.
Unfortunately these aren't actual `Error`s, which thus ends up unintentionally triggering the `assert` in `PDFPageProxy._abortOperatorList`; sorry about that!
2019-11-07 11:37:48 +01:00
Jonas Jenwald
80342e2fdc Support UTF-16 little-endian strings in the stringToPDFString helper function (bug 1593902)
The bug report seem to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we *actually* do as evident by both the code and a unit-test.
The issue at play here is rather that we previously only supported big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a *little-endian* UTF-16 BOM instead.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902
2019-11-05 12:43:17 +01:00
Jonas Jenwald
04497bcb3c Re-factor the ObjectLoader._walk method to be properly asynchronous
Rather than having to store a `PromiseCapability` on the `ObjectLoader` instances, we can simply convert `_walk` to be `async` and thus have the same functionality with native JavaScript instead.
2019-11-03 15:04:20 +01:00
Jonas Jenwald
fec1f02b2a Slightly re-factor setting of the link target in addLinkAttributes
I happened to look at this code and the way that the link target is set seems unecessarily convoluted, since we're using `Object.values` and `Array.prototype.includes` for *every* link being parsed.
Given that the number of link targets are so few, the easist solution honestly seem to be to just use a `switch` statement to do the link target mapping.
2019-11-02 14:01:31 +01:00
Tim van der Meij
bbd2386bd9
Merge pull request #11296 from Snuffleupagus/parseColorSpace-stopAtErrors
Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)
2019-11-01 22:47:50 +01:00
Jonas Jenwald
829d6ba2dc Ensure that the peekByte methods, on the various Streams, handles end of data correctly (PR 5286 follow-up)
When the end of data has already been reached for the various Streams, the `getByte` methods will return `-1` to signal that to the caller. Note however that the current position obviously won't be incremented in this case, meaning that the `peekByte` methods will in this case *incorrectly* decrement the position.

Thankfully the corresponding `peekBytes` shouldn't be affected by this bug, since they decrement the current position with the *actually* returned number of bytes.

I'm not aware of any bugs caused by this blatant oversight, but that doesn't mean this shouldn't be fixed :-)
2019-11-01 18:22:33 +01:00
Jonas Jenwald
835d8c2be5 Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)
This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.
2019-11-01 09:01:24 +01:00
Tim van der Meij
30ef05c161
Merge pull request #11290 from Snuffleupagus/MessageHandler-rm-in
[MessageHandler] Re-factor and convert the code to a proper `class`
2019-10-31 23:57:52 +01:00
Jonas Jenwald
eedd449cb4 Remove some unused require statements, used when loading fake workers, in non-PRODUCTION mode
The code in question is *only* relevant in non-`PRODUCTION` mode, i.e. the *development* version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added.
I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.
2019-10-31 12:08:07 +01:00
Jonas Jenwald
0293222b96 [MessageHandler] Convert the code to a proper class 2019-10-30 23:22:59 +01:00
Jonas Jenwald
5d5733c0a7 [MessageHandler] Convert all instances of var to const in the code 2019-10-30 23:22:59 +01:00
Jonas Jenwald
f61fb3e0f9 [MessageHandler] Re-factor the _onComObjOnMessage function to use early returns
When `ReadableStream` support was added to the `MessageHandler`, the `_onComObjOnMessage` function became more complex than previously.
All of the nested `if`/`else if`/`else` branches are now, at least in my opinion, making some of this code a bit difficult to follow. Hence this patch, which attempts to help readability by making use of early `return`s and `Error`s.

The patch also changes a couple of `var`/`let` occurences to `const`.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
62f28e11a3 [MessageHandler] Remove unnecessary usage of in from the code
Note that using `in` leads to unnecessary stringification of the properties, which seems completely unnecessary here. To avoid future problems from these changes the `MessageHandler.on` method will now assert, in non-`PRODUCTION`/`TESTING` builds, that it's always called with a function as expected.

This patch also renames `callbacksCapabilities` to `callbackCapabilities`, note the removed "s", since using a double plural format looks a bit strange.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
3e46e800a0 [MessageHandler] Replace the internal isReply property, as sent when Promise callbacks are used, with enumeration values
Given that the `isReply` property is an internal implementation detail, changing its type shouldn't be a problem. Note that by directly indicating if either data or an Error is sent, it's no longer necessary to use `in` when handling the callback.
2019-10-30 23:22:59 +01:00
Jonas Jenwald
2d35a49dd8 Inline a couple of isRef/isDict checks in the ObjectLoader code
As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).
2019-10-29 23:20:10 +01:00
Jonas Jenwald
1133dbac33 Make the ObjectLoader use more efficient methods when determining if data needs to be loaded
Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded.

When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which *specific* chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question.
Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over *all* chunks.

All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessary "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a *single* simple check, which seems like a perfect fit for the `ObjectLoader` use cases.
In particular, once the *entire* PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading a local files) which this patch should help with.

Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)
2019-10-29 23:20:09 +01:00
Jonas Jenwald
0496ea61f5 Ensure that PartialEvaluator.hasBlendModes handles Blend Modes in Arrays (PR 11281 follow-up)
I completely overlooked this in PR 11281, but you obviously need to make similar changes in `PartialEvaluator.hasBlendModes` since it will otherwise ignore valid Blend Modes.
2019-10-28 11:37:05 +01:00
Jonas Jenwald
5c266f0e8c Support Blend Modes which are specified in an Array of Names (issue 11279)
According to the specification, the first *supported* Blend Mode should be choosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607
2019-10-26 14:24:31 +02:00
Tim van der Meij
4a5a4328f4
Merge pull request #11273 from Snuffleupagus/getViewport-offsets
[api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone`
2019-10-24 00:08:40 +02:00
Jonas Jenwald
681bc9d70e [api-minor] Support custom offsetX/offsetY values in PDFPageProxy.getViewport and PageViewport.clone
There's no good reason, as far as I can tell, to not also support `offsetX`/`offsetY` in addition to e.g. `dontFlip`.
2019-10-23 20:48:14 +02:00
Jonas Jenwald
6f7f8257bc Slightly re-factor the String handling in StatTimer
This uses template strings in a couple of spots, and a buffer in the `toString` method.
2019-10-23 14:45:18 +02:00
Jonas Jenwald
8e5d3836d6 Remove the enable argument from the StatTimer constructor
This argument is a left-over from older API code, where we unconditionally initialized `StatTimer` instances for every page. For quite some time that's only been done when `pdfBug` is set, hence it seems unnecessary to keep this functionality.
2019-10-23 14:45:18 +02:00
Jonas Jenwald
9fc40f8b84 Remove DummyStatTimer since it's unused now
Since this isn't part of the API surface, removing it now that it's unused shouldn't cause any problems.
2019-10-23 14:45:16 +02:00
Jonas Jenwald
860da8b840 Stop using the DummyStatTimer in the API, and check if this._stats exists when trying to report statistics
Even though the currect situation only results in six unnecessary function calls per page, it nonetheless seems completely unnecessary to call dummy functions when `pdfBug` is *not* set (i.e. the default behaviour).
2019-10-23 13:23:41 +02:00
Jonas Jenwald
df0e1edab5 Re-factor sending of various Exceptions from the worker to the API
As can be seen in the API, there's a number of document loading Exception handlers which are both really simple and highly similar. Hence these are changed such that all the relevant Exceptions are sent via *one* message instead.

Furthermore, the patch also avoids unnecessarily re-creating `UnknownErrorException`s at the worker side and removes an unnecessary `bind` call.
2019-10-19 12:54:54 +02:00
Tim van der Meij
11f3851a97
Merge pull request #11243 from Snuffleupagus/issue-11242
Add a fallback for non-embedded *composite* Verdana fonts (issue 11242)
2019-10-18 23:56:46 +02:00
Tim van der Meij
c54bb222ca
Merge pull request #11231 from Snuffleupagus/indexObjects-entries-gen
Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)
2019-10-17 23:56:26 +02:00
Jonas Jenwald
2fcb5afc7b Add a fallback for non-embedded *composite* Verdana fonts (issue 11242)
Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.
2019-10-17 17:00:55 +02:00
Pedro Luiz Cabral Salomon Prado
4d0c759b7f Change variable assignment (#11247)
Remove unused variable assignment in `src/core/fonts.js`
2019-10-16 00:39:25 +02:00
Jonas Jenwald
ffc847eaa5 Allow over-writing entries, in XRef.indexObjects, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)
This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1]

Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break.
The only reason that I'm even submitting this patch, is because of the number of open issues that it would address.

Generally speaking though, the best course of action would probably be if `XRef.indexObjects` was re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them.

---
[1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.
2019-10-14 22:10:04 +02:00
Tim van der Meij
ec6a99d781
Bundle all API documentation in a module
This commit allows JSDoc to generate all API documentation in the
`pdfjsLib` module (namespace) so the documentation becomes easier to
navigate.
2019-10-13 21:23:00 +02:00
Tim van der Meij
9f4d45ddf4
Don't include private methods in the the PDFPageProxy API documentation 2019-10-13 21:23:00 +02:00
Tim van der Meij
36c01c2c2a
Deduplicate the documentation for PDFDocumentLoadingTask and PDFWorker
Both classes live inside a closure with the same name, which confuses
JSDoc. Move the documentation to the inner class to deduplicate them.
2019-10-13 21:23:00 +02:00
Tim van der Meij
ca3a58f93a
Consistently use @returns for returned data types in JSDoc comments
Sometimes we also used `@return`, but `@returns` is what the JSDoc
documentation recommends. Even though `@return` works as an alias, it's
good to use the recommended syntax and to be consistent within the
project.
2019-10-13 13:58:17 +02:00
Tim van der Meij
8b4ae6f3eb
Consistently use @type for getter data types in JSDoc comments
Sometimes we also used `@return` or `@returns`, but `@type` is what
the JSDoc documentation recommends. This also improves the documentation
because before this commit the types were not shown and now they are.
2019-10-13 13:58:17 +02:00
Tim van der Meij
f4daafc077
Consistently use square brackets for optional parameters in JSDoc comments
Square brackets are recommended to indicate optional parameters. Using
them helps for automatically generating correct documentation.
2019-10-13 13:58:17 +02:00
Tim van der Meij
efd331daa1
Consistently use string for string data types in JSDoc comments
Sometimes we also used `String`, but `string` is the what the JSDoc
documentation recommends.
2019-10-13 13:58:17 +02:00
Tim van der Meij
e75991b49e
Consistently use number for numeric data types in JSDoc comments
Sometimes we also used `Number` and `integer`, but `number` is what
the JSDoc documentation recommends.
2019-10-13 13:58:13 +02:00
Jonas Jenwald
03387ebaa8 Update src/shared/compatibility.js to only run with SKIP_BABEL = false set
Rather than specifying certain build targets manually, it seems much more appropriate (and future-proof) to use the `SKIP_BABEL` build target instead.

Also, the patch adds a missing `/* eslint no-var: error */` line since I'm touch the file anyway and no code-changes were necessary for it.
2019-10-13 11:33:41 +02:00
Jonas Jenwald
bfcbf2d78d Cache processed 'ExtGState's in PartialEvaluator.hasBlendModes to avoid unnecessary parsing/lookups
This simply extends the already existing caching of processed resources to avoid duplicated parsing of 'ExtGState's, which should help with badly generated PDF documents.

This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf, with the following manifest file:
```
[
    {  "id": "issue6961",
       "file": "../web/pdfs/issue6961.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following *overall* results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   400 |         1063 |        1051 | -12 | -1.17 |        faster
Firefox | Page Request |   400 |          552 |         543 |  -9 | -1.69 |        faster
Firefox | Rendering    |   400 |          511 |         508 |  -3 | -0.61 |
```

and the following *page-specific* results:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
0    | Overall      |   200 |         1122 |        1110 | -12 | -1.03 |
0    | Page Request |   200 |          552 |         544 |  -8 | -1.48 |        faster
0    | Rendering    |   200 |          570 |         566 |  -4 | -0.62 |
1    | Overall      |   200 |         1005 |         992 | -13 | -1.33 |        faster
1    | Page Request |   200 |          552 |         542 | -11 | -1.91 |        faster
1    | Rendering    |   200 |          452 |         450 |  -3 | -0.61 |
```
2019-10-12 12:35:42 +02:00
Jonas Jenwald
af71f9b40a Inline all the possible type checks in PartialEvaluator.hasBlendModes to avoid unnecessary function calls
For badly generated PDF documents, with issue 6961 being one example, there's well over one hundred thousand function calls being made in total for just the *two* pages.
2019-10-12 11:24:37 +02:00
huzjakd
94171d9d72 Attempt to fallback to a default font, for non-available ones, in PartialEvaluator.loadFont
This handles the two different ways that fonts can be loaded, either by Name (which is the common case) or by Reference.
Furthermore, this also takes the `ignoreErrors` option into account when deciding whether to fallback or Error.
Finally, by creating a minimal but valid Font dictionary, there's no special-cases necessary in any of the font parsing code.

Co-authored-by: huzjakd <huzjakd@gmail.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
2019-10-10 16:49:46 +02:00
Jonas Jenwald
ea729ec55c [api-minor] Replace all deprecated calls with throwing of actual Errors
All of these methods have been marked as `deprecated` in *three* releases now, and I'd thus like to (slowly) move towards complete removal.

However rather than just removing the methods right away, which would cause somewhat cryptic failures, this patch tries to implement a hopefully reasonable middle ground by throwing `Error`s with (essentially) the same information as the previous warnings.

While the previous `deprecated` messages could perhaps be seen as optional, with these changes API consumers will now be forced to actually migrate their code.
2019-10-09 09:21:15 +02:00
Takashi Tamura
d5ee083050 * use square brackets for optional properties in the JSDoc comments of src/display/api.js 2019-10-08 20:34:17 +09:00
Tim van der Meij
cead77ef3a
Merge pull request #11186 from Snuffleupagus/issue-9655
Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
2019-10-06 19:50:43 +02:00
Jonas Jenwald
eabedab38e [MessageHandler] Add a non-PRODUCTION/TESTING check to ensure that wrapReason is called with a valid reason
There shouldn't be any situation where `reason` isn't either an `Error`, or a cloned "Error" sent via `postMessage`.
2019-10-06 14:15:13 +02:00
Jonas Jenwald
9201c8dad4 [MessageHandler] Convert the deleteStreamController helper function to a "private" method instead 2019-10-06 14:15:02 +02:00
Jonas Jenwald
f5be2d62a3 Improve the heuristics, in PartialEvaluator._buildSimpleFontToUnicode, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
*Please note:* I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code.
However, it occured to me that we're essentially trying to add a heuristic *on top* of another heuristic, and that it shouldn't matter how efficient the code is as long as it works.

In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.
2019-10-06 10:47:29 +02:00
Jonas Jenwald
572abdcb4a Convert the various image decoder ...Errors to classes extending BaseException (PR 11185 follow-up)
Somehow I missed these in PR 11185, but there's no good reason not to convert them as well.
2019-10-01 13:10:14 +02:00
Tim van der Meij
8c4f4b5eec
Merge pull request #11182 from Snuffleupagus/disableWorker-disable-Dict-postMessage
Forbid sending of `Dict`s and `Stream`s, with `postMessage`, when workers are disabled
2019-09-29 15:09:42 +02:00
Jonas Jenwald
5d93fda4f2 Convert the various ...Exceptions to proper classes, to reduce code duplication
By utilizing a base "class", things become significantly simpler. Unfortunately the new `BaseException` cannot be a proper ES6 class and just extend `Error`, since the SystemJS dependency doesn't seem to play well with that.
Note also that we (generally) need to keep the `name` property on the actual `...Exception` object, rather than on its prototype, since the property will otherwise be dropped during the structured cloning used with `postMessage`.
2019-09-29 10:16:20 +02:00
Jonas Jenwald
3f8fee371b Forbid sending of Dicts and Streams, with postMessage, when workers are disabled
By default, i.e. with workers enabled, it's *purposely* not possible to send `Dict`s and `Stream`s from the worker-thread. This is achieved by defining a `function` on every `Dict` instance, since that ensures that [the structured clone algoritm](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm) will throw an Error on `postMessage`.

However, with workers *disabled* we fall-back to the `LoopbackPort` implementation which just ignores any `function`s, thus incorrectly allowing sending of data which *should* be unclonable.
2019-09-26 16:16:13 +02:00
Tim van der Meij
cd909c531f
Merge pull request #11169 from Snuffleupagus/Dict-inline-Ref-checks
Reduce the number of function calls in the `Dict` class
2019-09-24 23:33:37 +02:00
Tim van der Meij
f762d59ad2
Merge pull request #11173 from Snuffleupagus/ReadableStream-polyfill
Replace the bundled `ReadableStream` polyfill with the `web-streams-polyfill` npm package (issue 11157)
2019-09-24 23:22:17 +02:00
Jonas Jenwald
2cac68467f Reduce the number of function calls in the Dict class
The following changes were made:
 - Remove unnecessary `typeof` checks in the `get`/`getAsync` methods.
 - Reduce unnecessary code duplication in the `get`/`getAsync` methods.
 - Inline the `Ref` checks in the `get`/`getAsync`/`getArray` methods, since it helps avoid many unnecessary functions calls. I.e. this way it's possible to directly call `XRef.{fetch, fetchAsync)` only when necessary, rather than always having to call `XRef.{fetchIfRef, fetchIfRefAsync)`.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```
This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2821 |        2790 | -32 | -1.12 |        faster
Firefox | Page Request |   250 |            2 |           2 |   0 |  6.68 |
Firefox | Rendering    |   250 |         2820 |        2788 | -32 | -1.13 |        faster
```
2019-09-24 08:31:39 +02:00
Jonas Jenwald
0ee373f9cc Replace the bundled ReadableStream polyfill with the web-streams-polyfill npm package (issue 11157)
Compared to the recently replaced `URL` polyfill, the new `ReadableStream` polyfill isn't being exported globally for two reasons:
 - We're currently checking for the existence of a global `ReadableStream` implementation when determining if the Fetch API will be used; please see `isFetchSupported` in the src/display/display_utils.js file.
 - Given that it's much newer functionality (compared to `URL`) and that not all browsers may implement all parts of the specification yet, not exposing the `ReadableStream` globally seems safer for now.
2019-09-23 22:16:59 +02:00
Jonas Jenwald
7f18c57c12 Fix the inconsistent return types for Dict.{get, getAsync}
Having these methods fallback to returning `null` in only *one* particular case seems outright wrong, since a "falsy" value will thus be handled incorrectly.
The only reason that this hasn't caused issues in practice is that there's only one call-site passing in three keys, and in that case we're trying to read a font file where falling back to `null` isn't a problem.
2019-09-23 11:41:19 +02:00
Tim van der Meij
1f5ebfbf0c
Replace our URL polyfill with the one from core-js
`core-js` polyfills have proven to be of good quality and using them
prevents us from having to maintain them ourselves.
2019-09-19 14:09:51 +02:00
Tim van der Meij
c71a291317
Upgrade core-js to version 3.2.1
This only required changing the import paths. The `es` folder contains
all polyfills we need now. If we want to import everything, we need to
explicitly require the `index` file.
2019-09-19 13:58:36 +02:00
Tim van der Meij
3da680cdfc
Merge pull request #11158 from janpe2/gradient-stops
Avoid floating point inaccuracy in gradient color stops
2019-09-19 13:15:11 +02:00
Tim van der Meij
58e5f36666
Merge pull request #11159 from Snuffleupagus/issue-11150
For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)
2019-09-19 13:14:27 +02:00
Jonas Jenwald
af22dc9b0c For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)
Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.
2019-09-18 10:15:09 +02:00
Jani Pehkonen
911df237f3 Avoid floating point inaccuracy in gradient color stops 2019-09-17 21:01:17 +03:00
Jonas Jenwald
4bd79ec4b3 Inline the resolveOrReject helper function at its call-sites in MessageHandler, and rename an error key to reason
Given that there's only a couple of call-sites, and that the helper function is really simple, it doesn't seem entirely necessary to keep it around. While fewer function calls is always a good thing, in this case the performance impact is small enough to be unmeasurable.

With *one* single exception the code in `MessageHandler` is using `reason` when passing around various Errors, hence this patch also renames an `error` key for consistency.
2019-09-17 14:22:24 +02:00
Jonas Jenwald
0617984b59 Remove unnecessary data.streamId accesses in MessageHandler._processStreamMessage, and use a constant object shape in MessageHandler.sendWithStream
The `streamId` short-hand in `MessageHandler._processStreamMessage` was only used partially througout the method, which seemed kind of strange, hence that's fixed in this patch.
Furthermore, always giving the `streamController` object a constant shape in `MessageHandler.sendWithStream` cannot hurt either.
2019-09-17 14:18:57 +02:00
Jonas Jenwald
281ed33e43 Abort, with a small delay, getOperatorList on the worker-thread when rendering is cancelled (PR 11069 follow-up)
With this patch we're finally able to abort worker-thread parsing of the `OperatorList`, rather than *only* aborting the main-thread rendering itself, when the `RenderTask.cancel` method is being called.
This will help improve perceived performance in the default viewer, especially when reading longer and more complex documents, since pages that've been scrolled out-of-view (and thus evicted from the cache) will no longer compete for parsing resources on the worker-thread.

*Please note:* With the implementation in this patch we're *not* aborting worker-thread parsing immediately on `RenderTask.cancel`, since that would lead to *worse* performance in many cases. For example: When zoom/rotation occurs in the viewer, while parsing/rendering is still ongoing, a `cancel` call will usually be (almost) immediately folled by a new `PDFPageProxy.render` call. In that case you obviously don't want to abort parsing on the worker-thread, since that would risk throwing away a partially parsed `OperatorList` and thus force unnecessary re-parsing which will regress perceived performance (especially for more complex documents).

When choosing a reasonable delay, before cancelling `getOperatorList` on the worker-thread when `RenderTask.cancel` is called, two different positions need to be considered:
 1. The delay needs to be short enough, since a timeout in the multiple seconds range would essentially make this entire functionality meaningless (by always allowing most/all pages enough time to finish parsing).

 2. The delay cannot be *too* short, since that would actually *reduce* performance in the zoom/rotation case outlined above. Furthermore, the time between `RenderTask.cancel` and `PDFPageProxy.render` calls will obviously be affected by both general computer performance and current CPU load.

It's certainly possible that the timeout may require some further tweaks, however the value settled on in this patch was easily *one order* of magnitude larger than the delta between cancel/render in my tests.
2019-09-14 11:30:32 +02:00
Jonas Jenwald
00efff532c Ensure that addLinkAttributes is always called with a valid url parameter
There's no good reason for calling this helper function without a `url` parameter, and this way we can prevent that from happening.
Note how the `PDFOutlineViewer` call-site was already doing the right thing here, and only the `LinkAnnotationElement` call-site needed a small adjustment to make it work.
2019-09-11 13:24:04 +02:00
Jonas Jenwald
12e1c91f73 Don't enqueue unused properties when sending 'GetOperatorList' data from the worker-thread (PR 11069 follow-up)
With the changes made in PR 11069, it's no longer necessary to include the `pageIndex`/`intent` parameters when sending 'GetOperatorList' data. In the previous implementation these properties were used to associate the `OperatorList` with the correct `RenderTask`, however now that `ReadableStream`s are used that's handled automatically and it's thus dead code at this point.
2019-09-09 17:41:26 +02:00
Tim van der Meij
37d5b80ba8
Merge pull request #11118 from Snuffleupagus/FetchBuiltInCMap-sendWithStream
Transfer, rather than copy, CMap data to the worker-thread
2019-09-06 22:56:14 +02:00
Jonas Jenwald
7dea3f9389 [api-minor] Remove the postMessageTransfers parameter, and thus the ability to manually disable transferring of data, from the API
By transfering, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having *one* copy of the same data.
Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory.

Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality.
Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side.

---
[1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance.

[2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility
2019-09-05 13:09:54 +02:00
Jonas Jenwald
f0534b9b51 Adjust the values sent, with the 'test' message, by the WorkerMessageHandler.setup method
Note how the sent values have inconsistent types, with a boolean in one case and an object in the other (normal) case.
Furthermore, explicitly sending a `supportTypedArray: true` property seems superfluous at least to me.
2019-09-05 11:27:27 +02:00
Jonas Jenwald
7212ff4eea Stop checking for the response property, on XMLHttpRequest, when setting up the WorkerMessageHandler
This check was added in PR 2445, however it's no longer necessary since all data[1] is now loaded on the main-thread (and then transferred to the worker-thread).
Furthermore, by default the Fetch API is now (usually) used rather than `XMLHttpRequest`.

All in all, while these checks *were* necessary at one point that's no longer the case and they can thus be removed.

---
[1] This includes both the actual PDF data, as well as the CMap data.
2019-09-05 11:27:22 +02:00
Jonas Jenwald
f11a4ba750 Transfer, rather than copy, CMap data to the worker-thread
It recently occurred to me that the CMap data should be an excellent candidate for transfering.
This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads.

Unfortunately it's not possible to actually transfer data when *returning* data through `sendWithPromise`, and another solution had to be used.
Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it *really* easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)

Please note that the patch *purposely* only changes the API to Worker communication, and not the API *itself* since changing the interface of `CMapReaderFactory` would be a breaking change.
Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.
2019-09-04 11:46:04 +02:00
Jonas Jenwald
74f5a59f43 Ensure that the cancel/error methods on Streams are always called with valid reason arguments 2019-09-02 23:31:07 +02:00
Jonas Jenwald
02bdacef42 Ensure that Errors are handled correctly when using postMessage with Streams in MessageHandler
Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).
2019-09-02 23:31:07 +02:00
Tim van der Meij
e59b11860d
Merge pull request #11108 from timvandermeij/es6-annotations
Use more ES6 syntax in the annotation code
2019-09-02 23:13:24 +02:00
Tim van der Meij
2866c8a39e
Use more ES6 syntax in src/core/annotation.js
`let` is converted to `const` where possible.
2019-09-02 22:37:27 +02:00
Tim van der Meij
c37a2c0408
Merge pull request #11112 from Snuffleupagus/TESTING-rm-version-warn
Remove the API/Worker version warning message in `TESTING` mode
2019-09-02 22:22:33 +02:00
Jonas Jenwald
229f6f34d1 Remove the API/Worker version warning message in TESTING mode
The warning messages turn out to be more annoying than helpful when looking at the `console` during tests, so let's just remove them.
2019-09-01 16:47:26 +02:00
Jonas Jenwald
cd82b81bc7 Inline the resolveCall helper function at its call-sites in MessageHandler
There's only three call-sites and one of them doesn't even need the complete functionality of `resolveCall`, hence it seems reasonable to just inline this code.
An additional benefit of this is that the `Function.prototype.apply()` instance can also be converted into "normal" function calls, which should be a tiny bit more efficient.

The patch also replaces a number of unnecessary arrow functions, in relevant parts of the `MessageHandler` code, with "normal" functions instead.
Finally, all `Promise.resolve().then(...)` calls are replaced with `new Promise(...)` instead since the latter is a tiny bit more efficient. This also explains the test failures on the Linux bot, with a prior version of the patch, since the `Promise.resolve().then(...)` format essentially creates two Promises thus causing additional delay.
2019-09-01 13:40:19 +02:00
Jonas Jenwald
055f03938b Remove support for the scope parameter in the MessageHandler.on method
At this point in time it's easy to convert the `MessageHandler.on` call-sites to use arrow functions, and thus let the JavaScript engine handle scopes for us, rather than having to manually keep references to the relevant scopes in `MessageHandler`.[1]
An additional benefit of this is that a couple of `Function.prototype.call()` instances can now be converted into "normal" function calls, which should be a tiny bit more efficient.

All in all, I don't see any compelling reason why it'd be necessary to keep supporting custom `scope`s in the `MessageHandler` implementation.

---
[1] In the event that a custom scope is ever needed, simply using `bind` on the handler function when calling `MessageHandler.on` ought to work as well.
2019-09-01 09:24:15 +02:00
Tim van der Meij
49018482dc
Use more ES6 syntax in src/display/annotation_layer.js
`let` is converted to `const` where possible, `var` usage is disabled
and template strings are used where possible.
2019-08-31 16:40:39 +02:00
Jonas Jenwald
f71ea2de0e Remove the makeReasonSerializable helper function, and use wrapReason instead, in src/shared/message_handler.js
Since `wrapReason` and `makeReasonSerializable` are essentially functionally equivalent it doesn't seem necessary to keep both of them around, especially when `makeReasonSerializable` only has a *single* call-site.
2019-08-30 19:36:10 +02:00
Jonas Jenwald
4e6a9b54c7 Change the internal stream property, as sent when Streams are used, from a String to a Number
Given that the `stream` property is an internal implementation detail, changing its type shouldn't be a problem. By using Numbers instead, we can avoid unnecessary String allocations when creating/processing Streams.
2019-08-30 13:27:18 +02:00
Jonas Jenwald
252a3e35fb Reduce the amount of unnecessary function calls and object allocations, in MessageHandler, when using Streams
With PR 11069 we're now using Streams for OperatorList parsing (in addition to just TextContent parsing), which brings the nice benefit of being able to easily abort parsing on the worker-thread thus saving resources.

However, since we're now creating many more `ReadableStream` there appears to be a tiny bit more overhead because of it (giving ~1% slower runtime of `browsertest` on the bots). In this case we're just going to have to accept such a small regression, since the benefits of using Streams clearly outweighs it.

What we *can* do here, is to try and make the Streams part of the `MessageHandler` implementation slightly more efficient by e.g. removing unnecessary function calls (which has been helpful in other parts of the code-base). To that end, this patch makes the following changes:

 - Actually support `transfers` in `MessageHandler.sendWithStream`, since the parameter was being ignored.

 - Inline the `sendStreamRequest`/`sendStreamResponse` helper functions at their respective call-sites. Obviously this causes some amount of code duplication, however I still think this change seems reasonable since for each call-site:
   - It avoids making one unnecessary function call.
   - It avoids allocating one temporary object.
   - It avoids sending, and thus structure clone, various undefined object properties.

 - Inline objects in the `MessageHandler.{send, sendWithPromise}` methods.

 - Finally, directly call `comObj.postMessage` in various methods when `transfers` are *not* present, rather than calling `MessageHandler.postMessage`, to further reduce the amount of function calls.
2019-08-30 12:32:20 +02:00
Jonas Jenwald
ae0d9e8c2a Replace some instances of implicit function.bind(this) usage, in src/display/api.js, with arrow functions instead 2019-08-30 11:35:05 +02:00
Jonas Jenwald
667e548e5f [TextLayer] Remove setAttribute usage in appendText (issue 8066)
One of the motivations for using `setAttribute` in the first place was to support more efficient DOM updates in the `expandTextDivs` method, since performance of the `enhanceTextSelection` mode can be somewhat bad when there's a lot of `textDivs` on the page.

With recent `TextLayer` changes/optimizations it's no longer necessary to store a complete `style`-string for every `textDiv`, and we can thus re-visit the `setAttribute` usage.
Note that with the current code, in `appendText`, there's only *one* string per `textDiv` which avoids a bunch of temporary strings. While the changes in this patch means that there's now *three* strings per `textDiv` instead, the total length of these strings are now quite a bit shorter (42 characters to be exact).
2019-08-28 16:52:09 +02:00
Jonas Jenwald
106b239c5d [TextLayer] Avoid unnecessary font updates in _layoutText (PR 11097 follow-up)
*This should obviously have been done in PR 11097, but for some reason I completely overlooked it; sorry about that.*

There's no good reason to update the font unless you're actually going to measure the width of the textContent. This can reduce unnecessary font switching a fair bit, even for documents which are somewhat simple/short (in e.g. the `tracemonkey.pdf` file this cuts the amount of font switches almost in half).
2019-08-28 16:08:06 +02:00
Jonas Jenwald
a1398048e5 [TextLayer] Simplify building of the *expanded* transform in expandTextDivs
Rather than essentially re-computing the `originalTransform` every time, we can simply use it directly instead.
2019-08-25 13:09:04 +02:00
Jonas Jenwald
b68f7bb404 [TextLayer] Only measure the width of the text, in _layoutText, for multi-char text divs
For performance reasons single-char text divs aren't being scaled, as outlined in a comment in `appendText`. Hence it doesn't seem necessary, or even a good idea, to unconditionally measuring the width of the text in `_layoutText`.
2019-08-25 12:32:49 +02:00
Jonas Jenwald
711040ecc5 Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in src/core/worker.js
These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).
2019-08-24 15:56:41 +02:00
Yury Delendik
66e0dd1b06 Use streams for OperatorList chunking (issue 10023)
*Please note:* The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`.

By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources.
With this patch worker-thread parsing will *only* be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this:

 - The API currently expects the *entire* OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen.
 - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously *not* want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance.
 - This patch is already *somewhat* risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well).

Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above.

Co-Authored-By: Yury Delendik <ydelendik@mozilla.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
2019-08-24 15:56:40 +02:00
Jonas Jenwald
29a2516e4c [TextLayer] Use an Array to build the total padding, rather than concatenating Strings, in expandTextDivs
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact.
Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`.

Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.
2019-08-24 01:13:59 +02:00
Tim van der Meij
edbebb8bf7
Merge pull request #11090 from Snuffleupagus/textLayer-expandTextDivs-transform
[TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs`
2019-08-23 23:12:42 +02:00
Jonas Jenwald
932fcacff8 [TextLayer] Only handle positive padding values in expandTextDivs
Given that browsers will reject padding values smaller than zero (which may be caused by limited numerical precision during calculations in the `expand` code), it makes no sense to include those when expanding the `textDiv`s.
2019-08-23 13:16:20 +02:00
Jonas Jenwald
37e8a8189b [TextLayer] Use an Array to build the total transform, rather than concatenating Strings, in expandTextDivs
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page.
2019-08-23 12:17:12 +02:00
Tim van der Meij
490deb1b65
Merge pull request #11086 from Snuffleupagus/textLayer-originalTransform
[TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled
2019-08-22 23:09:07 +02:00
Brendan Dahl
31f319301d
Merge pull request #11087 from brendandahl/disable-links
Add a way to disable external links.
2019-08-22 11:13:11 -07:00
Jonas Jenwald
a519ceffee [TextLayer] Use template strings when updating the font property in the _layoutText method 2019-08-22 14:47:44 +02:00
Jonas Jenwald
6afe3221b7 [TextLayer] Only cache the originalTransform when enhanceTextSelection is enabled
Given that this is completely unused in "regular" text-selection mode, there's no reason to unconditionally store one string for every `textDiv`.
2019-08-22 14:47:18 +02:00
Brendan Dahl
98e989116c Add a way to disable external links. 2019-08-21 11:20:41 -07:00
Jonas Jenwald
431a264126 [TextLayer] Reduce the amount of intermediary strings in expandTextDivs
By using template strings, we can avoid some unnecessary string allocations (which is also helped by shortening a variable name).
2019-08-19 12:09:18 +02:00
Jonas Jenwald
45dfad8640 [TextLayer] Only cache the current textDiv style when enhanceTextSelection is enabled
This will help save a little bit of memory, by not storing one unused string for each `textDiv` in regular text-selection mode.
2019-08-19 11:02:56 +02:00
Jonas Jenwald
1cd9a28c81 Replace the XRef.cache Array with a Map instead
Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will *always* be more-or-less sparse.
Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty.

Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead.

This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch-series against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2736 |        2736 |   1 |  0.02 |
Firefox | Page Request |   200 |            2 |           2 |   0 | -8.26 |        faster
Firefox | Rendering    |   200 |         2733 |        2734 |   1 |  0.03 |
```
2019-08-18 12:07:18 +02:00
Jonas Jenwald
34a53b9f5d Inline the isRef checks in the various XRef.fetch related methods
The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there's been a lot of other cases where similiar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)
2019-08-18 11:57:48 +02:00
Tim van der Meij
1565d1849d
Merge pull request #11073 from brendandahl/code-point
Move polyfill for codePointAt to String prototype.
2019-08-17 13:26:35 +02:00
Brendan Dahl
c8129b8787 Move polyfill for codePointAt to String prototype.
This method belongs on the prototype not the String object.
2019-08-16 14:32:43 -07:00
Jonas Jenwald
40d3916f31 Reduce the number of temporary variables in the Parser.getObj method
This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.
2019-08-16 13:51:41 +02:00
Jonas Jenwald
7728a6630c Inline the isString check in the Parser.getObj method
For very large and complex PDF files this will help performance *slightly*, since `Parser.getObj` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         2847 |        2830 | -17 | -0.60 |        faster
Firefox | Page Request |   200 |            2 |           2 |   0 | -7.14 |
Firefox | Rendering    |   200 |         2844 |        2827 | -17 | -0.60 |        faster
```
2019-08-16 10:34:24 +02:00
Jonas Jenwald
7f456b3e2e Replace of all usages of var with let/const in the src/shared/util.js file
Also removes a couple of unnecessary (temporary) variable assigments in `arraysToBytes` and uses template strings in a few spots.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
f6c4a1f080 Convert Util to a class with static methods
Also replaces `var` with `const` in all the relevant code.
2019-08-11 14:35:35 +02:00
Jonas Jenwald
7ee370a394 Remove the skipEmpty parameter from Util.intersect (PR 11059 follow-up)
Looking at this again, it struck me that added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code!
Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0,  0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter thus feels too tightly coupled to the `Page.view` getter.
2019-08-11 14:33:52 +02:00
Tim van der Meij
fbe8c6127c
Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
2019-08-09 22:39:01 +02:00
Jonas Jenwald
d637b25e36 Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".

The patch makes the following notable changes:
 - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
 - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
 - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
 - Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
 - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.

---

[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-09 10:18:13 +02:00
Jonas Jenwald
0f78fdb229 Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.
2019-08-08 10:37:41 +02:00
Jonas Jenwald
e9b7996f2f Actually compare the cropBox and mediaBox correctly in the Page.view getter
The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the *same* underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.
2019-08-07 17:15:57 +02:00
Jonas Jenwald
5ac9c7c384 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.
2019-08-06 14:33:05 +02:00
Tim van der Meij
be70ee236d
Merge pull request #11013 from timvandermeij/annotations-quadpoints
[api-minor] Implement quadpoints for annotations in the core layer
2019-08-04 16:06:10 +02:00
Jonas Jenwald
0276385e6e [api-minor] Fix completely broken getStats method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)

I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.

Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
2019-08-02 14:09:24 +02:00
Tim van der Meij
9c8fe3142a
Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException
Ensure that `ReadableStream`s are cancelled with actual Errors
2019-08-02 00:18:44 +02:00
Tim van der Meij
e0b38bed3c
Merge pull request #11029 from brendandahl/pdfjs-telemetry-update
[api-minor] Update telemetry to use 'categorical' histograms.
2019-08-02 00:11:02 +02:00
Brendan Dahl
31d71808e7 [api-minor] Update telemetry to use 'categorical' histograms.
Firefox telemetry supports using string labels now. Convert our integers
that we used for categories to just use strings.

The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
2019-08-01 09:51:02 -07:00
Jonas Jenwald
a3150166ec Ensure that ReadableStreams are cancelled with actual Errors
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
2019-08-01 16:40:46 +02:00
Tim van der Meij
d909b86b28
Merge pull request #11020 from Snuffleupagus/issue-11016
Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)
2019-07-31 23:33:34 +02:00
wangsongyan
c61205d980 decode filename when match an urlencode filename from contentDispositionFilename 2019-07-31 09:33:56 +08:00
Jonas Jenwald
9ad50521b1 Add a work-around, in glyphlist.js, for bad PDF generators which use a non-standard /f_f string in the Encoding dictionary when referring to the ff ligature (issue 11016)
This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare.

Furthermore, this patch purposely does *not* add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.
2019-07-30 17:06:58 +02:00
Jonas Jenwald
38ccb43436 Reduce the number of function calls in EvaluatorPreprocessor.read
For very large and complex PDF files this will help performance slightly, since `EvaluatorPreprocessor.read` is called a lot during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         3402 |        3358 | -43 | -1.28 |        faster
Firefox | Page Request |   200 |            1 |           1 |   0 | 26.71 |
Firefox | Rendering    |   200 |         3401 |        3357 | -44 | -1.28 |        faster
```
2019-07-29 08:43:36 +02:00
Tim van der Meij
9114004d5b
[api-minor] Implement quadpoints for annotations in the core layer 2019-07-28 20:36:21 +02:00
Jonas Jenwald
ff90aa4323 Inline the isCmd check in the Parser.shift method
For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called *a lot* during parsing.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over *four million* `Parser.shift` calls for just the one page), using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 100,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   100 |         3386 |        3322 | -65 | -1.92 |        faster
Firefox | Page Request |   100 |            1 |           1 |   0 | -8.08 |
Firefox | Rendering    |   100 |         3385 |        3321 | -65 | -1.92 |        faster
```
2019-07-22 12:07:36 +02:00
Jonas Jenwald
b5254f2745 Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites
The number of in particular `ChunkedStream.ensureByte` calls is often absolutely *huge* (on the order of million calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.

The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.
Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).

This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea.
Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`.

This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):
```
[
    {  "id": "tracemonkey-eq",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 250,
       "type": "eq"
    }
]
```

I get the following complete results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
```

Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
```

Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).
There's obviously the question if this is *slightly* regressing simpler pages, but given just how short the times are in most cases it's not inconceivable that the page results above are simply caused be e.g. limited `Date.now()` and/or limited numerical precision.
2019-07-18 17:30:22 +02:00
Tim van der Meij
6e96a158f4
Merge pull request #10820 from vlastimilmaca/annot-irt-rt-states
Annotations - Added parsing of IRT, RT, State and StateModel
2019-07-17 23:34:31 +02:00
vlastimilmaca
fe49f0f766 Annotations - Implement parsing of IRT, RT, State and StateModel 2019-07-16 23:33:07 +02:00
Jonas Jenwald
bea15b6ce5 Simplify the PDFDocument.fingerprint method slightly
The way that this method handles documents without an `ID` entry in the Trailer dictionary feels overly complicated to me. Hence this patch adds `getByteRange` methods to the various Stream implementations[1], and utilize that rather than manually calling `ensureRange` when computing a fallback `fingerprint`.

---
[1] Note that `PDFDocument` is only ever initialized with either a `Stream` or a `ChunkedStream`, hence why the `DecodeStream.getByteRange` method isn't implemented.
2019-07-15 13:26:08 +02:00
Tim van der Meij
13ebfec903
Merge pull request #10969 from Snuffleupagus/api-test-stopAtErrors
Add an API unit-test for the `stopAtErrors` option (PRs 8240 and 8922 follow-up)
2019-07-14 14:47:57 +02:00
Jonas Jenwald
b548bafef7 Simplify, and inline, the finalize function in the MessageHandler class
The `finalize` helper function has only a *single* call-site, and furthermore it's just a one-liner too. Furthermore it's only ever called with a `Promise` as its argument, meaning that it's unnecessarily convoluted as well (i.e. the `Promise.resolve()` part shouldn't be necessary).
Hence this code can be both simplified *and* inlined at its only call-site instead.
2019-07-13 17:54:32 +02:00
Jonas Jenwald
c7fb7116d6 Add an API unit-test for the stopAtErrors option (PRs 8240 and 8922 follow-up)
Also fixes an inconsistency in the 'PageError' handler, for `getOperatorList`, in the API.
2019-07-13 16:06:05 +02:00
Jonas Jenwald
17116917f7 Remove useless wrapReason calls in the MessageHandler class
Currently `wrapReason` is manually called at *every* `resolveOrReject` call-site, despite it being completely unnecessary unless there's an actual error being handled. This is obviously inefficient, and it's easy enough to avoid by having `resolveOrReject` handle this only when actually needed.
2019-07-13 13:08:29 +02:00
Tim van der Meij
ed3954fc7a
Merge pull request #10851 from brendandahl/shading-bbox
Apply bounding box before using shading patterns.
2019-07-12 22:52:07 +02:00
Tim van der Meij
87f36e3520
Merge pull request #10850 from brendandahl/scale-line-width
Scale stroking line width when using a tiling pattern.
2019-07-12 22:50:32 +02:00
Tim van der Meij
28326165ff
Merge pull request #10958 from Snuffleupagus/api-rm-receivingOperatorList
Remove the `intentState.receivingOperatorList` boolean since it's redundant
2019-07-11 23:55:00 +02:00
Jonas Jenwald
9a4d14bf36 Prevent "Uncaught promise" messages in the console when cancelling TextLayer tasks (PR 10601 follow-up)
Since `finally` won't stop error propagation, this causes unnecessary messages to be printed in the console whenever a `TextLayer` task is cancelled.
2019-07-11 11:48:33 +02:00
Jonas Jenwald
ef48a9a713 Update the PageError handler, in the API, to always mark the operatorList as done and finalize any pending renderTasks
Note that, in the old code, there was a code-path which could prevent this from happening thus affecting future cleanup.
Furthermore, ensure that we'll always attempt to cleanup when handling the 'PageError' message, similar to the code in e.g. the `PDFPageProxy._renderPageChunk` method.
2019-07-10 14:23:59 +02:00
Jonas Jenwald
c6fcdf474b Remove the intentState.receivingOperatorList boolean since it's redundant
The `receivingOperatorList` property is currently tracked *twice* in the rendering code, both directly and inversely through the `intentState.operatorList.lastChunk` boolean. This type of double bookkeeping is never a good idea, since it's just too easy for the properties to accidentally fall out of sync.

In this case there's even a `cleanup`-related bug caused by this, which means that `PDFPageProxy._tryCleanup` will never be able to discard any data if there's an error on the worker-thread (as handled through the 'PageError' message).

Hence the simplest solution seems, at least to me, to update `PDFPageProxy._tryCleanup` to replace the `intentState.receivingOperatorList` check with a `!intentState.operatorList.lastChunk` check and completely remove the former property.
2019-07-10 14:23:10 +02:00
Brendan Dahl
6fab0a0dac Apply bounding box before using shading patterns.
Fixes #8092
2019-07-08 14:05:48 -07:00
Brendan Dahl
446efab707 Scale stroking line width when using a tiling pattern. 2019-07-08 13:47:54 -07:00
Jonas Jenwald
bdc31f8b50 Make the find helper function, in src/core/document.js, more efficient by using peekBytes rather reading the stream one byte at a time
*Please note:* A a similar change was attempted in PR 5005, but it was subsequently backed out in PR 5069.

Unfortunately I don't think anyone ever tried to debug *exactly* why it didn't work, since it ought to have worked, and having re-tested this now I'm not able to reproduce the problem any more. However, given just how inefficient the current code is, with thousands of strictly unnecessary function calls for each `find` invocation, I'd really like to try fixing this again.
2019-07-06 11:44:17 +02:00
Jonas Jenwald
41745a5996 Reduce the number of isCmd calls slightly in the XRef class
This reduces the total number of function calls, when reading the XRef table respectively when fetching uncompressed XRef entries.
Note in particular the `XRef.readXRefTable` method, where there're *two* back-to-back `isCmd` checks rather than just one.
2019-06-29 16:28:45 +02:00
Tim van der Meij
7fc329d6e3
Merge pull request #10902 from ahuglajbclajep/tiling-pattern-support
Implement tiling patterns for the SVG back-end
2019-06-28 12:45:02 +02:00
Tim van der Meij
f1867de492
Merge pull request #10925 from Snuffleupagus/eslint_no-unsanitized
Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML`
2019-06-27 20:32:24 +02:00
ahuglajbclajep
77940dbd86 Implement tiling patterns for the SVG back-end 2019-06-25 16:25:25 +09:00
Jonas Jenwald
f710eb56e4 Change the signature of the Parser constructor to take a parameter object
A lot of the `new Parser()` call-sites look quite unwieldy/ugly as-is, with a bunch of somewhat randomly ordered arguments, which we can avoid by changing the constructor to accept an object instead. As an added bonus, this provides better documentation without having to add inline argument comments in the code.
2019-06-23 16:01:45 +02:00
Jonas Jenwald
5bb5e7741d Enable the eslint-plugin-no-unsanitized ESLint plugin to disallow unsafe usage of e.g. innerHTML
See https://github.com/mozilla/eslint-plugin-no-unsanitized

Since we've generally never allowed e.g. `innerHTML`, which is enforced during review, there's only one linting failure with this patch. (Which is white-listed, according to the existing comment and the fact that it's test-only code.)
2019-06-23 13:50:30 +02:00
Jonas Jenwald
021e5ffb88 Move PDFWorkerStream and related code to its own file
Since all other `IPDFStream` implementations live in their own files, it seems reasonable for these to do so as well.

Furthermore, converts all of the relevant code to ES6 classes and updates the interface definitions to mark a couple of methods `async`.
2019-06-15 13:05:25 +02:00
Tim van der Meij
06b253d609
Merge pull request #10890 from Snuffleupagus/outline-items-hidden
Add support for outline items, in the default viewer, which default to collapsed when the outline is built
2019-06-09 11:35:49 +02:00
Jonas Jenwald
26bc630e19 Add support for outline items, in the default viewer, which default to collapsed when the outline is built
The PDF specification supports this feature, which is commonly used in large/long documents (such as the spec itself), and it seems reasonably straightforward to implement; see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2095911
2019-06-07 12:26:23 +02:00
Jonas Jenwald
625af8d2ad [api-minor] Attempt to reduce memory usage during printing, by always running cleanup once rendering has finished
Given that `cleanupAfterRender` is already set for large images, when handling 'obj' messages, this patch *should* thus be safe in general (since otherwise there ought be existing bugs related to cleanup and printing).
2019-06-03 00:29:17 +02:00
Jonas Jenwald
876c962235 Ignore Annotations with too large border widths, to prevent the annotationLayer from rendering it over the surrounding document (bug 1552113)
The border `width` will instead fallback to the default value of `1`, rather than ignoring it altoghether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113
2019-06-01 15:51:22 +02:00
Tim van der Meij
209e42043a
Merge pull request #10873 from Snuffleupagus/worker-terminate-clearPrimitiveCaches
Ensure that the `Cmd`/`Name`/`Ref` caches are cleared when terminating the worker (PR 10863 follow-up)
2019-05-31 12:56:53 +02:00
Jonas Jenwald
a3742a9f83 Ensure that the Cmd/Name/Ref caches are cleared when terminating the worker (PR 10863 follow-up)
Usually when the worker is terminated it will also be completely destroyed/removed, which means that any global caches (such as the ones in `src/core/primitive.js`) should be automatically cleared in the process.

However, for certain ways of loading the `pdf.worker.js` file, e.g. passing in a re-usable worker to `getDocument`, using the `workerPort` functionality, or even disabling workers completely  (even though this is never a good idea), the worker file may be kept in memory and these caches will not be cleared as expected.
2019-05-30 20:57:28 +02:00
Jonas Jenwald
8857a81c8d Re-use, rather than re-creating, some Arrays when resetting them in src/display/api.js
Calling `someArray = []` will create a new Array, which seems completely unnecessary when it's sufficient to just call `someArray.length = 0` to achieve the same effect.

Even though I cannot imagine these particular cases having any noticeable performance impact, similar changes were made in `core/` code years ago since it's apparently more efficient memory wise.
2019-05-30 16:33:05 +02:00
Jani Pehkonen
343b1381a2 Don't clip if path is undefined in SVG back-end 2019-05-28 18:37:15 +03:00