Commit Graph

5374 Commits

Author SHA1 Message Date
Tim van der Meij
5e5641b147
Merge pull request #13457 from Snuffleupagus/issue-13242
Work-around for HighlightAnnotations without a top-level /ExtGState-entry (issue 13242)
2021-05-28 23:38:39 +02:00
Tim van der Meij
0d56b1c365
Merge pull request #13443 from Snuffleupagus/charsCache
Re-factor the `charsCache` on `Font`-instances
2021-05-28 21:29:57 +02:00
Brendan Dahl
a6484c9861
Merge pull request #13427 from calixteman/xfa_storage
XFA - Add a storage to save fields values
2021-05-28 12:10:08 -07:00
Jonas Jenwald
707a9e3b02 Work-around for HighlightAnnotations without a top-level /ExtGState-entry (issue 13242)
For HighlightAnnotations with a built-in appearance stream, we still rely on it to specify the opacity correctly via a suitable blend mode. However, if the Annotation-drawing operators are placed *within* a /XObject of the /Form-type, the /ExtGState won't apply to the final rendering and the result is that the highlighting obscures the underlying text.

The more *correct* and general solution would likely be to somehow modify the implementation in `src/display/canvas.js`, to special-case handling of /Form-type /XObjects when rendering Annotations. Since we can very easily work-around this problem for now by using the "no appearance stream" code-path, doing *something* here ought to be preferable.

This patch is (obviously) merely a work-around, but given that the referenced issue is (as far as I know) the first case we've seen of this problem a simple solution will hopefully suffice for now.
2021-05-28 13:49:27 +02:00
calixteman
e499521b78
Merge pull request #13456 from calixteman/clazz
Replace clazz by classNames
2021-05-28 12:18:27 +02:00
Jonas Jenwald
2cc3b96351
Merge pull request #13455 from calixteman/italic
Italic angle is defined clockwise in CSS when it's counterclockwise in PDF
2021-05-28 11:49:48 +02:00
Calixte Denizet
f35176a32e Replace clazz by classNames 2021-05-28 11:17:38 +02:00
Calixte Denizet
1b0006093d Italic angle is defined clockwise in CSS when it's counterclockwise in PDF 2021-05-28 11:06:11 +02:00
Jonas Jenwald
70c79c6f69 Fix the JSDocs for PDFDocumentProxy.getPageIndex (issue 13449) 2021-05-27 16:41:08 +02:00
Jonas Jenwald
52c13326cd Support Annotations, without appearance streams, with bogus /Rect-entries (issue 13447)
This extends PR 13106 to apply not only to empty /Rect-entries, but also to bogus /Rect-entries for various Annotation-types.
2021-05-27 16:23:21 +02:00
Jonas Jenwald
a6447f2ca2 Support strokeAlpha/fillAlpha when creating a fallback appearance stream (issue 6810)
This fixes the colours, by respecting the strokeAlpha/fillAlpha-values, for a couple of Annotations in the PDF document from issue 13447.[1]

---
[1] Some of the annotations still won't render at all, when compared with Adobe Reader, but that could/should probably be handled separately.
2021-05-27 16:23:18 +02:00
calixteman
f587d5998e
Merge pull request #13445 from calixteman/ps_name
Fix Postscript name in font to avoid bug when saving in pdf
2021-05-27 13:52:47 +02:00
Calixte Denizet
0c698346b8 Fix Postscript name in font to avoid bug when saving in pdf
- for xfa rendering, fonts are loaded and used in html;
  - when printed and saved in pdf, on linux, Firefox uses cairo backend
  - when subsetting a font, cairo uses the font postscript name and when this one is empty that leads to a bug
    (the append at 63f0d62684/src/cairo-cff-subset.c (L2049) is failing because of null length)
  - so this patch adds a postscript name to the font to make cairo happy.
2021-05-27 12:45:40 +02:00
Jonas Jenwald
8b1d01816b Re-factor the charsCache on Font-instances
Currently `charsCache` is initialized *lazily*, which considering that it just contains a simple `Object` doesn't seem entirely necessary. This first of all forces us to do repeated exists-checks in the `Font.charsToGlyphs` method, and secondly the similar/related `glyphCache` is already initialized eagerly.

Furthermore, this patch also does a bit of clean-up in the `Font.charsToGlyphs` method since this code is quite old.
2021-05-26 13:13:44 +02:00
Tim van der Meij
3da9f077be
Merge pull request #13435 from Snuffleupagus/eslint-no-array-push-push
Enable the `unicorn/no-array-push-push` ESLint plugin rule
2021-05-25 21:10:01 +02:00
Tim van der Meij
6e92b56efa
Merge pull request #13436 from Snuffleupagus/getPathGenerator-buf
Re-factor FontFaceObject.getPathGenerator to use Arrays instead of strings
2021-05-25 20:35:01 +02:00
Calixte Denizet
45c3f00a27 XFA - Move the fake HTML representation of XFA from the worker to the main thread
- the only goal of this patch is to be able to get synchronously the fake html when printing from firefox:
    - in order to print we need to inject some html in beforeprint callback but we cannot block in waiting for all the pages.
  - from a memory point of view: it doesn't change anything since the fake HTML is deleted in the worker;
  - this way we don't break any assumptions.
2021-05-25 19:33:07 +02:00
Calixte Denizet
9478d2f064 XFA - Add a storage to save fields values - this is required to be able to print (or save) a document. Some pages can be unloaded (because pdf.js is lazy) and this storage will help to save their data in order to resuse them when printing or just when displaying a page again. 2021-05-25 19:25:09 +02:00
Calixte Denizet
7cebdbd58c XFA - Fix lot of layout issues
- I thought it was possible to rely on browser layout engine to handle layout stuff but it isn't possible
    - mainly because when a contentArea overflows, we must continue to layout in the next contentArea
    - when no more contentArea is available then we must go to the next page...
    - we must handle breakBefore and breakAfter which allows to "break" the layout to go to the next container
  - Sometimes some containers don't provide their dimensions so we must compute them in order to know where to put
    them in their parents but to compute those dimensions we need to layout the container itself...
  - See top of file layout.js for more explanations about layout.
  - fix few bugs in other places I met during my work on layout.
2021-05-25 17:51:36 +02:00
Jonas Jenwald
9ad7746118 Replace a couple of standard for-loops with for...of in src/display/font_loader.js 2021-05-25 14:11:57 +02:00
Jonas Jenwald
dcbb23d7fa Re-factor FontFaceObject.getPathGenerator to use Arrays instead of strings
This is similar to a lot of other code, where we use "Array + join" rather than repeated string concatenation.
2021-05-25 14:11:54 +02:00
Jonas Jenwald
ec3bcadf56 Enable the unicorn/no-array-push-push ESLint plugin rule
There's generally speaking no need to use multiple consecutive `Array.prototype.push()` calls, since that method accepts multiple arguments, and this ESLint rule helps enforce that pattern.

Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-array-push-push.md for additional information.
2021-05-25 13:54:46 +02:00
Calixte Denizet
209ac5ca57 XFA - Don't display images with a href 2021-05-22 15:09:43 +02:00
calixteman
0df1a56619
Merge pull request #13417 from Snuffleupagus/xfa-URL-clone
[XFA] Send URLs as strings, rather than objects (issue 1773)
2021-05-22 14:31:59 +02:00
Tim van der Meij
de680d7777
Merge pull request #13381 from Snuffleupagus/buildFontPaths-ignoreErrors
Handle errors gracefully, in PartialEvaluator.buildFontPaths, when glyph path building fails
2021-05-22 13:06:31 +02:00
Jonas Jenwald
53a70244d0 Use the stringToBytes helper function in more places
Rather than manually reimplementing, more-or-less, this functionality in a few spots we can simply use the existing helper function instead.
2021-05-22 12:23:09 +02:00
Jonas Jenwald
ba13bd8c2d [XFA] Send URLs as strings, rather than objects (issue 1773)
Given that `URL`s aren't supported by the structured clone algorithm, see https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm, the document in issue 1773 will cause the browser to throw `DataCloneError: The object could not be cloned.`-errors and nothing will render.
To fix this, we'll instead simply send the stringified version of the `URL` to prevent these errors from occuring.
2021-05-22 11:58:53 +02:00
Jonas Jenwald
c4429bc3f2 Do the isType3Font-check *once*, rather than repeating it, in PartialEvaluator.translateFont
*This is a small piece of clean-up that I happened to notice while browsing the code.*
2021-05-22 11:46:37 +02:00
Jonas Jenwald
68350378c0 Handle errors gracefully, in PartialEvaluator.buildFontPaths, when glyph path building fails
The building of glyph paths, in the `FontRendererFactory`, can fail in various ways for corrupt font data. However, we're currently not attempting to handle any such errors in the evaluator, which means that a single broken glyph *can* prevent an entire page from rendering.

To address this we simply have to pass along, and check, the existing `ignoreErrors` option in `PartialEvaluator.buildFontPaths` similar to the rest of the `PartialEvaluator` code.
2021-05-22 11:46:31 +02:00
Jonas Jenwald
0dba468e60 Don't allow the LoopbackPort to "clone" a URL
Note that `URL`s aren't supported by the structured clone algorithm, see https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm, and any attempt to send a `URL` using `postMessage` is rejected by the browser. Hence, for consistency when workers are disabled, the `LoopbackPort` should obviously also reject any `URL`s.
2021-05-22 10:11:31 +02:00
Tim van der Meij
b2ffebe978
Merge pull request #13416 from calixteman/xfa_config
XFA - Fix wrong function name
2021-05-21 20:33:35 +02:00
Calixte Denizet
8a8879aed2 XFA - Fix wrong function name 2021-05-21 20:25:26 +02:00
Tim van der Meij
d1d9b9043d
Merge pull request #13415 from Snuffleupagus/getDestination-out-of-order
Improve handling of named destinations in out-of-order NameTrees (PR 10274 follow-up)
2021-05-21 20:15:09 +02:00
Jonas Jenwald
8d5689387b Improve handling of named destinations in out-of-order NameTrees (PR 10274 follow-up)
According to the specification, see https://web.archive.org/web/20210404042322if_/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2384179, the keys of a NameTree/NumberTree should be ordered.
For corrupt PDF files, which violate this assumption, it's thus possible that trying to lookup a single entry fails.

Previously, in PR 10274, we implemented a fallback that only applies to the "bottom" node of a NameTree/NumberTree, which in general might not actually help for sufficiently corrupt NameTree/NumberTree data.
Instead we remove the current *limited* fallback from `NameOrNumberTree.get`, and defer to the call-site to handle this case explicitly e.g. by using `NameOrNumberTree.getAll` for data where that makes sense. For well-formed documents, these changes should *not* lead to any additional data fetching/parsing.

Finally, as part of these changes, the validation of named destination data is improved in the `Catalog` and a new unit-test is also added.
2021-05-21 15:48:37 +02:00
Jonas Jenwald
1a8d05fdcf Remove some, with Prettier 2.3.0, unnecessary // prettier-ignore comments
To get the maximum benefit from something like Prettier, you obviously don't want to disable the automatic formatting unless absolutely necessary. When we added Prettier there were a number of cases, mostly involving larger Arrays, which required disabling of the automatic formatting for overall readability and/or to not break inline comments.

With changes in Prettier version `2.3.0`, see [the release notes](https://prettier.io/blog/2021/05/09/2.3.0.html#concise-formatting-of-number-only-arrays-10106httpsgithubcomprettierprettierpull10106-10160httpsgithubcomprettierprettierpull10160-by-thorn0httpsgithubcomthorn0), there's now better formatting support for Arrays containing only numbers. Hence we can now remove a number of `// prettier-ignore` comments, and thus get the benefit of automatic formatting in (slightly) more of the code-base.
2021-05-19 11:36:03 +02:00
calixteman
faf6b10939
Merge pull request #13394 from calixteman/xml_parser
Handle PI with no value in xml parser
2021-05-18 11:14:48 +02:00
Calixte Denizet
4544ebf38a Handle PI with no value in xml parser
- an XML PI contains a target and optionally some content (see https://en.wikipedia.org/wiki/Processing_Instruction)
  - the parser expected to always have some content and so it could lead to wrong parsing.
2021-05-18 10:22:18 +02:00
Brendan Dahl
239d0097fa
Merge pull request #13390 from calixteman/opentype_and_xfa
XFA - Don't move glyphes in private area with non-truetype fonts
2021-05-17 12:39:10 -07:00
Brendan Dahl
46c2eeb19a
Merge pull request #13389 from calixteman/width_in_cff
Get any width (if one is present) in CFF parser
2021-05-17 09:13:45 -07:00
Brendan Dahl
17e9cfcd2a
Merge pull request #13328 from calixteman/js_display1
JS - Add support for display property
2021-05-17 08:47:13 -07:00
Calixte Denizet
a74d19262a XFA - Don't move glyphes in private area with non-truetype fonts
- it has been done in PR #13146 but only for truetype fonts.
2021-05-17 16:52:39 +02:00
Calixte Denizet
d394188835 Get any width (if one is present) in CFF parser
- in charstring specs at page 21 (section 4.2): "Also, it may appear in the charstring as the difference from nominalWidthX" so the number we've on the stack doesn't have to be positive.
  - currently this bug has probably no visible effect
  - but when the font is loaded to be used with XFA, then the rendering is incorrect.
2021-05-17 14:17:08 +02:00
Jonas Jenwald
718f7bf7e1 Fix a few *safe* ESLint no-var failures in src/core/evaluator.js (13371 follow-up)
As can be seen in PR 13371, some of the `no-var` changes in the `PartialEvaluator.{getOperatorList, getTextContent}` methods caused errors in `gulp server`-mode.
However, there's a handful of instances of `var` in other methods which should be completely *safe* to convert since there's no strange scope-issues present in that code.
2021-05-16 15:22:43 +02:00
Tim van der Meij
a5c74f53c1
Merge pull request #13386 from timvandermeij/src-core-bidi-no-var
Enable the `no-var` linting rule in `src/core/bidi.js`
2021-05-16 15:02:18 +02:00
Tim van der Meij
b8a5e797c5
Enable the no-var linting rule in src/core/bidi.js
This is done automatically with `gulp lint --fix` and the following
manual changes:

```diff
diff --git a/src/core/bidi.js b/src/core/bidi.js
index e9e0a7217..32691c0c6 100644
--- a/src/core/bidi.js
+++ b/src/core/bidi.js
@@ -82,7 +82,8 @@ function isEven(i) {
 }

 function findUnequal(arr, start, value) {
-  for (var j = start, jj = arr.length; j < jj; ++j) {
+  let j, jj;
+  for (j = start, jj = arr.length; j < jj; ++j) {
     if (arr[j] !== value) {
       return j;
     }
@@ -251,15 +252,14 @@ function bidi(str, startLevel, vertical) {
   for (i = 0; i < strLength; ++i) {
     if (types[i] === "EN") {
       // do before
-      var j;
-      for (j = i - 1; j >= 0; --j) {
+      for (let j = i - 1; j >= 0; --j) {
         if (types[j] !== "ET") {
           break;
         }
         types[j] = "EN";
       }
       // do after
-      for (j = i + 1; j < strLength; ++j) {
+      for (let j = i + 1; j < strLength; ++j) {
         if (types[j] !== "ET") {
           break;
         }
```
2021-05-16 14:14:26 +02:00
Jonas Jenwald
3cfa316d40 Convert src/core/operator_list.js to use standard classes
With modern JavaScript modules, where only *explicitly* exported properties are visible to the outside, the `QueueOptimizerClosure` should no longer be necessary.

Furthermore, to reduce the possibility of `NullOptimizer` and `QueueOptimizer` getting out of sync (note e.g. the inconsistency fixed in PR 10784), we now let the latter extend the former one.
2021-05-16 13:39:54 +02:00
Tim van der Meij
8a8a67de3b
Merge pull request #13380 from Snuffleupagus/pattern_helper-class
Re-factor and convert the code in `src/display/pattern_helper.js` to use standard classes
2021-05-16 13:11:04 +02:00
Jonas Jenwald
8943bcd3c3 Account for formatting changes in Prettier version 2.3.0
With the exception of one tweaked `eslint-disable` comment, in `web/generic_scripting.js`, this patch was generated automatically using `gulp lint --fix`.

Please find additional information at:
 - https://github.com/prettier/prettier/releases/tag/2.3.0
 - https://prettier.io/blog/2021/05/09/2.3.0.html
2021-05-16 11:44:05 +02:00
Jonas Jenwald
a984431046 Modernize the ShadingIRs structure, in src/display/pattern_helper.js, to use standard classes
This patch replaces the old structure with an abstract base-class, which the new ShadingPattern classes then inherit from.
The old `createMeshCanvasClosure` can now be removed, since it's not necessary any more with modern JavaScript, and the `createMeshCanvas` function is now instead a method on the new `MeshShadingPattern` class (avoids unnecessary parameter passing).
2021-05-15 16:00:00 +02:00
Jonas Jenwald
40939d5955 Convert src/display/pattern_helper.js to use standard classes
Note that this patch only covers `TilingPattern`, since the `ShadingIRs`-implementation required additional re-factoring.
2021-05-15 13:03:07 +02:00
Jonas Jenwald
bb8e15c971 [api-minor] Update minimum supported browser versions (PR 13361 follow-up)
With the changes in PR 13361, we're now using the `CanvasPattern.setTransform()` method when rendering certain Shadings/Patterns.
Note that while `CanvasPattern` itself has been supported since basically "forever", its `setTransform` method is a slightly newer addition to the specification; please refer to https://developer.mozilla.org/en-US/docs/Web/API/CanvasPattern#browser_compatibility

Rather than trying to re-write PR 13361 to not use, or possibly spending time/effort (if possible) polyfilling, `CanvasPattern.setTransform()` this patch thus suggests that we simply update the *minimum* supported browser versions instead.

According to the compatibility data linked above, the *minimum* supported browser versions in the PDF.js library are now as follows:
 - Chrome >= 68, which was released on 2018-07-24.[1]
 - Firefox ESR, see https://wiki.mozilla.org/Release_Management/Calendar.
 - Safari >= 11.1, which was release on 2018-03-29.[2]

(Given that the PDF.js contributors cannot realistically test a bunch of old browsers, it's not unimaginable that some older browser versions are already not working with the PDF.js library.)

Based on these changes, which we should ensure are reflected in the Wiki as well, we can also remove a number of now redundant polyfills. Furthermore we'll no longer "claim" to support Windows XP, note the `gulpfile.js` changes, which should definitely *not* be an issue given that it's no longer officially supported.[3]

---
[1] According to https://en.wikipedia.org/wiki/Google_Chrome_version_history

[2] According to https://en.wikipedia.org/wiki/Safari_version_history#Safari_11

[3] According to https://en.wikipedia.org/wiki/Windows_XP#End_of_support
2021-05-15 09:57:34 +02:00
Tim van der Meij
d2e7161f2c
Merge pull request #13377 from Snuffleupagus/pattern-class
Re-factor and convert the code in `src/core/pattern.js` to use standard classes
2021-05-14 22:23:44 +02:00
Jonas Jenwald
ebe3ee4f25 Modernize the Shadings structure, in src/core/pattern.js, to use standard classes
This patch replaces the old structure with a abstract base-class, which the new RadialAxial/Mesh-shading classes then inherit from.[1]
The old `MeshClosure` can now be removed, since it's not necessary any more, and most of the functions inside of it are now instead methods on the new `MeshShading` class. This is particularly nice, in my opinion, since we previously were *manually* passing around a reference to the current `Mesh`-instance.

---
[1] If we want/need to, in the future, split e.g. the Mesh-handling into multiple classes that should now be easy to do.
2021-05-14 21:44:41 +02:00
Jonas Jenwald
6acb2db4be Convert src/core/pattern.js to use standard classes
Note that this patch only covers `Pattern` and `MeshStreamReader`, since the `Shadings`-implementation required additional re-factoring.
2021-05-14 21:42:21 +02:00
Calixte Denizet
f92e1fa160 Replace terminal null char by a endchar command in CFF charstrings to make OTS happy 2021-05-14 18:34:51 +02:00
Jonas Jenwald
612b43852b Remove unused properties from the Shadings-implementations in src/core/pattern.js
Neither the `type` or the `cs` properties are used outside of the "constructors", and we can thus remove them.[1]
Note that a lot of this code is very old, and that it actually predates the main/worker-thread split before which the *same* file was used on both the main- *and* worker-threads.

---
[1] On the main-thread, a similar `type` property was removed in PR 12591.
2021-05-14 16:11:48 +02:00
Calixte Denizet
1a2cea21a5 Replace command with not enough args by an endchar in CFF font
- Right now, a glyph with an erroneous outline is replaced by an empty glyph
    if the error is far enough from the start there's likely something to render
    so the idea is to replace a command with args by an endchar when no args are
    on the stack: this way OTS is likely happy (no remaining args on stack) and we
    can draw something which is likely better than nothing.
2021-05-14 13:45:45 +02:00
Jonas Jenwald
4248f0745c Improve the Page.content and Page.getContentStream methods
First of all, by using `Dict.getArray` in the `Page.content` getter we remove the need to manually iterate through and fetch the sub-streams (when they exist) in the `Page.getContentStream` method.
Secondly, we can simplify the code in `Page.{getOperatorList, extractTextContent}` by letting `Page.getContentStream` ensure that `content` is available and returning a Promise instead.
2021-05-14 11:47:34 +02:00
Jonas Jenwald
70113131de Inline the data lookup in the Dict.getArray method
Similar to the `get`/`getAsync` methods, this should be a *tiny* bit more efficient which cannot hurt considering that `getArray` is now used a lot more than when initially added.
2021-05-14 11:24:27 +02:00
Tim van der Meij
e394da5861
Merge pull request #13369 from brendandahl/smask-pattern
Fix tiling pattern with smask.
2021-05-13 13:26:38 +02:00
Jonas Jenwald
75208d36c2 Revert "Fix the remaining no-var failures, which couldn't be handled automatically, in the src/core/evaluator.js file" (PR 13344 follow-up)
This reverts commit 0ef9b5aafc, since it cases a lot of warnings (see below) *locally* with e.g. the document from issue 9627.
Strangely enough, this only occurs with `gulp server`-mode and the actual builds are apparently fine. It seems that this *may* be some unfortunate interaction with the old Babel-plugin that's used together with SystemJS.

```
Warning: getTextContent - ignoring ExtGState: "FormatError: ExtGState should be a dictionary.".
```

Rather than taking the risk that this could actually cover a more serious bug, and since I cannot immediately figure out what's wrong, it thus seem safest to revert this for now and we can (carefully) revisit this once SystemJS has been removed (see PR 12563).
2021-05-13 11:19:46 +02:00
Brendan Dahl
53991d0924 Fix tiling pattern with smask.
After drawing a tiling pattern we were not calling
endDrawing, which handles compositing any
active smasks.

Fixes #8565.
2021-05-12 11:42:08 -07:00
Tim van der Meij
ba99e54c66
Merge pull request #13361 from brendandahl/patterns-fixes
Fix several issues with radial/axial shadings and tiling patterns.
2021-05-12 20:27:37 +02:00
Tim van der Meij
1cf9f42ca2
Merge pull request #13366 from Snuffleupagus/primitives-class
Convert the remaining functions in `src/core/primitives.js` to use standard classes
2021-05-12 20:20:35 +02:00
Tim van der Meij
0a3e483c7f
Merge pull request #13360 from Snuffleupagus/renderer-conditional-pref
Only include the `renderer`-preference in builds where `SVGGraphics` is defined
2021-05-12 20:16:53 +02:00
Jonas Jenwald
64c55d381d Fix the Jbig2Image export for the gulp image_decoders build (PR 9729 follow-up, issue 13367) 2021-05-12 19:41:29 +02:00
Jonas Jenwald
757636d519 Convert the remaining functions in src/core/primitives.js to use standard classes
This patch was tested using the PDF file from issue 2618, i.e. https://bug570667.bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 50,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |   %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ---- | -------------
firefox | Overall      |    50 |         3417 |        3426 |   9 | 0.27 |
firefox | Page Request |    50 |            1 |           1 |   0 | 5.41 |
firefox | Rendering    |    50 |         3416 |        3426 |   9 | 0.27 |
```

Based on these results, there's no significant performance regression from using standard classes and this patch should thus be OK.
2021-05-12 09:36:28 +02:00
Brendan Dahl
ac44afa70e Fix several issues with radial/axial shadings and tiling patterns.
Previously, we set the base transformation and pattern matrix
directly to the main rendering ctx of the page, however doing this
caused the current transform to be lost. This would cause issues
with things like shear missing so the pattern was misaligned or when
stroke was used the scale of the line width or dash would be wrong.
Instead we should leave the current transform and use setTransfrom
on the pattern so it is applied correctly. For axial and radial shadings I had
to create a temporary canvas to draw the shading so I could in turn
use setTransform.

Fixes: #13325, #6769, #7847, #11018, #11597, #11473

The following already in the corpus are improved:
issue8078-page1
issue1877-page1
2021-05-11 16:32:24 -07:00
Jonas Jenwald
b068882bd0 Clean-up usage of the TESTING-define in src/pdf.sandbox.js
This patch moves the `PDFJSDev`-checks *inline*, similar to the rest of the code-base, such that the code in question is actually being removed from the *built* files in e.g. the official releases.
2021-05-11 12:39:33 +02:00
Jonas Jenwald
7548dc5ea2 Only include the renderer-preference in builds where SVGGraphics is defined
After PR 13117 it's now (finally) possible for *different* build targets to specify individual options/preferences, and we can utilize that to only expose the `renderer`-preference in builds where `SVGGraphics` is actually defined.
Note that for e.g. `MOZCENTRAL`-builds, trying to enable SVG-rendering will throw immediately and the preference thus doesn't make sense to include there.

Also, update the dummy `SVGGraphics` to use a class, tweak the `PDFJSDev`-check in `src/display/svg.js` to agree fully with the option/preference, and remove an unnecessary `eslint-disable`.
2021-05-10 12:03:53 +02:00
Jonas Jenwald
2ba4b65ca8 [api-minor] Remove the WebGL implementation
Reasons for the removal include:
 - This functionality was always somewhat experimental and has never been enabled by default, partly because of worries about rendering bugs caused by e.g. bad/outdated graphics drivers.

 - After the initial implementation, in PR 4286 (back in 2014), no additional functionality has been added to the WebGL implementation.

 - The vast majority of all documents do not benefit from WebGL rendering, since only a couple of *specific* features are supported (e.g. some Soft Masks and Patterns).

 - There is, and has always been, *zero* test-coverage for the WebGL implementation.

 - Overall performance, in the PDF.js library, has improved since the experimental WebGL implementation was added.

Rather than shipping unused *and* untested code, it seems reasonable to simply remove the WebGL implementation for now; thanks to version control it's always possible to bring back the code should the need ever arise.
2021-05-09 16:38:44 +02:00
Jonas Jenwald
6eef69de22 Export the "raw" toUnicode-data from PartialEvaluator.preEvaluateFont
Compared to other data-structures, such as e.g. `Dict`s, we're purposely *not* caching Streams on the `XRef`-instance.[1]
The, somewhat unfortunate, effect of Streams not being cached is that repeatedly getting the *same* Stream-data requires re-parsing/re-initializing of a bunch of data; see `XRef.fetch` and related methods.

For the font-parsing in particular we're currently fetching the `toUnicode`-data, which is very often a Stream, in `PartialEvaluator.preEvaluateFont` and then *again* in `PartialEvaluator.extractDataStructures` soon afterwards.
By instead letting `PartialEvaluator.preEvaluateFont` export the "raw" `toUnicode`-data, we can avoid *some* unnecessary re-parsing/re-initializing when handling fonts.
*Please note:* In this particular case, given that `PartialEvaluator.preEvaluateFont` only accesses the "raw" `toUnicode` data, exporting a Stream should be safe.

---
[1] The reasons for this include:
 - Streams, especially `DecodeStream`-instances, can become *very* large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so.

 - Attempting to read from the *same* Stream-instance more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position.

 - Given that parsing, even in the worker-thread, is now fairly asynchronous it's generally impossible to assert that any one Stream-instance isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Stream-instance isn't going to work in the general case.
2021-05-08 12:04:13 +02:00
Jonas Jenwald
13fb1654dc Export the firstChar/lastChar-data from PartialEvaluator.preEvaluateFont
Rather than re-fetching/re-parsing these properties immediately in `PartialEvaluator.translateFont`, we can simply export them instead. (Obviously the effect will be really tiny, but there is less parsing overall this way.)
2021-05-08 12:02:49 +02:00
Jonas Jenwald
8a1cb82aee Ensure that the Widths array is parsed correctly in PartialEvaluator.preEvaluateFont
*Please note:* While I don't have a document that this patches fixes, the current code is however not entirely correct as far as I can tell.

Looking at how the `Widths` array is parsed in `PartialEvaluator.extractWidths`, it's clear that the implementation in `PartialEvaluator.preEvaluateFont` is a bit too simplistic. In particular, by only wrapping the data into a TypedArray, there's no attempt to handle *indirect* objects which could potentially lead to colliding `hash`es being computed.
2021-05-07 21:23:44 +02:00
Jonas Jenwald
30b2739adf Ensure that composite/non-composite fonts won't get the same hash in PartialEvaluator.preEvaluateFont
To hopefully help prevent any future bugs, make sure that composite/non-composite fonts cannot accidentally get matching `hash`es. Given the differences between those font types, that's very unlikely to be useful or even correct in general.
2021-05-07 21:22:37 +02:00
Jonas Jenwald
fc59a5f709 Take the W array into account when computing the hash, in PartialEvaluator.preEvaluateFont, for composite fonts (issue 13343)
Without this some *composite* fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.

*Please note:* Given that the document, in the referenced issue, doesn't embed *any* of its fonts there's no guarantee that it renders correctly in all configurations even with this patch.
2021-05-07 21:22:36 +02:00
Tim van der Meij
a3632c0f38
Merge pull request #13344 from Snuffleupagus/evaluator-no-var
Enable the `no-var` rule in the `src/core/evaluator.js` file
2021-05-07 21:02:46 +02:00
Tim van der Meij
5248d0a77d
Merge pull request #13338 from Snuffleupagus/images-class
Convert the `src/core/{jbig2, jpg, jpx}.js` files to use standard classes
2021-05-07 20:59:58 +02:00
Calixte Denizet
af125cd299 JS - Add support for display property
- in annotation_layer, move common properties treatment in a common method instead having duplicated code in each widget.
2021-05-06 11:15:38 +02:00
Jonas Jenwald
0ef9b5aafc Fix the remaining no-var failures, which couldn't be handled automatically, in the src/core/evaluator.js file
The only *slight* complication here were some of the `switch`-cases, in `getOperatorList`/`getTextContent`, where the parsing is done asynchronously.
However, those cases are easy to deal with by wrapping the code within its own block; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/switch#block-scope_variables_within_switch_statements
2021-05-06 10:21:05 +02:00
Jonas Jenwald
f93c3b9aa7 Enable the no-var rule in the src/core/evaluator.js file
These changes were made automatically, using `gulp lint --fix`.
2021-05-06 09:39:21 +02:00
Jonas Jenwald
0a32ad3e42 Remove unnecessary closure in the src/core/font_renderer.js file
With modern JavaScript modules, where you explicitly list the properties that should be exported, it's no longer necessary to wrap *all* of the code within one file into a top-level closure.[1]

This patch reduces the size, of even the *built* `pdf.worker.js` file, since there's now a lot less unnecessary whitespace.

---
[1] For files which contain *different* functionality, some closures may however still make sense in order to separate the code.
It might be possible to remove some of those cases later, e.g. once private class fields becomes generally available/usable in browsers.
2021-05-05 22:35:52 +02:00
Tim van der Meij
afb8c4fd25
Merge pull request #13327 from Snuffleupagus/split-fonts
Split the functionality in `src/core/fonts.js` into multiple files, and use standard classes
2021-05-05 20:16:24 +02:00
Jonas Jenwald
9a1758c6b8 Remove unnecessary closure in src/display/text_layer.js, and use standard classes
With modern JavaScript modules, where you explicitly list the properties that should be exported, it's no longer necessary to wrap all of the code in a closure.[1]

This patch also tries to clean-up/improve a couple of the existing JSDoc-comments.

---
[1] This reduces the size, even of the *built* `pdf.js` file, since there's now a lot less unnecessary whitespace.
2021-05-05 18:44:56 +02:00
Jonas Jenwald
ce14171cf0 Convert src/core/jpx.js to use standard classes
*Please note:* Ignoring whitespace-only changes is probably necessary in order to review this.
2021-05-05 14:02:21 +02:00
Jonas Jenwald
cb65b762eb Fix the remaining no-var failures, which couldn't be handled automatically, in the src/core/jpx.js file 2021-05-05 14:02:21 +02:00
Jonas Jenwald
a273599a12 Enable the no-var rule in the src/core/jpx.js file
These changes were made automatically, using `gulp lint --fix`.
2021-05-05 14:02:21 +02:00
Jonas Jenwald
69dea39a42 Convert src/core/jpg.js to use standard classes
*Please note:* Ignoring whitespace-only changes is probably necessary in order to review this.
2021-05-05 14:02:21 +02:00
Jonas Jenwald
d0a299713c Fix the remaining no-var failures, which couldn't be handled automatically, in the src/core/jpg.js file 2021-05-05 14:02:21 +02:00
Jonas Jenwald
1e5a179600 Enable the no-var rule in the src/core/jpg.js file
These changes were made automatically, using `gulp lint --fix`.
2021-05-05 14:02:21 +02:00
Jonas Jenwald
0addf3a0d4 Convert src/core/jbig2.js to use standard classes
*Please note:* Ignoring whitespace-only changes is probably necessary in order to review this.
2021-05-05 14:02:21 +02:00
Jonas Jenwald
d59c9ab3ab Fix the remaining no-var failures, which couldn't be handled automatically, in the src/core/jbig2.js file 2021-05-05 14:02:21 +02:00
Jonas Jenwald
7ca3a34e1f Enable the no-var rule in the src/core/jbig2.js file
These changes were made automatically, using `gulp lint --fix`.
2021-05-05 14:02:21 +02:00
Jonas Jenwald
99fae47c8e [Regression] Move the super-call in the PredictorStream-constructor to prevent errors (PR 13303)
*My apologies for breaking this; thankfully PR 13303 hasn't reach mozilla-central yet.*

It's (obviously) necessary to initialize a `PredictorStream`-instance fully, since otherwise breakage may occur if there's errors during the actual stream parsing.
To reproduce this issue, try opening the PDF document from issue 13051 locally and observe the following message in the console:
```
Warning: Invalid stream: "ReferenceError: this hasn't been initialised - super() hasn't been called"
```
2021-05-05 13:24:12 +02:00
Calixte Denizet
3f29892d63 [JS] Fix several issues found in pdf in #13269
- app.alert and few other function can use an object as parameter ({cMsg: ...});
  - support app.alert with a question and a yes/no answer;
  - update field siblings when one is changed in an action;
  - stop calculation if calculate is set to false in the middle of calculations;
  - get a boolean for checkboxes when they've been set through annotationStorage instead of a string.
2021-05-04 19:21:51 +02:00
Calixte Denizet
549aae6c3d JS -- add support for page property in field 2021-05-03 15:46:29 +02:00
Jonas Jenwald
5e5daca407 Remove unnecessary MissingDataException check from getHeaderBlock
It shouldn't be possible for the `getBytes`-call to throw a `MissingDataException`, since all resources are loaded *before* e.g. font-parsing ever starts; see f0817015bd/src/core/object_loader.js (L111-L126)

Furthermore, even if we'd *somehow* re-throw a `MissingDataException` here that still won't help considering where the `Type1Font`-instance is created. Note how in the `Font`-constructor we simply catch any errors and fallback to a standard font, which means that a `MissingDataException` would just lead to rendering errors anyway; see f0817015bd/src/core/fonts.js (L648-L691)

All-in-all, it's not possible for a `MissingDataException` to be thrown in `getHeaderBlock` and this code-path can thus be removed.
2021-05-03 13:57:30 +02:00
Jonas Jenwald
b487edd05d Convert src/core/fonts.js to use standard classes
Obviously the `Font`-class is still *very* large, given particularly how TrueType fonts are handled, however this patch-series at least improves things by moving a number of functions/classes into their own files.
As a follow-up it might make sense to try and re-factor/extract the TrueType parsing into its own file, since all of this code is quite old, however that's probably best left for another time.

For e.g. `gulp mozcentral`, the *built* `pdf.worker.js` files decreases from `1 620 332` to `1 617 466` bytes with this patch-series.
2021-05-03 13:57:25 +02:00
Jonas Jenwald
cadc20d8b9 Fix the remaining no-var failures, which couldn't be handled automatically, in the src/core/fonts.js file 2021-05-02 21:00:29 +02:00
Jonas Jenwald
b9cd080c01 Enable the no-var rule in the src/core/fonts.js file
These changes were made automatically, using `gulp lint --fix`.
Given the large size of this patch, the manual fixes are done separately in the next commit.
2021-05-02 21:00:29 +02:00
Jonas Jenwald
f64b7922b3 Convert src/core/type1_font.js to use standard classes 2021-05-02 21:00:29 +02:00
Jonas Jenwald
4bd69556ab Enable the no-var rule in the src/core/type1_font.js file
These changes were made *mostly* automatically, using `gulp lint --fix`, with the following manual changes:

```diff
diff --git a/src/core/type1_font.js b/src/core/type1_font.js
index 50a3e49e6..55a2005fb 100644
--- a/src/core/type1_font.js
+++ b/src/core/type1_font.js
@@ -38,10 +38,9 @@ const Type1Font = (function Type1FontClosure() {
     const scanLength = streamBytesLength - signatureLength;

     let i = startIndex,
-      j,
       found = false;
     while (i < scanLength) {
-      j = 0;
+      let j = 0;
       while (j < signatureLength && streamBytes[i + j] === signature[j]) {
         j++;
       }
@@ -248,14 +247,14 @@ const Type1Font = (function Type1FontClosure() {
         return charCodeToGlyphId;
       }

-      let glyphNames = [".notdef"],
-        glyphId;
+      const glyphNames = [".notdef"];
+      let builtInEncoding, glyphId;
       for (glyphId = 0; glyphId < charstrings.length; glyphId++) {
         glyphNames.push(charstrings[glyphId].glyphName);
       }
       const encoding = properties.builtInEncoding;
       if (encoding) {
-        var builtInEncoding = Object.create(null);
+        builtInEncoding = Object.create(null);
         for (const charCode in encoding) {
           glyphId = glyphNames.indexOf(encoding[charCode]);
           if (glyphId >= 0
```
2021-05-02 21:00:29 +02:00
Jonas Jenwald
ff85bcfc0e Move the Type1Font from src/core/fonts.js and into its own file 2021-05-02 21:00:29 +02:00
Jonas Jenwald
e803584fe7 Convert src/core/cff_font.js to use standard classes 2021-05-02 21:00:29 +02:00
Jonas Jenwald
542ee0d798 Enable the no-var rule in the src/core/cff_font.js file
These changes were made automatically, using `gulp lint --fix`.
2021-05-02 21:00:29 +02:00
Jonas Jenwald
d5d73e3168 Move the CFFFont from src/core/fonts.js and into its own file 2021-05-02 21:00:29 +02:00
Jonas Jenwald
d4606712f2 Enable the no-var rule in the src/core/fonts_utils.js file
These changes were made *mostly* automatically, using `gulp lint --fix`, with the following manual changes:

```diff
diff --git a/src/core/fonts_utils.js b/src/core/fonts_utils.js
index f88ce4a8c..c4b3f3808 100644
--- a/src/core/fonts_utils.js
+++ b/src/core/fonts_utils.js
@@ -167,8 +167,8 @@ function type1FontGlyphMapping(properties, builtInEncoding,
glyphNames) {
   }

   // Lastly, merge in the differences.
-  let differences = properties.differences,
-    glyphsUnicodeMap;
+  const differences = properties.differences;
+  let glyphsUnicodeMap;
   if (differences) {
     for (charCode in differences) {
       const glyphName = differences[charCode];
```
2021-05-02 21:00:29 +02:00
Jonas Jenwald
77b258440b Move some constants and helper functions from src/core/fonts.js and into their own file
- `FontFlags`, is used in both `src/core/fonts.js` and `src/core/evaluator.js`.
 - `getFontType`, same as the above.
 - `MacStandardGlyphOrdering`, is a fairly large data-structure and `src/core/fonts.js` is already a *very* large file.
 - `recoverGlyphName`, a dependency of `type1FontGlyphMapping`; please see below.
 - `SEAC_ANALYSIS_ENABLED`, is used by both `Type1Font`, `CFFFont`, and unit-tests; please see below.
 - `type1FontGlyphMapping`, is used by both `Type1Font` and `CFFFont` which a later patch will move to their own files.
2021-05-02 21:00:29 +02:00
Jonas Jenwald
22539b52fa Convert src/core/to_unicode_map.js to use standard classes 2021-05-02 21:00:29 +02:00
Jonas Jenwald
33ea6b1131 Enable the no-var rule in the src/core/to_unicode_map.js file
These changes were made automatically, using `gulp lint --fix`.
2021-05-02 21:00:29 +02:00
Jonas Jenwald
6912bb5e0a Move the IdentityToUnicodeMap/ToUnicodeMap from src/core/fonts.js and into its own file 2021-05-02 21:00:29 +02:00
Jonas Jenwald
8c1d1a58f7 Convert src/core/opentype_file_builder.js to use standard classes 2021-05-02 21:00:28 +02:00
Jonas Jenwald
1808b2dc96 Enable the no-var rule in the src/core/opentype_file_builder.js file
These changes were made automatically, using `gulp lint --fix`.
2021-05-02 21:00:28 +02:00
Jonas Jenwald
a783c7ca79 Move the OpenTypeFileBuilder from src/core/fonts.js and into its own file 2021-05-02 21:00:28 +02:00
Tim van der Meij
af9feb1307
Merge pull request #13321 from timvandermeij/src-core-no-var
Enable the `no-var` linting rule in `src/core/{crypto,function}.js`
2021-05-02 13:45:33 +02:00
Tim van der Meij
b661cf2b80
Fix no-var linting rule violations in src/core/crypto.js that couldn't be changed automatically by ESLint
This is done in a separate commit due to the required number of changes
so that reviewing is easier than in a plain-text diff in the commit
message.
2021-05-02 13:32:34 +02:00
Tim van der Meij
1f8b452354
Enable the no-var linting rule in src/core/crypto.js
This is done automatically with `gulp lint --fix`.
2021-05-01 20:34:35 +02:00
Tim van der Meij
58e568fe62
Enable the no-var linting rule in src/core/function.js
This is done automatically with `gulp lint --fix` and the following
manual changes:

```diff
diff --git a/src/core/function.js b/src/core/function.js
index 878001057..b7e3e6ccf 100644
--- a/src/core/function.js
+++ b/src/core/function.js
@@ -131,7 +131,7 @@ function toNumberArray(arr) {
   return arr;
 }

-var PDFFunction = (function PDFFunctionClosure() {
+const PDFFunction = (function PDFFunctionClosure() {
   const CONSTRUCT_SAMPLED = 0;
   const CONSTRUCT_INTERPOLATED = 2;
   const CONSTRUCT_STICHED = 3;
@@ -484,7 +484,9 @@ var PDFFunction = (function PDFFunctionClosure() {
         // clip to domain
         const v = clip(src[srcOffset], domain[0], domain[1]);
         // calculate which bound the value is in
-        for (var i = 0, ii = bounds.length; i < ii; ++i) {
+        const length = bounds.length;
+        let i;
+        for (i = 0; i < length; ++i) {
           if (v < bounds[i]) {
             break;
           }
@@ -673,23 +675,21 @@ const PostScriptStack = (function PostScriptStackClosure() {
     roll(n, p) {
       const stack = this.stack;
       const l = stack.length - n;
-      let r = stack.length - 1,
-        c = l + (p - Math.floor(p / n) * n),
-        i,
-        j,
-        t;
-      for (i = l, j = r; i < j; i++, j--) {
-        t = stack[i];
+      const r = stack.length - 1;
+      const c = l + (p - Math.floor(p / n) * n);
+
+      for (let i = l, j = r; i < j; i++, j--) {
+        const t = stack[i];
         stack[i] = stack[j];
         stack[j] = t;
       }
-      for (i = l, j = c - 1; i < j; i++, j--) {
-        t = stack[i];
+      for (let i = l, j = c - 1; i < j; i++, j--) {
+        const t = stack[i];
         stack[i] = stack[j];
         stack[j] = t;
       }
-      for (i = c, j = r; i < j; i++, j--) {
-        t = stack[i];
+      for (let i = c, j = r; i < j; i++, j--) {
+        const t = stack[i];
         stack[i] = stack[j];
         stack[j] = t;
       }
@@ -939,7 +939,7 @@ class PostScriptEvaluator {
 // We can compile most of such programs, and at the same moment, we can
 // optimize some expressions using basic math properties. Keeping track of
 // min/max values will allow us to avoid extra Math.min/Math.max calls.
-var PostScriptCompiler = (function PostScriptCompilerClosure() {
+const PostScriptCompiler = (function PostScriptCompilerClosure() {
   class AstNode {
     constructor(type) {
       this.type = type;
```
2021-05-01 20:04:58 +02:00
Jonas Jenwald
90b5fcb8e0 Remove unnecessary TypedArray re-initialization in FontFaceObject.createFontFaceRule
The `this.data` property is, when defined, sent from the worker-thread as a `Uint8Array` and there's thus no reason to re-initialize the TypedArray here.
Note also the `FontFaceObject.createNativeFontFace` method just above, where we simply use `this.data` as-is.

The explanation for this code looking like it does is, as is often the case, for historical reasons. Originally we only supported `@font-face`, before the Font Loading API existed, and back then we also polyfilled TypedArrays (using regular Arrays) which should explain this particular line of code.
2021-05-01 19:20:36 +02:00
Jonas Jenwald
3624f9eac7 Add a new BaseStream.getString(...) method to replace manual bytesToString(BaseStream.getBytes(...)) calls
Given that the `bytesToString(BaseStream.getBytes(...))` pattern is somewhat common throughout the `src/core/` code, it cannot hurt to add a new `BaseStream`-method which handles that case internally.
2021-05-01 19:20:36 +02:00
Tim van der Meij
f6f335173d
Merge pull request #13303 from Snuffleupagus/BaseStream
Add an abstract base-class, which all the various Stream implementations inherit from
2021-05-01 19:13:36 +02:00
calixteman
af4dc55019
[api-minor] Fix the way to chunk the strings (#13257)
- Improve chunking in order to fix some bugs where the spaces aren't here:
    * track the last position where a glyph has been drawn;
    * when a new glyph (first glyph in a chunk) is added then compare its position with the last saved one and add a space or break:
      - there are multiple ways to move the glyphs and to avoid to have to deal with all the different possibilities it's a way easier to just compare positions;
      - and so there is now one function (i.e. "compareWithLastPosition") where all the job is done.
  - Add some breaks in order to get lines;
  - Remove the multiple whites spaces:
    * some spaces were filled with several whites spaces and so it makes harder to find some sequences of words using the search tool;
    * other pdf readers replace spaces by one white space.

Update src/core/evaluator.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-04-30 14:41:13 +02:00
Jonas Jenwald
2ac4ad3111 Let ChunkedStream extend Stream, rather than BaseStream directly
Looking at the `ChunkedStream` implementation, it's basically a "regular" `Stream` but with added functionality in order to deal with fetching/loading of missing data.
Hence, by letting `ChunkedStream` extend `Stream`, we can remove some duplicate methods from the `ChunkedStream` class.
2021-04-28 14:05:25 +02:00
Jonas Jenwald
fb0775525e Stop special-casing the dict parameter in the Jbig2Stream/JpegStream/JpxStream constructors
For all of the other `DecodeStream`s we're not passing in a `Dict`-instance manually, but instead get it from the `stream`-parameter. Hence there's no particularly good reason, as far as I can tell, to not do the same thing in `Jbig2Stream`/`JpegStream`/`JpxStream` as well.
2021-04-28 13:44:47 +02:00
Jonas Jenwald
67a1cfc1b1 Improve the handling getBaseStreams, on the various Stream implementations
The way that `getBaseStreams` is currently handled has bothered me from time to time, especially how we're checking if the method exists before calling it.
By adding a dummy `BaseStream.getBaseStreams` method, and having the call-sites simply check the return value, we can improve some of the relevant code.

Note in particular how the `ObjectLoader._walk` method didn't actually check that the data in question is a Stream instance, and instead only checked the `currentNode` (which could be anything) for the existence of a `getBaseStreams` property.
2021-04-28 13:44:47 +02:00
Jonas Jenwald
67415bfabe Add an abstract base-class, which all the various Stream implementations inherit from
By having an abstract base-class, it becomes a lot clearer exactly which methods/getters are expected to exist on all Stream instances.
Furthermore, since a number of the methods are *identical* for all Stream implementations, this reduces unnecessary code duplication in the `Stream`, `DecodeStream`, and `ChunkedStream` classes.

For e.g. `gulp mozcentral`, the *built* `pdf.worker.js` files decreases from `1 619 329` to `1 616 115` bytes with this patch-series.
2021-04-28 13:44:45 +02:00
Jonas Jenwald
6151b4ecac Convert src/core/stream.js to use standard classes 2021-04-28 13:44:10 +02:00
Jonas Jenwald
29cf415a69 Enable the no-var rule in the src/core/stream.js file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
b11f012e52 Convert src/core/decode_stream.js to use standard classes 2021-04-28 10:16:51 +02:00
Jonas Jenwald
8ce2cae4a7 Enable the no-var rule in the src/core/decode_stream.js file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
30a22a168d Move the DecodeStream and StreamsSequenceStream from src/core/stream.js and into its own file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
213e1c389c Convert src/core/flate_stream.js to use standard classes 2021-04-28 10:16:51 +02:00
Jonas Jenwald
aa1deaf93c Enable the no-var rule in the src/core/flate_stream.js file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
1e5bf352a5 Move the FlateStream from src/core/stream.js and into its own file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
40c342ec6c Convert src/core/predictor_stream.js to use standard classes 2021-04-28 10:16:51 +02:00
Jonas Jenwald
b08f9a8182 Enable the no-var rule in the src/core/predictor_stream.js file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
66d9d83dcb Move the PredictorStream from src/core/stream.js and into its own file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
e938c05edb Convert src/core/decrypt_stream.js to use standard classes 2021-04-28 10:16:51 +02:00
Jonas Jenwald
a9476e7dd0 Enable the no-var rule in the src/core/decrypt_stream.js file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
28b0809e60 Move the DecryptStream from src/core/stream.js and into its own file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
cdb583b764 Convert src/core/ascii_85_stream.js to use standard classes 2021-04-28 10:16:51 +02:00
Jonas Jenwald
f6c7a65202 Enable the no-var rule in the src/core/ascii_85_stream.js file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
3294d4d5a3 Move the Ascii85Stream from src/core/stream.js and into its own file 2021-04-28 10:16:51 +02:00
Jonas Jenwald
d2227a7d10 Convert src/core/ascii_hex_stream.js to use standard classes 2021-04-28 10:16:50 +02:00
Jonas Jenwald
59591f8788 Enable the no-var rule in the src/core/ascii_hex_stream.js file 2021-04-28 10:16:50 +02:00
Jonas Jenwald
d63df04854 Move the AsciiHexStream from src/core/stream.js and into its own file 2021-04-28 10:16:50 +02:00
Jonas Jenwald
704514c7cd Convert src/core/run_length_stream.js to use standard classes 2021-04-28 10:16:50 +02:00
Jonas Jenwald
66b898eb58 Enable the no-var rule in the src/core/run_length_stream.js file 2021-04-28 10:16:50 +02:00
Jonas Jenwald
342b0c1bbc Move the RunLengthStream from src/core/stream.js and into its own file 2021-04-28 10:16:50 +02:00
Jonas Jenwald
1f0685cee6 Convert src/core/lzw_stream.js to use standard classes 2021-04-28 10:16:50 +02:00
Jonas Jenwald
1f9b134c6a Enable the no-var rule in the src/core/src/core/lzw_stream.js file 2021-04-28 10:16:50 +02:00
Jonas Jenwald
6c1a321500 Move the LZWStream from src/core/stream.js and into its own file 2021-04-28 10:16:50 +02:00
Tim van der Meij
0acd801b1e
Merge pull request #13305 from timvandermeij/annotation-polygon-polyline-no-appearance-stream
Implement rendering polyline/polygon annotations without appearance stream
2021-04-27 20:03:35 +02:00
Tim van der Meij
60ab15427f
Implement rendering polyline/polygon annotations without appearance stream 2021-04-27 19:02:20 +02:00
Jonas Jenwald
0ecb42f4d7 Convert src/core/jpx_stream.js to use standard classes 2021-04-27 13:29:09 +02:00
Jonas Jenwald
c51ef1f21f Convert src/core/jbig2_stream.js to use standard classes 2021-04-27 13:29:09 +02:00
Jonas Jenwald
d9c1bf96b6 Convert src/core/jpeg_stream.js to use standard classes 2021-04-27 13:29:09 +02:00
Jonas Jenwald
0ca63f94b4 Convert src/core/ccitt_stream.js to use standard classes 2021-04-27 13:29:09 +02:00
Jonas Jenwald
8ff213871b Convert src/core/ccitt.js to use standard classes
Given that we're using modules, meaning that only explicitly `export`ed things are visible to the outside, it's no longer necessary to wrap all of the code in a closure.
2021-04-27 13:29:09 +02:00
Tim van der Meij
ca668587c6
Merge pull request #13300 from Snuffleupagus/canvas-class
Convert the code in `src/display/canvas.js` to use standard classes
2021-04-27 13:19:36 +02:00
Jonas Jenwald
6f4394fcd8
Support InkAnnotations without appearance streams (issue 13298) (#13301)
For now, we keep things purposely simple by using straight lines (rather than curves); please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2096579
2021-04-27 11:49:03 +02:00
Jonas Jenwald
e6601f4582 Convert the code in src/display/canvas.js to use standard classes
This gets rid of *a lot* of boilerplate that stems from our old way of simulating classes, and it actually reduces the filesize noticeably.
For e.g. `gulp mozcentral`, the *built* `pdf.js` files decreases from `318 404` to `314 722` bytes (~1 percent) with this patch.
2021-04-26 22:10:38 +02:00
Tim van der Meij
270e56dae8
Enable the no-var linting rule in src/core/image.js
This is done automatically with `gulp lint --fix` and the following
manual changes:

```diff
diff --git a/src/core/image.js b/src/core/image.js
index 35c06b8ab..e718b9937 100644
--- a/src/core/image.js
+++ b/src/core/image.js
@@ -97,7 +97,7 @@ class PDFImage {
     if (isName(filter)) {
       switch (filter.name) {
         case "JPXDecode":
-          var jpxImage = new JpxImage();
+          const jpxImage = new JpxImage();
           jpxImage.parseImageProperties(image.stream);
           image.stream.reset();
```
2021-04-25 17:40:00 +02:00
Tim van der Meij
16efd09c9f
Enable the no-var linting rule in src/core/worker.js
This is done automatically with `gulp lint --fix` and the following
manual changes:

```diff
diff --git a/src/core/worker.js b/src/core/worker.js
index aec9c1d39..f88691622 100644
--- a/src/core/worker.js
+++ b/src/core/worker.js
@@ -300,7 +300,7 @@ class WorkerMessageHandler {
         cachedChunks = [];
       };
       const readPromise = new Promise(function (resolve, reject) {
-        var readChunk = function ({ value, done }) {
+        const readChunk = function ({ value, done }) {
           try {
             ensureNotTerminated();
             if (done) {
```
2021-04-25 17:40:00 +02:00
Tim van der Meij
85659b4cf0
Enable the no-var linting rule in src/core/cmap.js
This is done automatically with `gulp lint --fix` and the following
manual changes:

```diff
diff --git a/src/core/cmap.js b/src/core/cmap.js
index 850275a19..8794726dd 100644
--- a/src/core/cmap.js
+++ b/src/core/cmap.js
@@ -519,8 +519,8 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {

     readHexNumber(num, size) {
       let last;
-      let stack = this.tmpBuf,
-        sp = 0;
+      const stack = this.tmpBuf;
+      let sp = 0;
       do {
         const b = this.readByte();
         if (b < 0) {
@@ -603,7 +603,6 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {

         const ucs2DataSize = 1;
         const subitemsCount = stream.readNumber();
-        var i;
         switch (type) {
           case 0: // codespacerange
             stream.readHex(start, dataSize);
@@ -614,7 +613,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {
               hexToInt(start, dataSize),
               hexToInt(end, dataSize)
             );
-            for (i = 1; i < subitemsCount; i++) {
+            for (let i = 1; i < subitemsCount; i++) {
               incHex(end, dataSize);
               stream.readHexNumber(start, dataSize);
               addHex(start, end, dataSize);
@@ -633,7 +632,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {
             addHex(end, start, dataSize);
             stream.readNumber(); // code
             // undefined range, skipping
-            for (i = 1; i < subitemsCount; i++) {
+            for (let i = 1; i < subitemsCount; i++) {
               incHex(end, dataSize);
               stream.readHexNumber(start, dataSize);
               addHex(start, end, dataSize);
@@ -647,7 +646,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {
             stream.readHex(char, dataSize);
             code = stream.readNumber();
             cMap.mapOne(hexToInt(char, dataSize), code);
-            for (i = 1; i < subitemsCount; i++) {
+            for (let i = 1; i < subitemsCount; i++) {
               incHex(char, dataSize);
               if (!sequence) {
                 stream.readHexNumber(tmp, dataSize);
@@ -667,7 +666,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {
               hexToInt(end, dataSize),
               code
             );
-            for (i = 1; i < subitemsCount; i++) {
+            for (let i = 1; i < subitemsCount; i++) {
               incHex(end, dataSize);
               if (!sequence) {
                 stream.readHexNumber(start, dataSize);
@@ -692,7 +691,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {
               hexToInt(char, ucs2DataSize),
               hexToStr(charCode, dataSize)
             );
-            for (i = 1; i < subitemsCount; i++) {
+            for (let i = 1; i < subitemsCount; i++) {
               incHex(char, ucs2DataSize);
               if (!sequence) {
                 stream.readHexNumber(tmp, ucs2DataSize);
@@ -717,7 +716,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() {
               hexToInt(end, ucs2DataSize),
               hexToStr(charCode, dataSize)
             );
-            for (i = 1; i < subitemsCount; i++) {
+            for (let i = 1; i < subitemsCount; i++) {
               incHex(end, ucs2DataSize);
               if (!sequence) {
                 stream.readHexNumber(start, ucs2DataSize);
```
2021-04-25 17:40:00 +02:00
Jonas Jenwald
4078dd856c Clear some Arrays, rather than re-initialize them, in src/display/-code
It's generally better to re-use the same Array, by clearing out all of its elements, rather than creating a new Array.
2021-04-24 13:00:53 +02:00
Jonas Jenwald
da22146b95 Replace a bunch of Array.prototype.forEach() cases with for...of loops instead
Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)
2021-04-24 13:00:19 +02:00
Tim van der Meij
da0e7ea969
Merge pull request #13272 from calixteman/issue13271
Update all the text widgets having the same name with the same value
2021-04-23 21:08:54 +02:00
Tim van der Meij
a6e3ad4c72
Merge pull request #13283 from Snuffleupagus/NameOrNumberTree-getAll-map
Change `NameOrNumberTree.getAll` to return a `Map` rather than an Object
2021-04-23 20:53:52 +02:00
calixteman
762cfd2d1b
[JS] Use heap allocation when initializing quickjs sandbox (#13286)
- In case of large string the sandbox initialization failed because of an OOM
    * so allocate a new string in the heap
    * and free it after use.
  - it requires a quickjs update since we need to export some symbols (stringToNewUTF8 and free).
2021-04-23 12:04:14 +02:00
Jonas Jenwald
4ec0a4fb43 Re-factor the Catalog._collectJavaScript method slightly
This patch first of all moves all checking/validation into the `appendIfJavaScriptDict` function, to avoid duplicating it in multiple places. Secondly, also removes what's now an outdated/incorrect comment since we have implemented scripting support.
2021-04-23 09:42:32 +02:00
Jonas Jenwald
83f7009e4b Change NameOrNumberTree.getAll to return a Map rather than an Object
Given that we're (almost) always iterating through the result of the `getAll`-calls, using a `Map` seems nicer overall since it's more suited to iteration compared to a regular Object.

Also, add a couple of `Dict`-checks in existing code touched by this patch, since it really cannot hurt to prevent *potential* errors in a corrupt PDF document.
2021-04-22 13:15:50 +02:00
Jonas Jenwald
57a1ea840f
Ensure that saveDocument works if there's no /ID-entry in the PDF document (issue 13279) (#13280)
First of all, while it should be very unlikely that the /ID-entry is an *indirect* object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an *indirect* object the existing code in `src/core/writer.js` would already fail.
Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`.

*Drive-by change:* In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).
2021-04-22 12:08:56 +02:00
Brendan Dahl
066cbcfb27
Merge pull request #13277 from Snuffleupagus/adjustToUnicode-cff
For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260)
2021-04-21 10:41:36 -07:00
Brendan Dahl
5231d922ec
Add presentation role to text layer spans. (#13278)
Keeps screen readers from pausing on every span so
paragraphs are read more naturally. Note: this only seems
to affect Firefox, Chrome automatically combines the spans.
2021-04-21 10:47:51 +02:00
Jonas Jenwald
7fab73ed23 For CFF fonts without proper ToUnicode/Encoding data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260)
This patch extends the approach, implemented in PR 7550, to also apply to CFF fonts.
2021-04-20 20:48:44 +02:00
Jonas Jenwald
8f6543c218 Ensure that the /Properties, used with optional content, is actually loaded *before* parsing the operatorList/textContent (PR 12095 follow-up)
By not waiting for the /Properties to load, before parsing of the operatorList/textContent starts, there's a very real risk that a `MissingDataException` will be thrown when trying to access the data in the `PartialEvaluator.parseMarkedContentProps` method.
If this ever happens it will thus lead to incomplete and/or outright broken rendering, and with e.g. `disableAutoFetch=true` set the likelihood of this occuring would increase quite a bit.

*Please note:* While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.
2021-04-20 20:22:44 +02:00
Calixte Denizet
e868ab0051 Update all the text widgets having the same name with the same value 2021-04-20 20:03:19 +02:00
Jonas Jenwald
f560fe6875 A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.

 - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)

 - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
   Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)

 - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 10:34:22 +02:00
Brendan Dahl
ac3fa1e3d7
Merge pull request #13146 from calixteman/xfa_fonts
XFA -- Load fonts permanently from the pdf
2021-04-16 12:55:12 -07:00
Calixte Denizet
7e9579045f XFA -- Load fonts permanently from the pdf
- Different fonts can be used in xfa and some of them are embedded in the pdf.
  - Load all the fonts in window.document.

Update src/core/document.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Update src/core/worker.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-04-15 17:57:42 +02:00
Jani Pehkonen
3a96977ea8 Implement visibility expressions for optional content 2021-04-14 17:39:41 +03:00
Jonas Jenwald
1d6d476cab Rename the src/core/obj.js file to src/core/catalog.js
Now that only the `Catalog` remains in this file, after the previous patches, it makes sense to rename the file to reduce confusion.
2021-04-13 21:00:30 +02:00
Jonas Jenwald
088a55f80d Enable the no-var rule in the src/core/xref.js file 2021-04-13 21:00:30 +02:00
Jonas Jenwald
bc828cd41f Convert the XRef to a "normal" class 2021-04-13 21:00:30 +02:00
Jonas Jenwald
e8750cfe95 Move the XRef from src/core/obj.js and into its own file
The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of *distinct* functionality.
In order to improve readability and make it easier to navigate through the code, this patch moves the `XRef` into its own file.
2021-04-13 21:00:30 +02:00
Jonas Jenwald
24e5ecdf76 Move NameTree/NumberTree from src/core/obj.js and into its own file
The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of *distinct* functionality.
In order to improve readability and make it easier to navigate through the code, this patch moves `NameTree`/`NumberTree` into its own file.
2021-04-13 21:00:30 +02:00
Jonas Jenwald
92141e0468 Enable the no-var rule in the src/core/file_spec.js file 2021-04-13 21:00:30 +02:00
Jonas Jenwald
22a066e657 Convert the FileSpec to a "normal" class 2021-04-13 21:00:30 +02:00
Jonas Jenwald
e02d17da93 Move the FileSpec from src/core/obj.js and into its own file
The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of *distinct* functionality.
In order to improve readability and make it easier to navigate through the code, this patch moves the `FileSpec` into its own file.
2021-04-13 21:00:30 +02:00
Jonas Jenwald
6a935682fd Covert the ObjectLoader to a "normal" class 2021-04-13 21:00:30 +02:00
Jonas Jenwald
604cd6d600 Move the ObjectLoader from src/core/obj.js and into its own file
The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of *distinct* functionality.
In order to improve readability and make it easier to navigate through the code, this patch moves the `ObjectLoader` into its own file.
2021-04-13 21:00:30 +02:00
Tim van der Meij
ebeb3f7999
Merge pull request #13234 from Snuffleupagus/hasJSActions-MissingDataException
[api-minor] Ensure that `PDFDocumentProxy.hasJSActions` won't fail if `MissingDataException`s are thrown during the associated worker-thread parsing
2021-04-13 20:44:58 +02:00
Cetin Sert
d498897ab5
Fix annotation input focus trap regression in Safari (#13232)
`setSelectionRange(0, 0)` added in 44b24fcc29 for #12359, required only by Firefox ([bug](https://bugzilla.mozilla.org/show_bug.cgi?id=860329)), causes issues mozilla#13191, mozilla#12592 in Safari.
`scrollLeft = 0` is a fix that breaks the focus trap in Safari while **keeping Firefox behavior same for #12359**.
2021-04-13 20:40:52 +02:00
Tim van der Meij
3d2d8002b0
Merge pull request #13223 from Snuffleupagus/worker-xfa-structTree-tweaks
Remove the unused "GetIsPureXfa" message handler; and avoid unnecessary parsing when no structTree is available (PR 13069 follow-up, PR 13221 follow-up)
2021-04-13 20:39:52 +02:00
Jonas Jenwald
2b2234fd5a [api-minor] Ensure that PDFDocumentProxy.hasJSActions won't fail if MissingDataExceptions are thrown during the associated worker-thread parsing
With the current implementation of `PDFDocument.hasJSActions`, in the worker-thread, we're not actually handling not-yet-loaded data correctly. This can thus fail in *two* different ways:
 - The `PDFDocument.fieldObjects` getter (and its helper method), while it may *return* a Promise, still fetches all of its data synchronously and it can thus throw a `MissingDataException` during parsing.
 - The `Catalog.jsActions` getter, which is completely synchronous, can obviously throw a `MissingDataException` during parsing.

If either of these cases occur currently, the `PDFDocumentProxy.hasJSActions` method in the API can either return a *rejected* Promise (which it never should) or possibly "hang" and never resolve.

*Please note:* While I've not *yet* seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.
This patch is thus based on code-inspection *and* on manually throwing a `MissingDataException` on the first access of `Catalog.jsActions` to simulate this situation.

Finally, this patch adds a couple of *API* unit-tests for this (since none existed).
2021-04-13 14:33:56 +02:00
Jonas Jenwald
4aa27cc645 Re-factor Catalog._collectJavaScript to use a Map rather than an Object
Given that this only an internal helper method, used by the `Catalog.{javaScript, jsActions}` getters, this change simplifies iteration of the returned data.
We can also (slightly) re-factor the code of the `jsActions` getter, and remove an obsolete[1] JSDoc-comment from the `openAction` getter.

---
[1] Not really relevant now that we've got proper scripting support.
2021-04-13 14:16:17 +02:00
Calixte Denizet
a4c986515f XFA -- Display text content
- display xhtml;
  - allow spaces in xhtml (xfa-spacerun:yes);
  - support column layout;
  - fix some border issues.
2021-04-12 14:13:49 +02:00
Jonas Jenwald
54ef4370a2 Ensure that the data is loaded, in the "GetPageJSActions" message handler
Similar to all other data accesses, note e.g. the "GetDocJSActions" handler just above, we need to ensure that a `MissingDataException` isn't propagated to the main-thread if this data is accessed while the PDF document is still loading.
2021-04-12 13:54:37 +02:00
Jonas Jenwald
9360c7cbdc Avoid unnecessary parsing, in Page.GetStructTree, when no structTree is available (PR 13221 follow-up)
It's obviously (a bit) more efficient to return early in `Page.getStructTree`, rather than trying to first "parse" an *empty* structTree-root.

*Somehow I didn't think of this yesterday, but this feels like a much better solution overall; sorry about the churn here!*
2021-04-12 08:54:21 +02:00
Jonas Jenwald
0d2dd6c2fe Remove the unused "GetIsPureXfa" message handler in the worker (PR 13069 follow-up)
Looking at the API, there's no code which actually sends this message. Most likely it's a left-over from a previous version of PR 13069, since the `isPureXfa` parameter is being included in the "GetDoc" message.
2021-04-12 08:52:27 +02:00
Jonas Jenwald
5adee0cdd1 [api-minor] Let PDFPageProxy.getStructTree return null, rather than an empty structTree, for documents without any accessibility data (PR 13171 follow-up)
This is first of all consistent with existing API-methods, where we return `null` when the data in question doesn't exist. Secondly, it should also be (slightly) more efficient since there's less dummy-data that we need to transfer between threads.
Finally, this prevents us from adding an empty/unnecessary span to *every* single page even in documents without any structure tree data.
2021-04-11 12:35:33 +02:00
Jonas Jenwald
ff4dae05b0 Ensure that getStructTree won't break with disableAutoFetch = true set (PR 13171 follow-up)
Open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableStream=true&disableAutoFetch=true and observe the following message in the console (repeated for each page of the document):
```
Uncaught (in promise)
Object { message: "Missing data [19787293, 19787294)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19787293, 19787294)", stack: "BaseExceptionClosure@http://localhost:8888/src/shared/util.js:458:29\n@http://localhost:8888/src/shared/util.js:462:3\n" }
```
2021-04-11 12:15:33 +02:00
Tim van der Meij
d9d626a5e1
Merge pull request #13214 from calixteman/signatures
Display widget signature
2021-04-10 19:35:16 +02:00
Calixte Denizet
5875ebb1ca Display widget signature
- but don't validate them for now;
  - Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315)
  - almost all (all ?) pdf readers display signatures;
  - validation is done in edge but for now it's behind a pref.
2021-04-10 19:13:28 +02:00
Tim van der Meij
03c8c89002
Merge pull request #13171 from brendandahl/struct-tree
[api-minor] Add support for basic structure tree for accessibility.
2021-04-09 21:32:44 +02:00
Tim van der Meij
b0473eb353
Merge pull request #13207 from Snuffleupagus/api-AnnotationStorage-params
[api-minor] Remove the manual passing of an `AnnotationStorage`-instance when calling various API-method
2021-04-09 21:09:16 +02:00
Brendan Dahl
fc9501a637 Add support for basic structure tree for accessibility.
When a PDF is "marked" we now generate a separate DOM that represents
the structure tree from the PDF.  This DOM is inserted into the <canvas>
element and allows screen readers to walk the tree and have more
information about headings, images, links, etc. To link the structure
tree DOM (which is empty) to the text layer aria-owns is used. This
required modifying the text layer creation so that marked items are
now tracked.
2021-04-09 09:56:28 -07:00
Jonas Jenwald
737a8e846d Add deprecated handling of the now removed AnnotationStorage API-parameters
These changes are done separately, to make it easier to remove them in the future.
2021-04-09 13:25:03 +02:00
Jonas Jenwald
72ef183085 [api-minor] Remove the manual passing of an AnnotationStorage-instance when calling various API-method
Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use *multiple* ones simultaneously doesn't really make sense (e.g. in the viewer).
Instead we lazily initialize, and cache, just *one* instance via `PDFDocumentProxy.annotationStorage` which should thus be available internally in the API itself without having to be manually passed to various methods.

To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it.
This patch implements the following simplifications:
 - Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally.
   Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does *not* make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with *a lot* more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method.

 - Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point).

 - Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing).

 - By removing the need to *manually* provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.
2021-04-09 13:24:25 +02:00
Ikko Ashimine
c4c4333d54
Fix typo in canvas.js
Reseting -> Resetting
2021-04-08 23:45:24 +09:00
Tim van der Meij
6429ccc002
Merge pull request #13194 from Snuffleupagus/ttcf-fuzzy-match
Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193)
2021-04-07 20:50:19 +02:00
Tim van der Meij
5945f7c4a1
Merge pull request #13186 from Snuffleupagus/rm-deprecated-code
Remove some `deprecated` code
2021-04-07 20:38:59 +02:00
Jonas Jenwald
f986ccdf0e Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193)
The fontName, as defined in the PDF document, cannot be found in *any* of the "name"-tables in the TrueType Collection font. To work-around that, this patch adds a *fallback* code-path to allow using an approximately matching fontName rather than outright failing.
2021-04-07 15:25:32 +02:00
Jonas Jenwald
4e81e0e14f Remove the deprecated AnnotationStorage.getOrCreateValue-method (PR 12759 follow-up)
While this method has only been deprecated in one releases now, the `AnnotationStorage`-functionality is new enough that third-party implementations hopefully don't rely heavily on it just yet. (And removing this quickly should help reduce the likelihood that someone starts using it.)
2021-04-06 13:22:06 +02:00
Tim van der Meij
fc0cd4a443
Convert the startXRefParsedCache variable, in src/core/obj.js, from an object to a set
We only want to track XRef starting points instead of actual data, so
using a set conveys that intention more clearly and is slightly more
efficient.
2021-04-05 19:32:58 +02:00
Tim van der Meij
228adbf673
Merge pull request #13172 from Snuffleupagus/cleanup-keepFonts
[api-minor] Add an option, in `PDFDocumentProxy.cleanup`, to allow fonts to remain attached to the DOM
2021-04-05 14:21:34 +02:00
Jonas Jenwald
16fd838f52 Convert the renderTasks, used in PDFPageProxy.render/PDFPageProxy.getOperatorList, to a Set
When removing tasks we're currently forced to *indirectly* iterate through the array, which can be avoided by using a Set instead.
Furthermore, we can also (slightly) modernize the code responsible for initializing the `renderTasks`.
2021-04-05 10:51:28 +02:00
Jonas Jenwald
68d3a333ac Change the seenStyles object, in PartialEvaluator.getTextContent, to a Set
Given that what we actually want is only to keep track of the loadedFont-names, rather than storing any actual data, using an object isn't really necessary here. Furthermore, in the current code, we're also using `in` when checking if the data exists, which is generally less efficient than just checking for the value directly.
2021-04-05 10:34:02 +02:00
Jonas Jenwald
a2bc6481a0 [api-minor] Add an option, in PDFDocumentProxy.cleanup, to allow fonts to remain attached to the DOM
As mentioned in the JSDoc comment, this should not be used unless you know what you're doing, since it will lead to increased memory usage. However, in some situations (e.g. SVG-rendering), we still want to be able to run general clean-up on both the main/worker-thread while keeping loaded fonts attached to the DOM.[1]

As part of these changes, `WorkerTransport.startCleanup` is converted to an async method and we'll also skip clean-up when destruction has started (since it's redundant).

---
[1] The SVG-rendering mode is obviously not officially supported, since it's both rather incomplete and inherently slower. However with recent changes, whereby we cache repeated images on the document rather than the page level, memory usage can be *a lot* worse than before if we never attempt to release e.g. cached image-data when the viewer is in SVG-rendering mode.
2021-04-02 12:32:31 +02:00
Jonas Jenwald
48ff20493f Mark some internal PDFDocumentProxy-properties as "private"
These two properties were *never* intended to be anything but "private", hence it really cannot hurt to actually indicate that they're *not* part of any official API.
2021-04-02 12:26:32 +02:00
Jonas Jenwald
0eb1433c78 [api-minor] Change the format of the fontName-property, in defaultAppearanceData, on Annotation-instances (PR 12831 follow-up)
Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see ca7f546828/src/core/default_appearance.js (L35)
The reason that this is a problem can be seen in ca7f546828/src/core/primitives.js (L30-L34), since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct.

Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall.

*Please note:* I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).
2021-04-01 16:47:30 +02:00
Tim van der Meij
ca7f546828
Merge pull request #12908 from calixteman/11918
Slightly rescale lineWidth to workaround chrome rendering issue
2021-03-31 21:56:31 +02:00
Calixte Denizet
a0cfb0841f Slightly rescale lineWidth to workaround chrome rendering issue 2021-03-31 21:49:00 +02:00
Tim van der Meij
5a64157a2f
Merge pull request #13168 from janpe2/ttf-uni-glyphs
Use post table when Encoding has only Differences
2021-03-31 21:35:13 +02:00
Tim van der Meij
1a4af17d07
Merge pull request #13165 from Snuffleupagus/Annotation-rm-defaultAppearance-export
[api-minor] Stop exposing the *raw* `defaultAppearance`-string on Annotation-instances
2021-03-31 21:30:50 +02:00
Tim van der Meij
5be0fbe8f1
Merge pull request #13166 from Snuffleupagus/getDocument-URL
[api-minor] Support proper `URL`-objects, in addition to URL-strings, in `getDocument`
2021-03-31 21:20:08 +02:00
Tim van der Meij
2fb4d02ea5
Merge pull request #13158 from Snuffleupagus/rm-URL-polyfill
Remove the `URL` polyfill
2021-03-31 20:22:02 +02:00
Jani Pehkonen
0117ee5071 Use post table when Encoding has only Differences
Fixes #13107
In the issue, some TrueType glyph names have the format `uniXXXX`.
Font's `Encoding` dictionary has the entry `Differences` but no
`BaseEncoding`. `uniXXXX` names are converted to glyph indices
using font's `post` table but currently that is done only when
`BaseEncoding` exists. We must enable the conversion also when only
`Differences` exists.
2021-03-31 17:58:44 +03:00
Jonas Jenwald
db1e1612df [api-minor] Support proper URL-objects, in addition to URL-strings, in getDocument
Currently only URL-strings are officially supported by `getDocument`, however at this point in time I cannot really see any compelling reason to not support `URL`-objects as well.

Most likely the reason that we've don't already support `URL`-objects, in `getDocument`, is that historically `URL` wasn't fully implemented across browsers and our old polyfill wasn't perfect; see https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility

*Please note:* Because of how the `url` parameter is currently handled, there's actually *some* cases where passing a `URL`-object to `getDocument` already works. That, in my opinion, provides additional motivation for supporting `URL`-objects officially, since it makes the API more consistent.

The following is an attempt to summarize the *current* situation, based on the actual code rather than the JSDocs:
 - `getDocument("url string")` works and is documented.[1]
 - `getDocument({ url: "url string", })` works and is documented.[1]
 - `getDocument(new URL(...))` throws immediately, since no supported parameters are found.
 - `getDocument({ url: new URL(...), })` actually works even though it's not documented.[1] Originally, when data was fetched on the worker-thread, this would likely have thrown since `URL` isn't clonable.[2]
 - `getDocument({ url: { abc: 123, }, })`, or some similarily meaningless input, will be "accepted" by `getDocument` and then throw a `MissingPDFException` when attempting to fetch the bogus data.

With the changes in this patch, not only is `URL`-objects now officially supported and documented when calling `getDocument`, but we'll also do a much better job at actually validating any URL-data passed to `getDocument` (and instead fail early).

---
[1] In *browsers*, we create a valid URL thus indirectly validating the input. In Node.js environments, on the other hand, no validation is done since obtaining a baseUrl is more difficult (and PDF.js is primarily written for browsers anyway).

[2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types
2021-03-31 16:21:41 +02:00
Jonas Jenwald
27add0f1f3 Re-factor the source parsing, in getDocument, to use switch rather than if...else
Given the number of parameters that we now need to parse here, this code is no longer as readable as one would like. Hence this re-factoring, which will improve overall readability and also help with the next patch.
2021-03-31 16:21:37 +02:00
Jonas Jenwald
9c6770748c Move the PDFDocumentStats typedef closer to its usage
Currently this typedef appears slightly out-of-place, in the middle of the arguably much more important `getDocument` JSDocs.
2021-03-31 16:21:22 +02:00
calixteman
b3528868c1
XFA - Add support for few ui elements (#13115)
- input;
  - layout;
  - border;
  - margin;
  - color.
2021-03-31 15:42:21 +02:00
Jonas Jenwald
3df24254e3 [api-minor] Stop exposing the *raw* defaultAppearance-string on Annotation-instances
The reasons for making this change are:
 - This property is not, nor has it ever been, used anywhere in the PDF.js display-layer.
 - Related to the previous point, the format of the `defaultAppearance`-string is such that it'd be difficult to use it as-is in the display-layer anyway.
 - It (usually) contains the "raw" appearance-string, from the PDF document, which is neither parsed nor validated and could thus be bogus.
 - We now expose a `defaultAppearanceData`-property, which is first of all used in the display-layer and secondly contains actually parsed/validated data.
 - In the event that a third-party implementation needs the `defaultAppearance`-string, it could be easily constructed from the recently added `defaultAppearanceData`-property.

All-in-all, I'm thus suggesting that we stop exposing an unused and unnecessary property on all Annotation-instances.
2021-03-31 15:09:18 +02:00
Jonas Jenwald
38acde8375 Use template strings, to reduce unnecessary verbosity in a few warn(...) calls in src/core/annotation.js 2021-03-31 14:40:21 +02:00
calixteman
84d7cccb1d
JS - Handle correctly hierarchy of fields (#13133)
* JS - Handle correctly hierarchy of fields
  - it aims to fix #13132;
  - annotations can inherit their actions from the parent field;
  - there are some fields which act as a container for other fields:
    - they can be access through js so need to add them with an empty type (nothing in the spec about that but checked in Acrobat);
    - calculation order list (CO) can reference them so need make them through this.getField;
    - getArray method must return kids.
  - field values are number, string, ... depending of their type but nothing in the spec on how to know what's the type:
    - according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461
    - it seems that this "type" can be guessed from js action Format (when setting a type in Acrobat DC, the only affected thing is this action).
  - util.scand with an empty string returns the current date.
2021-03-30 08:50:35 -07:00
Jonas Jenwald
fa86a192f9 Remove the URL polyfill
Based on this compatibility information, given that IE 11 is now *explicitly* unsupported, we should no longer need to bundle a `URL` polyfill in any builds: https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility

Note that the caveat listed for older Safari-versions doesn't apply to any code in the PDF.js library, since we never call `new URL(url, undefined)` in the code-base.

Note also that Node.js has a web-compatible `URL` implementation, which according to the "History" section at https://nodejs.org/api/url.html#url_the_whatwg_url_api has been available since Node.js `10.0.0` (according to https://nodejs.org/en/about/releases/ that branch is one month away from being EOL-ed).
2021-03-29 18:00:36 +02:00
Tim van der Meij
1a2cdaffc5
Merge pull request #13152 from calixteman/13130
Skip extra objects in object stream in using offsets
2021-03-28 15:11:55 +02:00
Jonas Jenwald
19c2dfbb96 Move rotation normalization from PDFViewerApplication and into BaseViewer
The rotation handling that's currently living in `PDFViewerApplication` is *very* old, and pre-dates the introduction of the viewer components by years.
As can be seen in the `BaseViewer.pagesRotation` setter, we're not actually normalizing the rotation as intended and instead rely on the caller to handle that correctly. This is first of all inconsistent, given how other setters are implemented, and secondly it could also lead to the rotation being set to a value outside of the `[0, 360)`-range.

Finally, for improved consistency the rotation handling in `PageViewport` is updated similarly. Please note that this case, it's *not* changing the pre-existing logic.
2021-03-28 14:19:58 +02:00
Calixte Denizet
9296ee6986 Skip extra objects in object stream in using offsets 2021-03-28 13:03:05 +02:00
calixteman
81c602c61c
Set CFF header to 4 when writing it because it contains 4 elements (#13149) 2021-03-26 18:23:18 +01:00
calixteman
63471bcbbe
XFA - Convert some template properties into CSS ones (#13082)
- implement few positioning properties: position, width, height, anchor;
  - implement font element;
  - implement fill element (used by font) and its children (linear, radial, ...);
  - font property is inherited from ancestor container (see https://www.pdfa.org/wp-content/uploads/2020/07/XFA-3_3.pdf#page=43) so let CSS handles that stuff;
  - in order to reduce the number of properties to set, only set non default properties and put the default in CSS;
  - set a background to some containers to be able to see them (will be removed in a future commit).
2021-03-25 13:02:39 +01:00
Tim van der Meij
8269ddbd16
Merge pull request #13105 from Snuffleupagus/BasePdfManager-parseDocBaseUrl
Improve memory usage around the `BasePdfManager.docBaseUrl` parameter (PR 7689 follow-up)
2021-03-19 23:03:20 +01:00
Jonas Jenwald
57e7557235
Actually reset the PDFPageProxy._xfaPromise property as intended (PR 13069 follow-up) (#13119)
Similar to the existing `annotationsPromise` and `_jsActionsPromise` properties, the new `_xfaPromise` should obviously also be reset, since otherwise you might end up holding onto a lot of data for pages that are no longer active.

(That caching wasn't present in the original version of PR 13069, which is why I didn't spot it until now.)
2021-03-19 11:31:54 +01:00
calixteman
24e598a895
XFA - Add a layer to display XFA forms (#13069)
- add an option to enable XFA rendering if any;
  - for now, let the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground);
  - ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one:
    - it makes easier to translate template properties to html ones;
    - it makes faster the creation of the html element in the main thread.
2021-03-19 10:11:40 +01:00
Jonas Jenwald
c4c7216171 Improve memory usage around the BasePdfManager.docBaseUrl parameter (PR 7689 follow-up)
While there is nothing *outright* wrong with the existing implementation, it can however lead to increased memory usage in one particular case (that I completely overlooked when implementing this):
For "data:"-URLs, which by definition contains the entire PDF document and can thus be arbitrarily large, we obviously want to avoid sending, storing, and/or logging the "raw" docBaseUrl in that case.

To address this, this patch makes the following changes:
 - Ignore any non-string in the `docBaseUrl` option passed to `getDocument`, since those are unsupported anyway, already on the main-thread.

 - Ignore "data:"-URLs in the `docBaseUrl` option passed to `getDocument`, to avoid having to send what could potentially be a *very* long string to the worker-thread.

 - Parse the `docBaseUrl` option *directly* in the `BasePdfManager`-constructors, on the worker-thread, to avoid having to store the "raw" docBaseUrl in the first place.
2021-03-17 15:48:24 +01:00
Jonas Jenwald
bd9dee1544 Move the getPdfFilenameFromUrl helper function from web/ui_utils.js and into src/display/display_utils.js
It seems reasonable to place this alongside the *similar* `getFilenameFromUrl` helper function. This way, with the changes in the next patch, we also avoid having to expose the `isDataScheme` function in the API itself and we instead expose `getPdfFilenameFromUrl` in the API (which feels overall more appropriate).
2021-03-17 15:48:24 +01:00
Jonas Jenwald
5099f1977f Support LineAnnotations with empty /Rect-entries (issue 6564)
This extends PR 13033 slightly, with a heuristic to support corrupt PDF documents where the `LineAnnotation`s have an empty /Rect-entry. Please note that while I have no idea if this is "correct", this patch at least makes us output the same /BBox as re-saving in Adobe Reader does.
2021-03-15 16:33:43 +01:00
Tim van der Meij
a2f0573a64
Enable the no-var linting rule for src/core/operator_list.js
This is mostly done using `gulp lint --fix` with a few manual changes in
the following diff:

```diff
diff --git a/src/core/operator_list.js b/src/core/operator_list.js
index 66c26fe05..cbcd12d97 100644
--- a/src/core/operator_list.js
+++ b/src/core/operator_list.js
@@ -40,7 +40,8 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
     // 'count' groups of (save, transform, paintImageMaskXObject, restore)+
     // have been found at iFirstSave.
     const iFirstPIMXO = iFirstSave + 2;
-    for (var i = 0; i < count; i++) {
+    let i;
+    for (i = 0; i < count; i++) {
       const arg = argsArray[iFirstPIMXO + 4 * i];
       const imageMask = arg.length === 1 && arg[0];
       if (
@@ -106,8 +107,8 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
       // assuming that heights of those image is too small (~1 pixel)
       // packing as much as possible by lines
       let maxX = 0;
-      let map = [],
-        maxLineHeight = 0;
+      const map = [];
+      let maxLineHeight = 0;
       let currentX = IMAGE_PADDING,
         currentY = IMAGE_PADDING;
       let q;
@@ -326,9 +327,9 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
           if (fnArray[i] !== OPS.transform) {
             return false;
           }
-          var iFirstTransform = context.iCurr - 2;
-          var firstTransformArg0 = argsArray[iFirstTransform][0];
-          var firstTransformArg3 = argsArray[iFirstTransform][3];
+          const iFirstTransform = context.iCurr - 2;
+          const firstTransformArg0 = argsArray[iFirstTransform][0];
+          const firstTransformArg3 = argsArray[iFirstTransform][3];
           if (
             argsArray[i][0] !== firstTransformArg0 ||
             argsArray[i][1] !== 0 ||
@@ -342,8 +343,8 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
           if (fnArray[i] !== OPS.paintImageXObject) {
             return false;
           }
-          var iFirstPIXO = context.iCurr - 1;
-          var firstPIXOArg0 = argsArray[iFirstPIXO][0];
+          const iFirstPIXO = context.iCurr - 1;
+          const firstPIXOArg0 = argsArray[iFirstPIXO][0];
           if (argsArray[i][0] !== firstPIXOArg0) {
             return false; // images don't match
           }
@@ -423,9 +424,9 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
           if (fnArray[i] !== OPS.showText) {
             return false;
           }
-          var iFirstSetFont = context.iCurr - 3;
-          var firstSetFontArg0 = argsArray[iFirstSetFont][0];
-          var firstSetFontArg1 = argsArray[iFirstSetFont][1];
+          const iFirstSetFont = context.iCurr - 3;
+          const firstSetFontArg0 = argsArray[iFirstSetFont][0];
+          const firstSetFontArg1 = argsArray[iFirstSetFont][1];
           if (
             argsArray[i][0] !== firstSetFontArg0 ||
             argsArray[i][1] !== firstSetFontArg1
```
2021-03-14 11:49:31 +01:00
Tim van der Meij
24ff738e7b
Enable the no-var linting rule for src/core/pattern.js
This is mostly done using `gulp lint --fix` with a few manual changes in
the following diff:

```diff
diff --git a/src/core/pattern.js b/src/core/pattern.js
index 365491ed3..eedd8b686 100644
--- a/src/core/pattern.js
+++ b/src/core/pattern.js
@@ -105,7 +105,7 @@ const Pattern = (function PatternClosure() {
   return Pattern;
 })();

-var Shadings = {};
+const Shadings = {};

 // A small number to offset the first/last color stops so we can insert ones to
 // support extend. Number.MIN_VALUE is too small and breaks the extend.
@@ -597,16 +597,15 @@ Shadings.Mesh = (function MeshClosure() {
       if (!(0 <= f && f <= 3)) {
         throw new FormatError("Unknown type6 flag");
       }
-      var i, ii;
       const pi = coords.length;
-      for (i = 0, ii = f !== 0 ? 8 : 12; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 8 : 12; i < ii; i++) {
         coords.push(reader.readCoordinate());
       }
       const ci = colors.length;
-      for (i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
         colors.push(reader.readComponents());
       }
-      var tmp1, tmp2, tmp3, tmp4;
+      let tmp1, tmp2, tmp3, tmp4;
       switch (f) {
         // prettier-ignore
         case 0:
@@ -729,16 +728,15 @@ Shadings.Mesh = (function MeshClosure() {
       if (!(0 <= f && f <= 3)) {
         throw new FormatError("Unknown type7 flag");
       }
-      var i, ii;
       const pi = coords.length;
-      for (i = 0, ii = f !== 0 ? 12 : 16; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 12 : 16; i < ii; i++) {
         coords.push(reader.readCoordinate());
       }
       const ci = colors.length;
-      for (i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
         colors.push(reader.readComponents());
       }
-      var tmp1, tmp2, tmp3, tmp4;
+      let tmp1, tmp2, tmp3, tmp4;
       switch (f) {
         // prettier-ignore
         case 0:
@@ -897,7 +895,7 @@ Shadings.Mesh = (function MeshClosure() {
         decodeType4Shading(this, reader);
         break;
       case ShadingType.LATTICE_FORM_MESH:
-        var verticesPerRow = dict.get("VerticesPerRow") | 0;
+        const verticesPerRow = dict.get("VerticesPerRow") | 0;
         if (verticesPerRow < 2) {
           throw new FormatError("Invalid VerticesPerRow");
         }
```
2021-03-14 11:43:05 +01:00
Jonas Jenwald
5b5061afa8 Enable the ESLint no-var rule globally
A significant portion of the code-base has now been converted to use `let`/`const`, rather than `var`, hence it should be possible to simply enable the ESLint `no-var` rule globally.
This way we can ensure that new code won't accidentally use `var`, and it also removes the need to manually enable the rule in various folders.

Obviously it makes sense to continue the efforts to replace `var`, but that should probably happen on a file and/or folder basis.

Please note that this patch excludes the following code:
 - The `extensions/` folder, since that seemed easiest for now (and I don't know exactly what the support situation is for the Chromium-extension).

 - The entire `external/` folder is ignored, since most of it's currently excluded from linting.
   For the code that isn't imported from elsewhere (and should be ignored), we should probably (at some point) bring the code up to the same linting/formatting standard as the rest of the code-base.

 - Various files in the `test/` folder are ignored, as necessary, since the way that a lot of this code is loaded will require some care (or perhaps larger re-factoring) when removing `var` usage.
2021-03-13 16:12:53 +01:00
Tim van der Meij
17c0bf0473
Merge pull request #13084 from Snuffleupagus/type1-class
Enable the ESLint `no-var` rule in a few font-parsing files, and convert `src/core/type1_parser.js` to use "standard" classes
2021-03-13 13:15:53 +01:00
Jonas Jenwald
50681d71c8 Ensure that getDocument handles Node.js Buffers more gracefully (issue 13075)
While the JSDocs have never advertised `getDocument` as supporting Node.js `Buffer`s, that apparently doesn't stop users from passing such data structures to `getDocument`.
In theory the existing `instanceof Uint8Array` check ought to have caught Node.js `Buffer`s, however for reasons that I don't even pretend to understand that check actually passes. Hence this patch which, *only* in Node.js environments, will special-case `Buffer`s to hopefully provide a slightly better out-of-the-box behaviour in Node.js environments[1].

---
[1] Although I'm not sure that we necessarily want to advertise this in the JSDocs, given the specialized use-case.
2021-03-13 10:52:38 +01:00
Tim van der Meij
be4a41960a
Merge pull request #13081 from Snuffleupagus/objectFromMap
Replace the `objectFromEntries` helper function with an `objectFromMap` one instead, and simplify the data lookup in the AnnotationStorage.getValue method
2021-03-12 21:12:03 +01:00
Jonas Jenwald
ab91f42a5e Convert code in src/core/type1_parser.js to use "standard" classes
All of this code predates the existence of native JS classes, however we can now clean this up a little bit.
2021-03-12 12:16:50 +01:00
Jonas Jenwald
8fc8dc020e Enable the ESLint no-var rule in the src/core/type1_parser.js file
Note that the majority of these changes were done automatically, by using `gulp lint --fix`, and the manual changes were limited to the following diff:

```diff
diff --git a/src/core/type1_parser.js b/src/core/type1_parser.js
index 192781de1..05c5fe2e5 100644
--- a/src/core/type1_parser.js
+++ b/src/core/type1_parser.js
@@ -251,7 +251,7 @@ const Type1CharString = (function Type1CharStringClosure() {
               // vhea tables reconstruction -- ignoring it.
               this.stack.pop(); // wy
               wx = this.stack.pop();
-              var sby = this.stack.pop();
+              const sby = this.stack.pop();
               sbx = this.stack.pop();
               this.lsb = sbx;
               this.width = wx;
@@ -263,8 +263,8 @@ const Type1CharString = (function Type1CharStringClosure() {
                 error = true;
                 break;
               }
-              var num2 = this.stack.pop();
-              var num1 = this.stack.pop();
+              const num2 = this.stack.pop();
+              const num1 = this.stack.pop();
               this.stack.push(num1 / num2);
               break;
             case (12 << 8) + 16: // callothersubr
@@ -273,7 +273,7 @@ const Type1CharString = (function Type1CharStringClosure() {
                 break;
               }
               subrNumber = this.stack.pop();
-              var numArgs = this.stack.pop();
+              const numArgs = this.stack.pop();
               if (subrNumber === 0 && numArgs === 3) {
                 const flexArgs = this.stack.splice(this.stack.length - 17, 17);
                 this.stack.push(
@@ -397,9 +397,9 @@ const Type1Parser = (function Type1ParserClosure() {
     if (discardNumber >= data.length) {
       return new Uint8Array(0);
     }
+    const c1 = 52845,
+      c2 = 22719;
     let r = key | 0,
-      c1 = 52845,
-      c2 = 22719,
       i,
       j;
     for (i = 0; i < discardNumber; i++) {
@@ -416,9 +416,9 @@ const Type1Parser = (function Type1ParserClosure() {
   }

   function decryptAscii(data, key, discardNumber) {
-    let r = key | 0,
-      c1 = 52845,
+    const c1 = 52845,
       c2 = 22719;
+    let r = key | 0;
     const count = data.length,
       maybeLength = count >>> 1;
     const decrypted = new Uint8Array(maybeLength);
@@ -429,7 +429,7 @@ const Type1Parser = (function Type1ParserClosure() {
         continue;
       }
       i++;
-      var digit2;
+      let digit2;
       while (i < count && !isHexDigit((digit2 = data[i]))) {
         i++;
       }
@@ -599,7 +599,7 @@ const Type1Parser = (function Type1ParserClosure() {
               if (token !== "/") {
                 continue;
               }
-              var glyph = this.getToken();
+              const glyph = this.getToken();
               length = this.readInt();
               this.getToken(); // read in 'RD' or '-|'
               data = length > 0 ? stream.getBytes(length) : new Uint8Array(0);
@@ -638,7 +638,7 @@ const Type1Parser = (function Type1ParserClosure() {
           case "OtherBlues":
           case "FamilyBlues":
           case "FamilyOtherBlues":
-            var blueArray = this.readNumberArray();
+            const blueArray = this.readNumberArray();
             // *Blue* values may contain invalid data: disables reading of
             // those values when hinting is disabled.
             if (
@@ -672,7 +672,7 @@ const Type1Parser = (function Type1ParserClosure() {
       }

       for (let i = 0; i < charstrings.length; i++) {
-        glyph = charstrings[i].glyph;
+        const glyph = charstrings[i].glyph;
         encoded = charstrings[i].encoded;
         const charString = new Type1CharString();
         const error = charString.convert(
@@ -728,12 +728,12 @@ const Type1Parser = (function Type1ParserClosure() {
         token = this.getToken();
         switch (token) {
           case "FontMatrix":
-            var matrix = this.readNumberArray();
+            const matrix = this.readNumberArray();
             properties.fontMatrix = matrix;
             break;
           case "Encoding":
-            var encodingArg = this.getToken();
-            var encoding;
+            const encodingArg = this.getToken();
+            let encoding;
             if (!/^\d+$/.test(encodingArg)) {
               // encoding name is specified
               encoding = getEncoding(encodingArg);
@@ -764,7 +764,7 @@ const Type1Parser = (function Type1ParserClosure() {
             properties.builtInEncoding = encoding;
             break;
           case "FontBBox":
-            var fontBBox = this.readNumberArray();
+            const fontBBox = this.readNumberArray();
             // adjusting ascent/descent
             properties.ascent = Math.max(fontBBox[3], fontBBox[1]);
             properties.descent = Math.min(fontBBox[1], fontBBox[3]);
```
2021-03-12 12:05:48 +01:00
Jonas Jenwald
82062f7e0d Enable the ESLint no-var rule in the src/core/cff_parser.js file
Note that the majority of these changes were done automatically, by using `gulp lint --fix`, and the manual changes were limited to the following diff:

```diff
diff --git a/src/core/cff_parser.js b/src/core/cff_parser.js
index d684c200e..2e2b811e4 100644
--- a/src/core/cff_parser.js
+++ b/src/core/cff_parser.js
@@ -555,7 +555,7 @@ const CFFParser = (function CFFParserClosure() {
           stackSize %= 2;
           validationCommand = CharstringValidationData[value];
         } else if (value === 10 || value === 29) {
-          var subrsIndex;
+          let subrsIndex;
           if (value === 10) {
             subrsIndex = localSubrIndex;
           } else {
@@ -886,15 +886,15 @@ const CFFParser = (function CFFParserClosure() {
         format = bytes[pos++];
         switch (format & 0x7f) {
           case 0:
-            var glyphsCount = bytes[pos++];
+            const glyphsCount = bytes[pos++];
             for (i = 1; i <= glyphsCount; i++) {
               encoding[bytes[pos++]] = i;
             }
             break;

           case 1:
-            var rangesCount = bytes[pos++];
-            var gid = 1;
+            const rangesCount = bytes[pos++];
+            let gid = 1;
             for (i = 0; i < rangesCount; i++) {
               const start = bytes[pos++];
               const left = bytes[pos++];
@@ -938,7 +938,7 @@ const CFFParser = (function CFFParserClosure() {
           }
           break;
         case 3:
-          var rangesCount = (bytes[pos++] << 8) | bytes[pos++];
+          const rangesCount = (bytes[pos++] << 8) | bytes[pos++];
           for (i = 0; i < rangesCount; ++i) {
             let first = (bytes[pos++] << 8) | bytes[pos++];
             if (i === 0 && first !== 0) {
@@ -1173,7 +1173,7 @@ class CFFDict {
   }
 }

-var CFFTopDict = (function CFFTopDictClosure() {
+const CFFTopDict = (function CFFTopDictClosure() {
   const layout = [
     [[12, 30], "ROS", ["sid", "sid", "num"], null],
     [[12, 20], "SyntheticBase", "num", null],
@@ -1229,7 +1229,7 @@ var CFFTopDict = (function CFFTopDictClosure() {
   return CFFTopDict;
 })();

-var CFFPrivateDict = (function CFFPrivateDictClosure() {
+const CFFPrivateDict = (function CFFPrivateDictClosure() {
   const layout = [
     [6, "BlueValues", "delta", null],
     [7, "OtherBlues", "delta", null],
@@ -1265,11 +1265,12 @@ var CFFPrivateDict = (function CFFPrivateDictClosure() {
   return CFFPrivateDict;
 })();

-var CFFCharsetPredefinedTypes = {
+const CFFCharsetPredefinedTypes = {
   ISO_ADOBE: 0,
   EXPERT: 1,
   EXPERT_SUBSET: 2,
 };
+
 class CFFCharset {
   constructor(predefined, format, charset, raw) {
     this.predefined = predefined;
@@ -1695,7 +1696,7 @@ class CFFCompiler {
             // For offsets we just insert a 32bit integer so we don't have to
             // deal with figuring out the length of the offset when it gets
             // replaced later on by the compiler.
-            var name = dict.keyToNameMap[key];
+            const name = dict.keyToNameMap[key];
             // Some offsets have the offset and the length, so just record the
             // position of the first one.
             if (!offsetTracker.isTracking(name)) {
```
2021-03-12 12:00:38 +01:00
Jonas Jenwald
f3948aeb90 Enable the ESLint no-var rule in the src/core/font_renderer.js file
Note that the majority of these changes were done automatically, by using `gulp lint --fix`, and the manual changes were limited to the following diff:

```diff
diff --git a/src/core/font_renderer.js b/src/core/font_renderer.js
index e1538c481..00f5424cd 100644
--- a/src/core/font_renderer.js
+++ b/src/core/font_renderer.js
@@ -152,9 +152,9 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
   }

   function lookupCmap(ranges, unicode) {
-    let code = unicode.codePointAt(0),
-      gid = 0;
-    let l = 0,
+    const code = unicode.codePointAt(0);
+    let gid = 0,
+      l = 0,
       r = ranges.length - 1;
     while (l < r) {
       const c = (l + r + 1) >> 1;
@@ -199,7 +199,7 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
         flags = (code[i] << 8) | code[i + 1];
         const glyphIndex = (code[i + 2] << 8) | code[i + 3];
         i += 4;
-        var arg1, arg2;
+        let arg1, arg2;
         if (flags & 0x01) {
           arg1 = ((code[i] << 24) | (code[i + 1] << 16)) >> 16;
           arg2 = ((code[i + 2] << 24) | (code[i + 3] << 16)) >> 16;
@@ -366,7 +366,7 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
       while (i < code.length) {
         let stackClean = false;
         let v = code[i++];
-        var xa, xb, ya, yb, y1, y2, y3, n, subrCode;
+        let xa, xb, ya, yb, y1, y2, y3, n, subrCode;
         switch (v) {
           case 1: // hstem
             stems += stack.length >> 1;
@@ -494,7 +494,7 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
                 bezierCurveTo(xa, y2, xb, y3, x, y);
                 break;
               case 37: // flex1
-                var x0 = x,
+                const x0 = x,
                   y0 = y;
                 xa = x + stack.shift();
                 ya = y + stack.shift();
```
2021-03-12 11:57:27 +01:00
Brendan Dahl
1b42fe2917
Merge pull request #13015 from calixteman/avoid_dl
JS - Avoid a popup to ask for updating Acrobat.
2021-03-11 08:43:49 -08:00
Jonas Jenwald
b326432895 Simplify the data lookup in the AnnotationStorage.getValue method
Rather than first checking if data exists before fetching it from storage, we can simply do the lookup directly and then check its value.
Note that this follows the same pattern as utilized in the `AnnotationStorage.setValue` method.
2021-03-11 16:37:38 +01:00
Jonas Jenwald
a0e584eeb2 Replace the objectFromEntries helper function with an objectFromMap one instead
Given that it's only used with `Map`s, and that it's currently implemented in such a way that we (indirectly) must iterate through the data *twice*, some simplification cannot hurt here.
Note that the only reason that we're not using `Object.fromEntries(...)` directly, at each call-site, is that that one won't guarantee that a `null` prototype is being used.
2021-03-11 16:37:34 +01:00
Calixte Denizet
3243672727 XFA - Create Form DOM in merging template and data trees
- Spec: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=171;
  - support for the 2 ways of merging: consumeData and matchTemplate;
  - create additional nodes in template DOM when occur node allows it;
  - support for global values in data DOM.
2021-03-08 14:10:30 +01:00
Calixte Denizet
c01ef24541 JS - reset correctly radio buttons 2021-03-07 11:04:40 +01:00
Tim van der Meij
5828ff6cb0
Implement rendering line annotations without appearance stream 2021-02-28 18:57:58 +01:00
Tim van der Meij
d6e0b2d92e
Merge pull request #13032 from Snuffleupagus/parseDestDictionary-actionName-warn
Don't warn about actions that require scripting support in `Catalog.parseDestDictionary`
2021-02-28 14:52:06 +01:00
Jonas Jenwald
39cf4a0008 Don't warn about actions that require scripting support in Catalog.parseDestDictionary
Now that we have scripting support, warning about e.g. JavaScript actions doesn't seem necessary anymore. Especially considering that scripting-related actions are/will not be parsed by the `Catalog.parseDestDictionary` method anyway, since it's intended for handling "simple" actions.
2021-02-28 13:13:17 +01:00
Tim van der Meij
fa6cebf045
Implement rendering square/circle annotations without appearance stream 2021-02-27 19:05:12 +01:00
Jonas Jenwald
05de20071a Modernize some of the code in src/core/cmap.js by using classes and async/await
This converts a couple of our old "classes" to proper ECMAScript classes, and replaces a lot of manual Promise-wrapping with async/await instead.
2021-02-27 14:20:43 +01:00
Tim van der Meij
4e96d59fca
Use a buffer instead of string concatenation in reverseIfRtl in src/core/unicode.js
This avoids creating intermediate strings and should be slightly more
efficient.
2021-02-27 13:20:09 +01:00
Tim van der Meij
24f80f1e38
Enable the no-var linting rule in src/core/primitives.js 2021-02-27 12:51:01 +01:00
Tim van der Meij
ed33727419
Enable the no-var linting rule in src/core/glyphlist.js 2021-02-27 12:46:57 +01:00
Tim van der Meij
e051d4d029
Enable the no-var linting rule in src/core/ccitt_stream.js 2021-02-27 12:44:55 +01:00
Tim van der Meij
0897dddbbe
Enable the no-var linting rule in src/core/unicode.js 2021-02-27 12:44:50 +01:00
Tim van der Meij
cb82dda755
Enable the no-var linting rule in src/core/metrics.js 2021-02-27 12:44:45 +01:00
Tim van der Meij
55786a4880
Merge pull request #13026 from Snuffleupagus/crypto-classes
Convert code in `src/core/crypto.js` to use "normal" classes
2021-02-26 22:39:30 +01:00
Tim van der Meij
848753671f
Merge pull request #13025 from Snuffleupagus/function-classes
Convert code in `src/core/function.js` to use "normal" classes
2021-02-26 22:22:38 +01:00
Jonas Jenwald
6b4c4f80e4 Convert code in src/core/crypto.js to use "normal" classes
All of this code predates the existence of native JS classes, however we can now clean this up a bit. This patch thus let us remove some variable "shadowing" from the code.
2021-02-26 15:51:45 +01:00
Jonas Jenwald
b884757873 Inline the concatArrays function in calculatePDF20Hash
This helper function is first of all only called *twice*, and secondly it also leads to unnecessary intermediate allocations given how the `TypedArray`s are handled.
Hence we can simply inline this small function, and thus directly allocate the combined `TypedArray` instead.
2021-02-26 15:51:39 +01:00
Jonas Jenwald
9a9a5b2365 Replace the compareByteArrays functions, in src/core/crypto.js, with the isArrayEqual helper function
The `compareByteArrays` is first of all duplicated in multiple closures in the `src/core/crypto.js` file. Secondly, despite its name, it's also functionally equivalent to the now existing `isArrayEqual` helper function.

The `isArrayEqual` helper function is changed to use a standard `for`-loop, rather than `Array.prototype.every`, since that ought to be slightly more efficient given that we're now using it with (potentially) larger data.
2021-02-26 15:51:32 +01:00
Jonas Jenwald
e69e8622a9 Convert code in src/core/function.js to use "normal" classes
All of this code predates the existence of native JS classes, however we can now clean this up a bit. This patch thus let us remove some variable "shadowing" from the code.
2021-02-26 13:20:59 +01:00
Jonas Jenwald
6fd899dc44 [api-minor] Support the Content-Disposition filename in the Firefox PDF Viewer (bug 1694556, PR 9379 follow-up)
As can be seen [in the mozilla-central code](https://searchfox.org/mozilla-central/rev/a6db3bd67367aa9ddd9505690cab09b47e65a762/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#1222-1225), we're already getting the Content-Disposition filename. However, that data isn't passed through to the viewer nor to the `PDFDataTransportStream`-implementation, which explains why it's currently being ignored.

*Please note:* This will also require a small mozilla-central patch, see https://bugzilla.mozilla.org/show_bug.cgi?id=1694556, to forward the necessary data to the viewer.
2021-02-26 10:50:29 +01:00
Jonas Jenwald
70d1869fe5 Remove the, strictly unnecessary, closure and variable shadowing from createObjectURL
Note that this particular helper function is, with the exception of the `GENERIC` default viewer and the (unsupported) SVG-backend, mostly unused at this point in time. Hence we should be able to clean-up this helper function slightly.

Also, fixes a small inconsistency in the `SVGGraphics` initialization in the viewer, by passing in the `disableCreateObjectURL` compatibility-option. Given that the SVG-backend isn't officially supported/recommended this shouldn't have been an issue, but given that I spotted this it can't hurt to fix it.
2021-02-25 16:34:23 +01:00
Tim van der Meij
8b7dee0aae
Merge pull request #13009 from Snuffleupagus/openOrDownloadData
Move the opening of PDF file attachments into the `DownloadManager`-implementations
2021-02-24 20:57:12 +01:00
Calixte Denizet
ecd45cc9af JS - Avoid a popup to ask for updating Acrobat.
- this popup appears because js is enabled;
  - and because the pdf contains some unsupported features (e.g. XFA);
  - can be tested with: https://www.cbsa-asfc.gc.ca/publications/forms-formulaires/a10.pdf.
2021-02-24 15:52:48 +01:00
calixteman
45329af926
XFA -- Add support for SOM expressions (#12983)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=87;
 - add a parser for SOM expressions;
 - add search functions to resolve those expressions;
 - search functions will be used to bind data into template.
2021-02-24 10:13:02 +01:00
Jonas Jenwald
df931ef685 Move the opening of PDF file attachments into the DownloadManager-implementations
Note how the `PDFAttachmentViewer` handles PDF file attachments specially, by opening them in a new window/tab, rather than forcing them to be downloaded. This is done to improve the overall UX, since browsers in general are able to handle PDF files internally.
However, for file *annotations* we're currently not attempting to do the same thing and are instead just downloading them directly. In order to unify the behaviour, without having to duplicate a lot of code, the opening of PDF file attachments is thus moved into a new `DownloadManager.openOrDownloadData` method.
2021-02-23 13:44:23 +01:00
Tim van der Meij
f3aa4408a5
Merge pull request #13005 from calixteman/colors
JS - Fix setting a color on an annotation
2021-02-21 14:50:03 +01:00
Calixte Denizet
4a5f1d1b7a JS - Fix setting a color on an annotation
- strokeColor corresponds to borderColor;
 - support fillColor and textColor;
 - support colors on the different annotations;
 - fix typo in aforms (+test).
2021-02-20 15:24:37 +01:00
Jonas Jenwald
d69cf702f3 Add a this-bound method for InternalRenderTask.cancel
This is similar to the other methods, and the only reason for this not having been done originally is that the `cancel` functionality is a later addition.
2021-02-20 14:47:57 +01:00
Jonas Jenwald
e9038cc3d1 Send the AnnotationStorage-data to the worker-thread as a Map
Rather than converting the `AnnotationStorage`-data to an Object, before sending it to the worker-thread, we should be able to simply send the internal `Map` directly.
The "structured clone algorithm" doesn't have a problem with `Map`s, however the `LoopbackPort` used when workers are *disabled* (e.g. in Node.js environments) didn't use to support them. With PR 12997 having lifted that restriction, we should now be able to simply send the `AnnotationStorage`-data as-is rather than having to iterate through it to first create an Object.

*Please note:* The changes in `src/core/annotation.js` could have been a lot more compact if we were able to use optional chaining in the `src/core` folder. Unfortunately that's still not possible, since SystemJS is being used in the development viewer (i.g. `gulp server`) and fixing that is *still* blocked by [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687).
2021-02-18 17:13:43 +01:00
calixteman
0fa9976268
XFA - Add support for prototypes (#12979)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=225&zoom=auto,-207,784
 - add a clone method on nodes in order to be able to clone a proto;
 - support ids in template namespace;
 - prevent from cycle when applying protos.
2021-02-18 10:32:25 +01:00
Tim van der Meij
4619b1b568
Merge pull request #12997 from Snuffleupagus/metadata-worker
Move the Metadata parsing to the worker-thread
2021-02-17 20:57:46 +01:00
Tim van der Meij
77862bdb8e
Merge pull request #12999 from Snuffleupagus/LoopbackPort-rm-sync
[api-minor] Remove support for synchronous event dispatching in `LoopbackPort`
2021-02-17 20:39:54 +01:00
Tim van der Meij
1d3af89cae
Merge pull request #12998 from Snuffleupagus/renderInteractiveForms-defaults
Simplify the default value handling of `renderInteractiveForms` in the viewer components
2021-02-17 20:37:03 +01:00
calixteman
b5be515375
XFA - Add a lexer/parser for FormCalc language (#12936)
- the language specifications are: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1049
 - it can be used to:
   * as a scripting language for calculation, validations, ...
   * in SOM expressions to select nodes: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=101
2021-02-17 20:28:06 +01:00
Jonas Jenwald
3398070e26 [api-minor] Remove support for synchronous event dispatching in LoopbackPort
*Please note:* The `defer` parameter has been enabled by default ever since PR 9777 (in 2018), which first shipped in PDF.js release `2.0.943`.
With workers *disabled*, e.g. in Node.js environments, this has been used ever since without any problems reported[1].

The impetus for this change was that I happened to notice that *if* the `LoopbackPort` was used with synchronous event dispatching, we'd simply send that data as-is to the listeners. This created an inconsistency in the data returned from the `pdf.worker.js` file, since `postMessage` used with *actual* workers (or the `LoopbackPort` with `defer = true`) will ignore/throw when encountering unclonable data.
Originally my intention was simply to just call `cloneValue` regardless of the event dispatching used in `LoopbackPort`, however looking at the use-cases (or lack thereof) of the `LoopbackPort` it seemed reasonable to simply remove the `defer` parameter instead.

This patch is tagged "[api-minor]" since the `LoopbackPort` is still exposed in the API, although I really hope that no third-party is using this (since disabling workers leads to bad performance).

Finally, this patch changes a `forEach` loop to `for...of` and makes uses of optional changing in existing code.

---
[1] As evident by the `npm test` command run by Github Actions, and previously by Travis.
2021-02-17 16:12:29 +01:00
Jonas Jenwald
d366bbdf51 Move the encodeToXmlString helper function to src/core/core_utils.js
With the previous patch this function is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
b66f294f64 Move the XML-parser to the src/core/-folder
With the previous patch this functionality is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
cc3a6563ee Move the Metadata parsing to the worker-thread
The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers.
Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons).

Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the *built* `pdf.js`/`pdf.worker.js` files.

Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation.

---
[1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
73bf45e64b Support Map and Set, with postMessage, when workers are disabled
The `LoopbackPort` currently doesn't support `Map` and `Set`, which it should since the "structured clone algorithm" used in browsers does support both of them; please see https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types
2021-02-17 13:11:59 +01:00
Jonas Jenwald
0a28e51e40 Simplify the default value handling of renderInteractiveForms in the viewer components
I happened to look at this code, and I can't for the life of me figure out why I didn't just implement it like this patch in the first place (since the current format feels overly verbose).
2021-02-17 10:47:55 +01:00
Calixte Denizet
82f75a8ac2 JS -- Fix doc.getField and add missing field methods
- getField("foo") was wrongly returning a field named "foobar";
 - field object had few missing unimplemented methods
2021-02-17 10:42:52 +01:00
Tim van der Meij
bab059d8fd
Merge pull request #12964 from calixteman/12963
Avoid infinite loop when getting annotation field name
2021-02-16 22:36:24 +01:00
Calixte Denizet
0fc8267576 Avoid infinite loop when getting annotation field name
- aims to fix issue #12963;
 - use a Set to track already visited objects;
 - remove the loop limit in getInheritableProperty and use a RefSet too.
2021-02-14 19:58:19 +01:00
Jonas Jenwald
b26c7974fe [api-minor] Change the dc:subject Metadata field to an Array
This patch simply extends the existing handling of the `dc:creator` field, which should hopefully suffice here; please refer to https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=34
2021-02-14 17:16:40 +01:00
Tim van der Meij
c79fd71457
Merge pull request #12896 from calixteman/text_layer
Modifiy the way to compute baseline to have a better match between canvas and text layer
2021-02-13 15:12:58 +01:00
Jonas Jenwald
1ee747a620 Remove unneeded instanceof MissingDataException checks
The following checks are all unneeded, and could easily cause confusion when reading the code. (All of them are my fault as well, since I've sometimes added those checks without really thinking about the surrounding code.)

 - In `PartialEvaluator.hasBlendModes` there cannot be any `MissingDataException`s thrown, given that the `Page.getOperatorList` method waits for all the necessary /Resources to load first. Furthermore, note also that if an error is thrown from `PartialEvaluator.hasBlendModes` then it'd completely break rendering of that page, since any errors thrown from `Page.getOperatorList` are simply sent to the main-thread.

 - In `PartialEvaluator.handleColorN` there cannot be any `MissingDataException`s thrown, given that again the `Page.getOperatorList` method waits for all the necessary /Resources to load before operatorList parsing starts.

 - In `XRef.readXRef` there cannot be any `MissingDataException`s thrown, given that we're *explicitly* requesting (and waiting for) the entire document in `pdfManagerReady` (in `src/core/worker.js`) before re-parsing of a corrupt document starts.
2021-02-13 12:26:05 +01:00
Calixte Denizet
ea06bb0e36 [api-minor] Annotation -- Don't compute appearance when nothing has changed
* don't set a value in annotationStorage by default:
   - having an undefined when the annotation is rendered for saving/printing means nothing has changed so use normal appearance
   - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1681687
 * change the way to compute font size when this one is null in DA:
   - make fontSize proportional to line height
   - in multiline case, take into account the number of lines for text entered to adapt the font size
2021-02-12 19:27:21 +01:00
Calixte Denizet
b4421b076a Modifiy the way to compute baseline to have a better match between canvas and text layer
- use ascent of the fallback font instead of the one from pdf to position spans
 - use TextMetrics.fontBoundingBoxAscent if available or
 - use a basic heuristic to guess ascent in drawing char on a canvas
 - compute ascent as a ratio of font height
2021-02-12 11:28:02 +01:00
dhufnagel
fc925827b2
fix initial state of checkboxes in display layer (#12904)
consider the export value when multiple checkboxes have the same name
2021-02-12 11:22:54 +01:00
Jonas Jenwald
133158e4d5 Update the year in the license_header files 2021-02-11 17:52:26 +01:00
calixteman
0479deef4e
XFA -- Add other objects (#12949)
- connectionSet: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=969
 - datasets: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1038
  - signature: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1040
  - stylesheet: the same
  - xhtml: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1187
2021-02-11 12:30:37 +01:00
calixteman
3787bd41ef
XFA -- Add localset object (#12948)
- Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=943
2021-02-10 18:04:43 +01:00
Jonas Jenwald
0068dba009 [api-minor] Rename -es5 to -legacy, to reduce confusion over what's actually supported (issue 12976)
*Please note that this will also require some edits of the Wiki.*
2021-02-10 16:01:59 +01:00
Jonas Jenwald
31098c404d
Use Math.hypot, instead of Math.sqrt with manual squaring (#12973)
When the PDF.js project started `Math.hypot` didn't exist yet, and until recently we still supported browsers (IE 11) without a native `Math.hypot` implementation; please see this compatibility information: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/hypot#browser_compatibility

Furthermore, somewhat recently there were performance improvements of `Math.hypot` in Firefox; see https://bugzilla.mozilla.org/show_bug.cgi?id=1648820

Finally, this patch also replaces a couple of multiplications with the exponentiation operator.
2021-02-10 12:28:49 +01:00
Tim Nguyen
2ca886baee Use DOM hidden property instead of attribute methods 2021-02-08 00:21:49 +01:00
Jonas Jenwald
e6fe8a7d53 Handle errors gracefully, in PartialEvaluator.translateFont, when fetching the font file (issue 9462)
The *third* page of the referenced PDF document currently fails to render completely, since one of its font files fail to load.
Since that error isn't handled, a large part of the text is thus missing which looks quite bad. By "replacing" the font data with an *empty* stream, we'll thus be able to fallback to rendering the text with a standard font (instead of using `ErrorFont`). While there's obviously no guarantee that things will look perfect, actually rendering the text at all should be an improvement in general.

Also, print a warning in `PartialEvaluator.loadFont` when the `PartialEvaluator.translateFont` method rejects, since that'd have helped debug/fix the issue faster.
2021-02-06 19:44:53 +01:00
Jonas Jenwald
d3e65f24e3 Request all data, rather than throwing, when encountering general errors in ObjectLoader._walk (issue 9462, PR 3289 follow-up)
*As far as I can tell, this has been broken ever since PR 3289 (back in 2013) without anyone noticing.*

For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately which thus has the potential to *completely* break rendering of an entire page.
In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse these errors are *intermittent*, since they can only occur if the document is still loading when the `ObjectLoader`-code runs (note the early return in `ObjectLoader.load`).

Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then.
All-in-all, the only way to make absolutely sure that we won't cause *unexpected* `MissingDataException`s somewhere else in the code-base is to fallback to fetching the *entire* document in this edge-case.
2021-02-06 14:33:50 +01:00
Brendan Dahl
a392082e30
Merge pull request #12944 from calixteman/xfa_config
XFA -- Update config object
2021-02-05 15:06:09 -08:00
Calixte Denizet
9d47e69771 XFA -- Update config object 2021-02-05 19:22:51 +01:00
Calixte Denizet
652ff57897 XFA -- Add template object
- Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=596
2021-02-03 21:05:10 +01:00
Calixte Denizet
7e0554afe2 XFA -- Add attributes and children in XFAObject
- in order to evaluate SOM expressions nodes and their attributes must be checked in the same order as in the xml;
 - add an object XFAObjectArray with a parameter max to handle multiple children with the same name.
2021-02-03 18:56:00 +01:00
Calixte Denizet
0ff5cd7eb5 XFA - Add a parser for XFA files
- the parser is base on a class extending XMLParserBase
 - it handle xml namespaces:
   * each namespace is assocated with a builder
   * builder builds nodes belonging to the namespace
   * when a node is inserted in the parent namespace compatibility is checked (if required)
 - to avoid name collision between xml names and object properties, use Symbol.
2021-02-01 13:45:31 +01:00
Jonas Jenwald
fec8c4c43f Access this._onUnsupportedFeature directly in FontFaceObject.getPathGenerator
Given that `FontFaceObject` is not exposed in the public API, but only accessed internally, there's no need to assume that a `FontFaceObject`-instance is ever initialized without `onUnsupportedFeature` being provided. This is also consistent with the `BaseFontLoader` implementation.
2021-01-29 16:48:55 +01:00
Tim van der Meij
e4e92d10e8
Merge pull request #12922 from Snuffleupagus/getTextContent-globalImageCache
Ignore globally cached images in `PartialEvaluator.getTextContent` (PR 11930 follow-up)
2021-01-28 23:44:10 +01:00
Tim van der Meij
8805614a03
Merge pull request #12924 from brendandahl/fix-clone
Fix font data clone error when pdfBug is enabled.
2021-01-28 23:42:12 +01:00
Jonas Jenwald
72da2aa166 Ignore globally cached images in PartialEvaluator.getTextContent (PR 11930 follow-up)
Given that we'll only cache `/XObject`s of the `Image`-type globally, we can utilize that in `PartialEvaluator.getTextContent` as well. This way, in cases such as e.g. issue 12098, we can avoid having to fetch/parse `/XObject`s that we already know to be `Image`s. This is helpful, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.

Also, skip a redundant `RefSetCache.has` check in the `GlobalImageCache.getData` method.
2021-01-28 10:19:26 +01:00
Brendan Dahl
52fb5abb0b Fix font data clone error when pdfBug is enabled.
The widths property should be an object to match what metrics returns.

In ZapfDingbats.pdf I was getting a data clone error with pdfBug enabled.
In buildCharCodeToWidth() there was an encoding with the name "at" which
is also the name of a method on an array. buildCharCodeToWidth assumes an
object is passed in, so when it checked for the "at" property, it found the
method and copied it over.

This only seemed to affect Firefox.
2021-01-27 14:38:43 -08:00
Tim van der Meij
d52e5b0505
Merge pull request #12903 from Snuffleupagus/GlobalImageCache-byteSize
Improve global image caching for small images (PR 11912 follow-up, issue 12098)
2021-01-27 22:20:58 +01:00
Tim van der Meij
286271152f
Merge pull request #12910 from calixteman/bidi
Add back dir property in spans in text layer
2021-01-27 22:09:00 +01:00
calixteman
465697eb10
JS - QuickJS sandbox initialization must be the last evaluated string (#12914)
- aims to fix issue #12912
2021-01-26 14:56:01 +01:00
Jonas Jenwald
1ab6d2c604 Improve global image caching for small images (PR 11912 follow-up, issue 12098)
When implementing the `GlobalImageCache` functionality I was mostly worried about the effect of *very large* images, hence the maximum number of cached images were purposely kept quite low[1].
However, there's one fairly obvious problem with that approach: In documents with hundreds, or even thousands, of *small* images the `GlobalImageCache` as implemented becomes essentially pointless.

Hence this patch, where the `GlobalImageCache`-implementation is changed in the following ways:
 - We're still guaranteed to be able to cache a *minimum* number of images, set to `10` (similar as before).
 - If the *total* size of all the cached image data is below a threshold[2], we're allowed to cache additional images.

This patch thus *improve*, but doesn't completely fix, issue 12098. Note that that document is created by a *very poor* PDF generator, since every single page contains the *entire* document (with all of its /Resources) and to create the individual pages clipping is used.[3]

---
[1] Currently set to `10` images; imagine what would happen to overall memory usage if we encountered e.g. 50 images each 10 MB in size.

[2] This value was chosen, somewhat randomly, to be `40` megabytes; basically five times the [maximum individual image size per page](6249ef517d/src/display/api.js (L2483-L2484)).

[3] This surely has to be some kind of record w.r.t. how badly PDF generators can mess things up...
2021-01-26 12:00:12 +01:00
Calixte Denizet
539256c351 Add back dir property in spans in text layer
- aims to fix #12909
2021-01-26 12:00:05 +01:00
calixteman
a3f6882b06
JS -- add support for choice widget (#12826) 2021-01-25 23:40:57 +01:00
Tim van der Meij
f2c7338b02
Merge pull request #12897 from calixteman/12895
JS - Fix mouse event names
2021-01-24 12:28:24 +01:00
Jonas Jenwald
eb015ca023 Update the ESLint env to use "es2021"
Currently this is inconsistent, since we're using ECMAScript 2021 in the parser options.
2021-01-24 11:25:43 +01:00
Calixte Denizet
34d2e72df2 JS - Fix mouse event names
- fix issue #12895
2021-01-23 20:26:22 +01:00
Tim van der Meij
25b84ce84c
Merge pull request #12828 from dhufnagel/feature/annotation_layer_display_fontsize
[api-minor] Set font size and color for text widget annotations
2021-01-23 16:08:07 +01:00
Jonas Jenwald
6bcb4e3ad9 Ensure that parseDefaultAppearance won't attempt to access a not yet defined variable (PR 12831 follow-up)
Note how, in the `if (this.stateManager.stateStack.length !== 0) {` branch, we're attempting to access the not yet defined variable[1] `args`. If this code-path is ever hit, an Error will be thrown and parsing will thus be aborted immediately (likely leading to e.g. rendering bugs).

Note that I found this purely by accident, since I happened to glance at the LGTM report. However, I've since found that the error is also present during the unit-test[2] and with this patch we're actually testing the *intended* thing here.

As part of fixing this, and to avoid re-introducing a similar bug in the future, we'll now instead always reset `args.length` *before* attempting to read the next operator.
Also, we can use the existing `EvaluatorPreprocessor.savedStatesDepth` getter to simplify the save/restore detection a tiny bit.

---
[1] The ESLint rule `no-use-before-define` would have helped catch this problem, but unfortunately we cannot enable that without quite a bit of refactoring all over the code-base.

[2] The unit-test was updated such that it would fail in the `master`-branch.
2021-01-23 15:33:28 +01:00
Dominik Hufnagel
c5083cda02 set font size and color on annotation layer
use the default appearance to set the font size and color of a text
annotation widget
2021-01-22 23:12:14 +01:00
Tim van der Meij
6ffb6b1c0c
Merge pull request #12885 from Snuffleupagus/worker-tweak-caching
Simplify the `PDFFunctionFactory._localFunctionCache` initialization (PR 12034 follow-up); Fix the `gStateObj` lookup in `TranslatedFont._removeType3ColorOperators` (PR 12718 follow-up)
2021-01-22 20:24:33 +01:00
Jonas Jenwald
ca1f58ea42 Use _defaultAppearanceData directly in WidgetAnnotation._getSaveFieldResources (PR 12831 follow-up)
With the changes in PR 12831, it's no longer necessary to keep track of the `fontName`-string separately since it's available through the `_defaultAppearanceData`-property as well.
2021-01-22 13:23:04 +01:00
Jonas Jenwald
8137c0547d Fix the gStateObj lookup in TranslatedFont._removeType3ColorOperators (PR 12718 follow-up)
As can be seen in 2cba290361/src/core/evaluator.js (L986) the `gStateObj` (which is actually an Array despite its name), is wrapped in Array when it's inserted into the OperatorList. Hence we obviously need to take this into account when accessing it in `TranslatedFont._removeType3ColorOperators`; this mistake happened because we don't have any test-cases for this particular code-path as far as I know.
2021-01-22 12:27:38 +01:00
Jonas Jenwald
cfaf23dee2 Simplify the PDFFunctionFactory._localFunctionCache initialization (PR 12034 follow-up)
By changing this a `shadow`ed getter, we can simply access it directly and not worry about it being initialized. I have no idea why I didn't just implement it this way in the first place.
2021-01-22 12:25:05 +01:00
Brendan Dahl
2cba290361
Merge pull request #12836 from calixteman/update_buttons
JS -- update radio/checkbox values even if there are no actions
2021-01-21 14:00:26 -08:00
calixteman
1039698697
Add a parser to get font data from the default appearance (#12831)
* Add a parser to get font data from the default appearance
 - pdfium & poppler use a special parser too to get these info.

* Update src/core/default_appearance.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-01-21 20:15:31 +01:00
Brendan Dahl
4142001fc2
Merge pull request #12869 from calixteman/lw
Fix zoom issue with too thin lines
2021-01-21 08:31:59 -08:00
Tim van der Meij
9d4bad91e2
Merge pull request #12878 from Snuffleupagus/worker-compat-checks
Remove redundant compatibility checks, for modern `generic` builds, in `src/core/worker.js`
2021-01-20 21:03:04 +01:00
Brendan Dahl
f45ba02fd3
Merge pull request #12850 from calixteman/missing_cstes
JS -- Add few missing constants in global scope
2021-01-20 11:33:02 -08:00
Jonas Jenwald
b4eb55250e Remove redundant compatibility checks, for modern generic builds, in src/core/worker.js
With the recent additions of optional chaining and nullish coalescing to the PDF.js code-base, a couple of the checks in `src/core/worker.js` are now redundant; please see this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Nullish_coalescing_operator#browser_compatibility

In practice, for the non-translated/non-polyfilled PDF.js builds, browsers without support for optional chaining and nullish coalescing will simply throw immediately upon loading of the code.
Hence both the `globalThis` and `Promise.allSettled` checks are now unnecessary, given this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled#browser_compatibility

*Please note:* The `ReadableStream` check is however still necessary, since Node.js doesn't support that.
2021-01-20 13:09:56 +01:00
Jonas Jenwald
298ee5cfbb Replace some ternary operators with optional chaining, and nullish coalescing, in the src/display/-folder
This way, we can further reduce unnecessary code-repetition in some cases.
2021-01-19 17:20:02 +01:00
Calixte Denizet
9754216c60 Fix zoom issue with too thin lines
- aims to fix issue #12868: apply zoom factor to linewidth after setting it to 1.
 - only apply 1px-width when required
 - the sign of getSinglePixelWidth is used to know if 1px-width is required
2021-01-16 15:52:27 +01:00
Calixte Denizet
0d1b19632d Enforce linewidth to 1px when at least one of scale factor is lower than 1 2021-01-15 13:18:24 +01:00
Jonas Jenwald
cf7eb87934 Remove a duplicated reference test (PR 12812 follow-up)
- Remove a *duplicated* reference test, see "issue12810", from the manifest.

 - Improve the spelling in a couple of comments in `src/core/canvas.js`, most notable of the word "parallelogram".

 - Update a comment, also in `src/core/canvas.js`, to actually agree with the value used to reduce confusion when reading the code.
2021-01-15 10:57:15 +01:00
Brendan Dahl
6619f1f3f2
Merge pull request #12812 from calixteman/too_thin
Enforce line width to be at least 1px after applied transform
2021-01-14 15:21:44 -08:00
Jonas Jenwald
2600e59acb Always re-measure non-embedded ArialNarrow fonts (bug 1671312, PR 12725 follow-up)
While PR 12725 fixed bug 1671312 as reported, i.e. the "In the upper right corner "Purposes' has bad kerning."-part, it however broke other parts of the text rendering.
Note in particular the tables, e.g. on page 2 and beyond, where the glyphs are now rendered too close together. The reason for this is that the fonts in question are non-embedded ArialNarrow, which we just replace with Helvetica which obviously is not narrow. Given that the font replacement isn't a perfect fit for non-embedded ArialNarrow, we still need to re-measure the glyph widths in this case.
2021-01-14 15:51:48 +01:00
Jonas Jenwald
13742eb82d Inlude the JS actions for the page when dispatching the "pageopen"-event in the BaseViewer
Note first of all how the `PDFDocumentProxy.getJSActions` method in the API caches the result, which makes repeated lookups cheap enough to not really be an issue.
Secondly, with the previous patch, we're now only dispatching "pageopen"/"pageclose"-events when there's actually a sandbox that listens for them.

All-in-all, with these changes we can thus simplify the default-viewer "pageopen"-event handler a fair bit.
2021-01-12 20:28:50 +01:00
calixteman
1de1ae0be6
Merge pull request #12838 from calixteman/authors
[api-minor] Change the "dc:creator" Metadata field to an Array
2021-01-12 02:44:58 -08:00
Calixte Denizet
43d5512f5c [api-minor] Change the "dc:creator" Metadata field to an Array
- add scripting support for doc.info.authors
 - doc.info.metadata is the raw string with xml code
2021-01-11 21:34:07 +01:00
Calixte Denizet
8e6bec6e2e JS -- Add few missing constants in global scope
- these constants are available in pdfium implementation too
 - fix error code in aform.js
2021-01-11 17:19:28 +01:00
Calixte Denizet
b3dccd66ab Enforce line width to be at least 1px after applied transform
* add a comment to explain how minimal linewidth is computed.
 * when context.linewidth < 1 after transform, firefox and chrome
   don't render in the same way (issue #12810).
 * set lineWidth to 1 after transform and before stroking
   - aims fix issue #12295
   - a pixel can be transformed into a rectangle with both heights < 1.
     A single rescale leads to a rectangle with dim equals to 1 and
     the other to something greater than 1.
 * change the way to render rectangle with null dimensions:
   - right now we rely on the lineWidth set before "re" but
     it can be set after "re" and before "S" and in this case the rendering
     will be wrong.
   - render such rectangles as a single line.
2021-01-10 18:02:12 +01:00
Tim van der Meij
f85b8721d1
Merge pull request #12842 from Snuffleupagus/issue-12841
Improve handling of JPEG images without an EOI marker (issue 12841)
2021-01-10 13:21:28 +01:00
Jonas Jenwald
81525fd446 Use ESLint to ensure that exports are sorted alphabetically
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports

The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
2021-01-09 20:37:51 +01:00
Jonas Jenwald
cd9422a075 Improve handling of JPEG images without an EOI marker (issue 12841)
Given that the PDF document in the issue contains the same very large JPEG image *three* times, this patch includes a test-case where only the first page has been extracted from it.
2021-01-09 20:19:39 +01:00
Tim van der Meij
c0a6d6cd21
Merge pull request #12394 from calixteman/appearance
In a text widget, Font resources can be in the appearance
2021-01-08 21:03:41 +01:00
Jonas Jenwald
941b65f683 Remove unncessary CanvasFactory/CMapReaderFactory/FileReaderFactory duplication in unit-tests
Given that the API will now, after PR 12039, automatically pick the correct factories to use depending on the environment (browser vs. Node.js), we can utilize that in the unit-tests as well. This way we don't have to manually repeat the same initialization code in *multiple* unit-tests.
*Note:* The *official* PDF.js API is defined in `src/pdf.js`, hence the new exports in `src/display/api.js` will not affect that.

Also, updates the unit-test `FileReaderFactory` helpers similarily.

*Drive-by change:* Fix the `CMapReaderFactory` usage in the annotation unit-tests, since the cache should only contain raw data and not a Promise. While this obviously works as-is, having unit-tests that "abuse" the intended data format can easily lead to unnecessary failures if changes are made to the relevant `src/core/` code.
2021-01-08 17:33:59 +01:00
Calixte Denizet
7172f0a928 JS -- update radio/checkbox values even if there are no actions 2021-01-08 16:43:16 +01:00
Calixte Denizet
83119b9000 In a text widget, Font resources can be in the appearance 2021-01-08 10:13:47 +01:00
Tim van der Meij
048081fb69
Merge pull request #12824 from Snuffleupagus/preEvaluateFont-errors
Improve the handling of errors, in `PartialEvaluator.loadFont`, occuring in `PartialEvaluator.preEvaluateFont` (issue 12823)
2021-01-07 23:15:41 +01:00
Tim van der Meij
5bde4b71f8
Merge pull request #12292 from calixteman/encoding
Fix encoding issues when printing/saving a form with non-ascii characters
2021-01-07 22:56:42 +01:00
Jonas Jenwald
78c32c2697 Improve the handling of errors, in PartialEvaluator.loadFont, occuring in PartialEvaluator.preEvaluateFont (issue 12823)
Currently any errors thrown in `preEvaluateFont`, which is a *synchronous* method, will not be handled at all in the `loadFont` method and we were thus failing to return an `ErrorFont`-instance as intended here.

Also, add an *explicit* check in `PartialEvaluator.preEvaluateFont` to ensure that Type0-fonts always have a *valid* dictionary.
2021-01-07 11:38:38 +01:00
Calixte Denizet
6523f8880b JS -- Plug PageOpen and PageClose actions 2021-01-06 13:31:15 +01:00
Calixte Denizet
56424967f2 Fix encoding issues when printing/saving a form with non-ascii characters 2021-01-05 17:23:18 +01:00
Tim van der Meij
ca18af6af3
Merge pull request #12774 from calixteman/doc_action_test
JS -- Add tests for print/save actions
2021-01-03 18:46:37 +01:00
Tim van der Meij
50303fc8f4
Merge pull request #12766 from Snuffleupagus/issue-11004
Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
2020-12-28 20:26:10 +01:00
Calixte Denizet
ffd4bc790c JS -- Add tests for print/save actions
* change PDFDocument::hasJSActions to return true when there are JS actions in catalog.
2020-12-24 18:51:00 +01:00
Calixte Denizet
7c3facb174 JS -- Add support for buttons
* radio buttons
 * checkboxes
2020-12-22 16:41:51 +01:00
Jonas Jenwald
cffb7af3b0 Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
Similar to other markers that we currently skip, by ignoring unsupported Coding style default (COD) options we'll at least render *something* here (although some JPEG 2000 images may look slightly wrong).
Note that if the unsupported COD options lead to additional errors, during parsing, we'll still abort parsing of the JPEG 2000 image.
2020-12-21 20:35:52 +01:00
Brendan Dahl
3ea1c43b15
Merge pull request #12751 from calixteman/da_not_a_string
Add a default DA for textfield to avoid issues when printing or saving
2020-12-21 09:44:08 -08:00
Calixte Denizet
a7c682c600 Add a default DA for textfield to avoid issues when printing or saving
* it aims to fix issue #12750
2020-12-19 23:38:45 +01:00
Tim van der Meij
1c8ead133a
Merge pull request #12758 from Snuffleupagus/AnnotationStorage-rm-event
Run `AnnotationStorage.resetModified` when destroying the `PDFDocumentLoadingTask`/`PDFDocumentProxy`
2020-12-19 21:13:28 +01:00
Jonas Jenwald
f9530e56da Run AnnotationStorage.resetModified when destroying the PDFDocumentLoadingTask/PDFDocumentProxy
This will, in a very simple way using the existing events, thus allow the viewer to remove the "beforeunload" `window` event listener when the document is closed.
Generally speaking we want to avoid having *global* event listeners for the PDF document instance, which is why the `EventBus` exists, and instead reserve global events for the viewer itself. However, the `AnnotationStorage` "beforeunload" event unfortunately needs to be document-specific and we should thus ensure that it's correctly removed when the document is destroyed.
2020-12-19 14:05:31 +01:00
Jonas Jenwald
c78f153bda Remove the ENABLE_SCRIPTING build-target, since it's not necessary
There's no really compelling reason, as far as I can tell, to introduce the `ENABLE_SCRIPTING` build-target, instead of simply re-using the existing `TESTING` build-target for the new `gulp integrationtest` task.

In general there should be no problem with just always enable scripting in TESTING-builds, and if I were to *guess* the reason that this didn't seem to work was most likely because the Preferences ended up over-writing the `AppOptions`.
As it turns out the GENERIC-viewer has already has built-in support for disabling of Preferences, via the `AppOptions`, and this can be utilized in TESTING-builds as well to ensure that whatever `AppOptions` are set they're always respected.
2020-12-18 22:10:36 +01:00
Jonas Jenwald
eff4d8182d Update the events, used with scripting, to use lower-case names and avoid using DOM events internally in the viewer
For DOM events all event names are lower-case, and the newly added PDF.js scripting-events thus "stick out" quite a bit. Even more so, considering that our internal `eventBus`-events follow the same naming convention.
Hence this patch, which changes the "updateFromSandbox"/"dispatchEventInSandbox" events to be lower-case instead.

Furthermore, using DOM events for communication *within* the PDF.js code itself (i.e. between code in `web/app.js` and `src/display/annotation_layer.js/`) feels *really* out of place.
That's exactly the reason that we have the `EventBus` abstraction, since it allowed us to remove prior use of DOM events, and this patch thus re-factors the code to make use of the `EventBus` instead for scripting-related events.
Obviously for events targeting a *specific element* using DOM events is still fine, but the "updatefromsandbox"/"dispatcheventinsandbox" ones should be using the `EventBus` internally.

*Drive-by change:* Use the `BaseViewer.currentScaleValue` setter unconditionally in `PDFViewerApplication._initializeJavaScript`, since it accepts either a string or a number.
2020-12-18 22:10:32 +01:00
Jonas Jenwald
e2b6d79dee Tweak the LinkAnnotationElement._bindJSAction and WidgetAnnotationElement.{_setEventListener, _setEventListeners} methods
- Update the `LinkAnnotationElement._bindJSAction` call-site to actually agree with the JSDocs, by passing in the `data`.

 - Prevent the links created by `LinkAnnotationElement._bindJSAction` from being displayed with empty hashes; compare with e.g. `LinkAnnotationElement. _bindNamedAction`.

 - The overall indentation-level in `WidgetAnnotationElement._setEventListener` can be reduced slightly by using early returns, which improves the overall readability of this method a bit. (We're also able to avoid unnecessary `in` usage here.)

 - The code can also be made *slightly* more efficient overall, by moving the `this.data.actions` check into `WidgetAnnotationElement._setEventListeners` instead. This way we can avoid useless `this._setEventListener`-calls when there are no actions present.
2020-12-18 22:03:41 +01:00
Jonas Jenwald
6dc39cb873 Tweak the new mouseState parameter, and its usage, in the viewer components and the AnnotationLayer
- Actually remove the `isDown` property when destroying the scripting-instance.

 - Mark all `mouseState` usage as "private" in the various classes.

 - Ensure that the `AnnotationLayer` actually treats the parameter as properly *optional*, the same way that the viewer components do.

 - For now remove the `mouseState` parameter from the `PDFPageView` class, and keep it only on the `BaseViewer`, since it's questionable if all of the scripting-functionality will work all that well without e.g. a full `BaseViewer`.

 - Append the `mouseState` to the JSDoc for the `AnnotationElement` class, and just move its definition into the base-`AnnotationElement` class.
2020-12-18 22:03:41 +01:00
calixteman
e6e2809825
Merge pull request #12702 from calixteman/doc_actions
JS - Collect and execute actions at doc level
2020-12-18 21:33:32 +01:00
Tim van der Meij
b7fc916f48
Merge pull request #12753 from Snuffleupagus/issue-12752
Ignore, rather than throwing on, Coding style component (COC) markers in JPEG 2000 images (issue 12752)
2020-12-18 21:14:06 +01:00
Calixte Denizet
1e2173f038 JS - Collect and execute actions at doc and pages level
* the goal is to execute actions like Open or OpenAction
 * can be tested with issue6106.pdf (auto-print)
 * once #12701 is merged, we can add page actions
2020-12-18 20:03:59 +01:00
Jonas Jenwald
48a76aea2b Ignore, rather than throwing on, Coding style component (COC) markers in JPEG 2000 images (issue 12752)
Similar to other markers that we currently skip, by ignoring the Coding style component (COC) marker we'll at least prevent outright errors (although some JPEG 2000 images may look slightly wrong).
2020-12-18 18:18:32 +01:00
Calixte Denizet
167ff1a7fc JS -- Actions must be evaluated in global scope
* All the public properties of doc are injected into globalThis, in order to make them available through `this`
 * Put event in the global scope too.
2020-12-17 22:01:45 +01:00
Calixte Denizet
8bff4f1ea9 In order to simplify m-c code, move some in pdf.js
* move set/clear|Timeout/Interval and crackURL code in pdf.js
 * remove the "backdoor" in the proxy (used to dispatch event) and so return the dispatch function in the initializer
 * remove listeners if an error occured during sandbox initialization
 * add support for alert and prompt in the sandbox
 * add a function to eval in the global scope
2020-12-17 15:03:26 +01:00
Calixte Denizet
03814bd6a2 Don't use 'in' operator to check if key is in a Map 2020-12-16 16:00:12 +01:00
Brendan Dahl
3c603fb28b
Merge pull request #12635 from calixteman/js_display_evts
JS -- Send events to the sandbox from annotation layer
2020-12-15 21:24:55 -08:00
Calixte Denizet
6502ae889d JS -- Send events to the sandbox from annotation layer 2020-12-15 16:28:47 +01:00
Jonas Jenwald
499d865ebf Change the minimum "supported" version of the Safari-browser to Safari 10
According to https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support, Safari 9 is still listed as "mostly supported".

Given that the *last* release from the Safari 9 branch was on [September 1, 2016](https://en.wikipedia.org/wiki/Safari_version_history#Safari_9), it's questionable at least to me if it actually makes sense for us to even pretend to "support" such an old browser.
Especially when the *first* release from the Safari 10 branch was on [September 20, 2016](https://en.wikipedia.org/wiki/Safari_version_history#Safari_10), which is now over four years ago.

Based on the MDN compatibility data, this patch thus removes the following polyfills:
 - `TypedArray.prototype.slice()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/slice#Browser_compatibility
 - `String.prototype.codePointAt()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/codePointAt#Browser_compatibility
 - `String.fromCodePoint()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCodePoint#Browser_compatibility
2020-12-15 09:49:32 +01:00
Jonas Jenwald
a2874b380a Remove the remaining IE 11 polyfills
We really ought to settle on the *lowest* supported versions of various browsers[1], since that should allow even more clean-up, but given that https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support *explicitly* lists IE 11 as unsupported after PDF.js version `2.6.347` there's a number of polyfills that are no longer needed.

Based on the MDN compatibility data, this patch thus removes the following polyfills:
 - `String.prototype.startsWith()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/startsWith#Browser_compatibility

 - `String.prototype.endsWith()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/endsWith#Browser_compatibility

  - `String.prototype.includes()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/includes#Browser_compatibility

 - `Array.prototype.includes()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/includes#Browser_compatibility

  - `Array.from()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from#Browser_compatibility

  - `Object.assign()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/assign#Browser_compatibility

  - `Math.log2()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/log2#Browser_compatibility

  - `Number.isNaN()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/isNaN#Browser_compatibility

  - `Number.isInteger()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/isInteger#Browser_compatibility

  - `Map.prototype.entries()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map/entries#Browser_compatibility

  - `Set.prototype.entries()`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set/entries#Browser_compatibility

  - `WeakMap`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakMap#Browser_compatibility

  - `WeakSet`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakSet#Browser_compatibility

  - `Symbol`, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol#Browser_compatibility

Finally, this patch also attempts to update the compatibility information for the remaining polyfills.

---
[1] For example: It's questionable if Safari 9 should be listed as supported, given that the last release from that branch was in 2016.
2020-12-14 14:31:25 +01:00
Tim van der Meij
d1848f5022
Merge pull request #12725 from brendandahl/remeasure-std
Use widths defined by font for standard fonts.
2020-12-11 20:36:19 +01:00
Jonas Jenwald
67e5db75d8 Ignore color-operators in Type3 glyphs beginning with a d1 operator (issue 12705)
Please refer to the PDF specification at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1977497 and https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3998470

This patch removes the color-operators in the evaluator, since that should be more efficient than doing it repeatedly in the main-thread when rendering the Type3 glyphs.
2020-12-11 15:49:13 +01:00
Brendan Dahl
45d9ab6e45 Use widths defined by font for standard fonts.
There doesn't seem to be anything definitive about this in
the spec, but from experimenting, it seems acrobat lets
PDFs override the widths of the standard fonts.
2020-12-10 15:30:39 -08:00
Tim van der Meij
00b4f86db3
Merge pull request #12717 from Snuffleupagus/issue-12714
Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
2020-12-10 23:06:59 +01:00
Tim van der Meij
954ac3d944
Merge pull request #12719 from calixteman/emailvalidate
JS -- add function eMailValidate used to validate an email address
2020-12-10 22:19:37 +01:00
Calixte Denizet
f94269c0d1 JS -- add function eMailValidate used to validate an email address 2020-12-10 21:51:37 +01:00
Tim van der Meij
7097114e0c
Merge pull request #12720 from calixteman/fix_co
Be sure that CalculationOrder is either null or a non-empty array
2020-12-10 21:43:35 +01:00
Tim van der Meij
85ab53fef0
Merge pull request #12722 from calixteman/printf
JS -- fix printd issue with negative number
2020-12-10 21:41:11 +01:00
Calixte Denizet
c7b09b8efc JS -- fix printd issue with negative number 2020-12-10 18:43:04 +01:00
Calixte Denizet
25bf504ff5 Be sure that CalculationOrder is either null or a non-empty array 2020-12-10 16:02:11 +01:00
Jonas Jenwald
796a0d3155 Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
In the referenced PDF document, the second and third page has *corrupt* /Annots-entries which contain /Dict-data rather than the intended Arrays.
2020-12-10 11:42:00 +01:00
Tim van der Meij
f48cfba945
Merge pull request #12707 from calixteman/radio_check
Checkboxes with the same name must behave like a radio buttons group
2020-12-09 23:29:03 +01:00
Jonas Jenwald
7ce6634c51 Ensure that the pdf.sandbox.js scriptElement is also removed from the DOM (PR 12695 follow-up)
I completely missed this previously, but we obviously should remove the scriptElement as well to *really* clean-up everything properly.

Given that there's multiple existing usages of `loadScript` in the code-base, the safest/quickest solution seemed to be to have call-sites opt-in to remove the scriptElement using a new parameter.
2020-12-09 22:15:47 +01:00
Calixte Denizet
1fcffe8034 Checkboxes with the same name must behave like a radio buttons group
* aims to fix issue #12706
2020-12-08 15:53:19 +01:00
Calixte Denizet
0f899edfc8 JS -- Add aform functions
* These functions aren't in the PDF specs but seems to be widely used
 * So the specs for these functions are:
   * http://www.sfu.ca/~wcs/ForGraham/Aladdin%20stuff/Acrobat%20Reader%205.0/Contents/MacOS/JavaScripts/AForm.js
   * pdfium source code
2020-12-07 19:37:34 +01:00
Tim van der Meij
d784af3f38
Merge pull request #12696 from timvandermeij/annotation-quadpoints
Fix non-standard quadpoints orders for annotations
2020-12-06 16:52:33 +01:00
Tim van der Meij
012e15f7a3
Fix non-standard quadpoints orders for annotations
This change requires us to use valid quadpoints arrays in the existing
unit tests too due to the normalization.
2020-12-06 16:02:41 +01:00
Jonas Jenwald
c549069ebd Replace the testMode parameter in src/pdf.sandbox.js with a constant, set using the pre-processor
This simplifies not just this code, but the unit-tests as well, and should be sufficient as far as I can tell.
Note also that currently, in the *built* `pdf.sandbox.js` file, there's even a line reading `testMode = testMode && false;` because of an accidentally flipped pre-processor statement.

Finally, in the `scripting_spec.js` unit-test, defines `sandboxBundleSrc` at the top of the file to make it easier to find and/or change it when necessary.
2020-12-05 23:04:34 +01:00
Tim van der Meij
e3b6a9fb23
Implement quadpoints rendering for link annotations
Each quadrilateral needs to have its own link element, so the first
quadrilateral can use the already created element, but the next
quadrilaterals need to clone that element.
2020-12-05 21:39:38 +01:00
Tim van der Meij
7be4a14d87
Move documentation of the render methods to the AnnotationElement class
Not only does this reduce boilerplate since the documentation is the
same for all annotation classes, it also wasn't correct for the
annotation types that support quadpoints since they return an array of
section elements instead of a single one.
2020-12-05 20:16:40 +01:00
Tim van der Meij
cb422a80b0
Move quadrilateral rendering logic into a method on the AnnotationElement class
Doing so avoids some code duplication.
2020-12-05 20:07:17 +01:00
Tim van der Meij
0273080031
Move quadrilateral creation logic to the constructor of the AnnotationElement class
Using an object for the various constructor options makes extensions
easier and makes the code self-documenting.
2020-12-05 20:01:07 +01:00
Jonas Jenwald
d742e3cde8 Actually utilize the PDF.js build-system fully when bundling the pdf.sandbox.js file
There's no good reason, as far as I can tell, to use search-and-replace to include the *stringified* `pdf.scripting.js` file in the built `pdf.sandbox.js` file. Instead we could, and even should, utilize the existing `PDFJSDev.eval(...)`-functionality, which is not only simpler but will also be more efficient as well (no need for a regular expression).
2020-12-05 11:15:11 +01:00
Jonas Jenwald
715b8aa389 Move, and rename, the src/scripting_api/quickjs-sandbox.js file to src/pdf.sandbox.js
The current location feels somewhat strange, and also inconsistent with the existing way that bundling is done.

Finally, add the version/build numbers at the top of the *built* `pdf.sandbox.js` files, since all other built files include that information given that it's often helpful to be able to easily determine the *exact* version.
2020-12-05 11:15:11 +01:00
Jonas Jenwald
155c17c99a [Regression] Prevent the *built* pdf.scripting.js/pdf.sandbox.js files from accidentally including most of the main-thread code (PR 12631 follow-up)
*This is a recent regression, which I stumbled upon while working on cleaning-up the gulpfile related to `pdf.sandbox.js` building.*

By placing the `ColorConverters` functionality in the `src/display/display_utils.js` file, you end up including a *significant* chunk of the `pdf.js` file in the built `pdf.scripting.js`/`pdf.sandbox.js` files.
Given that I cannot imagine that this was actually intended, since it inflates the built files with unnecessary/unused code, this moves `ColorConverters` to a new file instead (thus breaking the dependencies).

To hopefully reduce the risk future bugs, along these lines, a big comment is also placed at the top of the new file.
Finally, the `ColorConverters` is converted to a class with static methods, since this felt slightly cleaner overall.
2020-12-04 14:17:26 +01:00
Calixte Denizet
d4f4b43d29 Fix issue #12684: replace bitwise ORs by ORs 2020-12-02 23:02:11 +01:00
Brendan Dahl
956fcab967
Merge pull request #12631 from calixteman/app
JS -- Implement app object
2020-12-01 16:50:16 -08:00
Calixte Denizet
830070ca41 Split underline, strikeout, squiggly annotions div into multiple divs
* Follow up of #12505
 * Fix bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1679696
2020-12-01 16:04:48 +01:00
Jonas Jenwald
c42029489e Run gulp lint --fix, to account for changes in Prettier version 2.2.1
Please refer to https://github.com/prettier/prettier/blob/master/CHANGELOG.md#221 for additional details.
2020-11-29 10:01:46 +01:00
Tim van der Meij
256068556d
Merge pull request #12662 from Snuffleupagus/issue-12402
Check the top-level /Pages dictionary when finding the trailer in `XRef.indexObjects` (issue 12402)
2020-11-25 21:54:41 +01:00
Jonas Jenwald
8a132f584d Check the top-level /Pages dictionary when finding the trailer in XRef.indexObjects (issue 12402)
In addition to the existing /Root and /Pages validation, also check that the /Pages-entry actually is a dictionary and that it has a valid /Count-entry.
This way we can avoid picking a trailer candidate which e.g. the `Catalog.numPages` getter will just end up rejecting, thus breaking PDF document loading completely.
2020-11-25 15:14:53 +01:00
Calixte Denizet
18b525de2e Parenthesis in names are not escaped when saving 2020-11-25 12:28:12 +01:00
Calixte Denizet
283aac4c53 JS -- Implement app object
* https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/AcrobatDC_js_api_reference.pdf
 * Add color, fullscreen objects + few constants.
2020-11-20 15:46:52 +01:00
Jonas Jenwald
01d12b465c [api-minor] Add "contentLength" to the information returned by the getMetadata method
Given that we already include the "Content-Disposition"-header filename, when it exists, it shouldn't hurt to also include the information from the "Content-Length"-header.
For PDF documents opened via a URL, which should be a very common way for the PDF.js library to be used, this will[1] thus provide a way of getting the PDF filesize without having to wait for the `getDownloadInfo`-promise to resolve[2].

With these API improvements, we can also simplify the filesize handling in the `PDFDocumentProperties` class.

---
[1] Assuming that the server is correctly configured, of course.

[2] Since that's not *guaranteed* to happen in general, with e.g. `disableAutoFetch = true` set.
2020-11-20 15:30:36 +01:00
Brendan Dahl
c88e805870
Merge pull request #12604 from calixteman/quickjs
JS -- Add a sandbox based on quickjs
2020-11-19 08:40:21 -08:00
Brendan Dahl
4ba28de260
Merge pull request #12567 from calixteman/hidden
[api-minor] JS -- hidden annotations must be built in case a script show them
2020-11-19 08:35:47 -08:00
Calixte Denizet
c7974e9996 JS -- Add a sandbox based on quickjs
* quickjs-eval.js has been generated using https://github.com/mozilla/pdf.js.quickjs/
 * lazy load of sandbox code
 * Rewrite tests to use the sandbox
 * Add a task `watch-sandbox` which update bundle pdf.sandbox.js on change in the sandbox code
2020-11-19 13:40:46 +01:00
Brendan Dahl
f39d87bff1
Merge pull request #12569 from calixteman/events
JS -- Fix events dispatchment and add tests
2020-11-16 10:26:29 -08:00
Tim van der Meij
2aefc406b4
Merge pull request #12622 from Snuffleupagus/hasJSActions-cleanup
Some `hasJSActions`, and general annotation-code, related cleanup in the viewer and API
2020-11-14 17:04:15 +01:00
Jonas Jenwald
de628cec59 Some hasJSActions, and general annotation-code, related cleanup in the viewer and API
- Add support for logical assignment operators, i.e. `&&=`, `||=`, and `??=`, with a Babel-plugin. Given that these required incrementing the ECMAScript version in the ESLint and Acorn configurations, and that platform/browser support is still fairly limited, always transpiling them seems appropriate for now.

 - Cache the `hasJSActions` promise in the API, similar to the existing `getAnnotations` caching. With this implemented, the lookup should now be cheap enough that it can be called unconditionally in the viewer.

 - Slightly improve cleanup of resources when destroying the `WorkerTransport`.

 - Remove the `annotationStorage`-property from the `PDFPageView` constructor, since it's not necessary and also brings it more inline with the `BaseViewer`.

 - Update the `BaseViewer.createAnnotationLayerBuilder` method to actaually agree with the `IPDFAnnotationLayerFactory` interface.[1]

 - Slightly tweak a couple of JSDoc comments.

---
[1] We probably ought to re-factor both the `IPDFTextLayerFactory` and `IPDFAnnotationLayerFactory` interfaces to take parameter objects instead, since especially the `IPDFAnnotationLayerFactory` one is becoming quite unwieldy. Given that that would likely be a breaking change for any custom viewer-components implementation, this probably requires careful deprecation.
2020-11-14 13:58:35 +01:00
Calixte Denizet
03018cfe40 Follow-up for #12585: set elements class in render instead of in _createQuadrilaterals 2020-11-14 11:42:35 +01:00
Calixte Denizet
611207d2c9 Fix popup for highlights without popup (follow-up of #12505)
* remove 1st param of _createPopup (almost useless for a method)
 * prepend popup div to avoid to have them on top of some highlights (and so "disable" partially mouse events)
 * add a ref test for issue #12504
2020-11-10 17:33:54 +01:00
Calixte Denizet
2dfac4cb41 JS -- Fix events dispatchment and add tests
* dispatch event to take into account calculation order
 * use a map for actions in Field
2020-11-10 17:26:29 +01:00
Calixte Denizet
8de98079ca JS -- Implement doc object
* https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/js_api_reference.pdf#page=335
 * it has all the properties/methods defined in the spec
 * unimplemented methods are there but with an empty body to avoid exception when calling an undefined method
 * implement zoom, zoomType, layout, pageNum, ...
2020-11-10 16:16:42 +01:00
Calixte Denizet
b11592a756 JS -- hidden annotations must be built in case a script show them
* in some pdf, there are actions with "event.source.hidden = ..."
 * in order to handle visibility when printing, annotationStorage is extended to store multiple properties (value, hidden, editable, ...)
2020-11-10 12:48:34 +01:00
Calixte Denizet
a5279897a7 JS -- Add listener for sandbox events only if there are some actions
* When no actions then set it to null instead of empty object
* Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.
2020-11-09 18:37:59 +01:00
Tim van der Meij
55f55f5859
Merge pull request #12598 from Snuffleupagus/globalThis-check
Fail early, in modern `GENERIC` builds, if `globalThis` isn't available (PR 11799 follow-up, issue 12596)
2020-11-07 23:42:21 +01:00
Tim van der Meij
b068cdb725
Merge pull request #12595 from Snuffleupagus/src-display-optional-chaining
Convert files in the `src/display/`-folder to use optional chaining where possible
2020-11-07 23:40:46 +01:00
Jonas Jenwald
a03b383edb Fail early, in modern GENERIC builds, if globalThis isn't available (PR 11799 follow-up, issue 12596)
It probably doesn't hurt to explicitly check for `globalThis` as well, in addition to the existing checks.
2020-11-07 19:00:33 +01:00
Jonas Jenwald
1dad255784 Convert files in the src/display/-folder to use optional chaining where possible
By using optional chaining, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining, it's possible to reduce unnecessary code-repetition in many cases.
Note that these changes also reduce the size of the *built* `pdf.js` file, when `SKIP_BABEL == true` is set, and for the `MOZCENTRAL` build-target that result in a `0.1%` filesize reduction from a simple and mostly mechanical code change.
2020-11-07 13:22:06 +01:00
Jonas Jenwald
9602844368 Enable the ESLint no-useless-escape rule (PR 12551 follow-up)
Note that a number of these cases are covered by existing unit-tests, and a few others only matter for the development/build scripts.
Furthermore, I've also tried to the best of my ability to test each case *manually* to hopefully further reduce the likelihood of this patch introducing any bugs.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
2020-11-07 13:06:24 +01:00
Tim van der Meij
e3851a6765
Merge pull request #12591 from Snuffleupagus/strokeColor-Pattern
Improve the Pattern-detection in `CanvasGraphics.stroke`
2020-11-06 22:16:26 +01:00
Brendan Dahl
018fd43096
Merge pull request #12530 from calixteman/js_utils
JS -- Add 'util' object
2020-11-06 09:59:50 -08:00
Calixte Denizet
f69e848b1c JS -- Add 'util' object
This patch provides an implementation of the util object as described:
 * https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/js_api_reference.pdf#page=716
2020-11-06 18:12:29 +01:00
Jonas Jenwald
78de919bf4 Improve the Pattern-detection in CanvasGraphics.stroke
The vast majority of the time, unless a Pattern is active, the `strokeColor`-property contains a "simple" colour value represented by a String. Hence it seems somewhat ridiculous to do a `hasOwnProperty` check on a String, and it's should thus be possible to improve things a tiny bit here.

Unfortunately using a simple `instanceof` check would only work for `TilingPattern`s, but not for the `ShadingIRs` given how they are implemented; see `src/display/pattern_helper.js`. (While that file could probably do with some clean-up, given the age of some of its code, that probably shouldn't happen here.)

Finally, the `this.type = "Pattern"`-property of the various Shadings/TilingPatterns were removed, since I cannot see why it's necessary when we can simply check for a `getPattern` method instead. Note that part of this code even pre-dates the main/worker-thread split, which probably in part explains why it looks the way it does.
2020-11-06 11:46:35 +01:00
Tim van der Meij
be0794cb08
Merge pull request #12564 from hakubo/patch-1
Make sure that Popup is rendered next to trigger for textAnnotation
2020-11-06 00:02:11 +01:00
Tim van der Meij
99ac2d1036
Merge pull request #12583 from Snuffleupagus/nonBlendModesSet
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
2020-11-05 23:53:39 +01:00
Tim van der Meij
646f895d35
Merge pull request #12568 from calixteman/defaultvalue
[api-minor] JS -- Add default value in annotation data
2020-11-05 22:53:21 +01:00
Jonas Jenwald
082cd8fc6c Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in PartialEvaluator.hasBlendModes
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.

To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.

This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
    {  "id": "issue6961",
       "file": "../web/pdfs/issue6961.pdf",
       "md5": "a80e4357a8fda758d96c2c76f2980b03",
       "rounds": 100,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) |  +/- |     %  | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0    | Overall      |   100 |         1034 |         555 | -480 | -46.39 |        faster
firefox | 0    | Page Request |   100 |          489 |           7 | -482 | -98.67 |        faster
firefox | 0    | Rendering    |   100 |          545 |         548 |    2 |   0.45 |
firefox | 1    | Overall      |   100 |          912 |         428 | -484 | -53.06 |        faster
firefox | 1    | Page Request |   100 |          487 |           1 | -486 | -99.77 |        faster
firefox | 1    | Rendering    |   100 |          425 |         427 |    2 |   0.51 |
```

---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
2020-11-05 16:59:08 +01:00
Calixte Denizet
39f5954729 JS -- Add default value in annotation data
* these values are used when a form is resetted
2020-11-05 13:44:23 +01:00
Jakub Olek
b642d49108 Make sure that Popup is rendered next to trigger for textAnnotation 2020-11-05 06:45:17 +01:00
Brendan Dahl
1de2bc4816
Merge pull request #12505 from calixteman/12504
Split highlight annotation div into multiple divs
2020-11-04 10:41:28 -08:00
Tim van der Meij
3e52098e29
Merge pull request #12555 from calixteman/color
Replace css color rgb(...) by #...
2020-11-02 23:55:39 +01:00
Calixte Denizet
9d11b51a3e Replace css color rgb(...) by #...
* it's faster to generate the color code in using a table for components
* it's very likely a way faster to parse (when setting the color in the canvas)
2020-11-02 10:25:04 +01:00
Jonas Jenwald
a177afc206 Fix some static static analyzer warnings (issue 11965)
This fixes only those warnings, as reported by https://lgtm.com/projects/g/mozilla/pdf.js?mode=list, that make sense (as far as I'm concerned).

Hence this patch leaves the following things unaddressed:
 - The "recommendation"-category, since it only complains about unused variables. However, note that all of those cases are purposely included and that there's thus ESLint-disable comments added to explictly allow them.
 - The "warning"-category, which still contains two complaints. However, as far as I can tell, they are both false positives.

Given first of all the false positives of the LGTM static analyzer, and secondly that we'd need to add (essentially duplicated) disable-comments for the unused variable cases, it's not entirely clear to me if we actually want to work towards including LGTM in the PDF.js project (e.g. running alongside Travis) or if we should just close issue 11965.
2020-11-01 12:08:38 +01:00
Tim van der Meij
46e60a266c
Merge pull request #12552 from Snuffleupagus/annotation-fixes
Miscellaneous (small) improvements in `src/core/annotation.js`
2020-10-31 00:41:39 +01:00
Tim van der Meij
e341e6e542
Merge pull request #12525 from brendandahl/mark-info
[api-minor] Implement API to get MarkInfo from the catalog.
2020-10-31 00:05:19 +01:00
Brendan Dahl
f5c821e9c3 [api-minor] Implement API to get MarkInfo from the catalog. 2020-10-30 10:59:45 -07:00
Jonas Jenwald
fdb6520012 Change the Catalog.openAction getter back to using an Object internally (PR 12543 follow-up)
Given that the `Map`-pattern apparently has undesirable performance characteristics, change this getter back to using an Object instead and check its size before returning it.
2020-10-30 13:27:05 +01:00
Jonas Jenwald
a1e5581a0b Let Annotation._collectActions return null when no actions are present
Rather than returning an *empty* Object[1] we should be returning `null` instead, since that's consistent with existing API-functionality.
To avoid having to *manually* track if the Object is empty, this patch also introduces a small helper function to check its size.
2020-10-30 13:23:05 +01:00
Jonas Jenwald
8540b4cc76 Stop calling Font.charsToGlyphs, in src/core/annotation.js, with unused arguments
As can be seen in `src/core/fonts.js`, this method only accepts *one* parameter, hence it's somewhat difficult to understand what the Annotation-code is actually attempting to do here.
The only possible explanation that I can imagine, is that the intention was initially to call `Font.charToGlyph` *directly* instead. However, note that that'd would not actually have been correct, since that'd ignore one level of font-caching (see `this.charsCache`). Hence the unused arguments are removed, in `src/core/annotation.js`, and the `Font.charToGlyph` method is now marked as "private" as intended.
2020-10-30 13:17:52 +01:00
Jonas Jenwald
46e94cad17 Fix *some* errors reported by the ESLint no-useless-escape rule
This patch removes unnecessary escape-sequence in (mostly) strings, as a first step, since the ones in regular expressions probably requires more careful testing (just in case).
The only exception is a regular expression in `src/core/annotation.js`, since we should have both unit- and reference-tests for this code *and* given [this information on MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#Types):
 > Inside a character set, the dot loses its special meaning and matches a literal dot.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
2020-10-29 15:40:40 +01:00
Jonas Jenwald
820fb7f969 Update all Object.fromEntries call-sites to ensure that a null prototype is used
Given that `Object.fromEntries` doesn't seem to *guarantee* that a `null` prototype is used, we thus hack around that by using `Object.assign` with `Object.create(null)`.
2020-10-28 14:43:44 +01:00
Jonas Jenwald
9fc7cdcc9d Use a Map, rather than an Object, internally in the Catalog.openAction getter (PR 11644 follow-up)
This provides a work-around to avoid having to conditionally try to initialize the `openAction`-object in multiple places.
Given that `Object.fromEntries` doesn't seem to *guarantee* that a `null` prototype is used, we thus hack around that by using `Object.assign` with `Object.create(null)`.
2020-10-28 14:43:28 +01:00
Tim van der Meij
ea4d88a330
Merge pull request #12395 from calixteman/checks
Render not displayed annotations in using normal appearance when printing
2020-10-28 00:11:10 +01:00
Calixte Denizet
6be2f84b4e Render not displayed annotations in using normal appearance when printing 2020-10-27 19:00:31 +01:00
Tim van der Meij
71a14be8e7
Merge pull request #12534 from Snuffleupagus/murmurhash-slice
Ensure that `MurmurHash3_64.update` handles `ArrayBuffer` input correctly, to avoid hash-collisions (issue 12533)
2020-10-26 23:34:03 +01:00
Jonas Jenwald
f2fa053c51 Ensure that MurmurHash3_64.update handles ArrayBuffer input correctly, to avoid hash-collisions (issue 12533)
Different fonts incorrectly end up with *identical* hashes, despite having different /ToUnicode data.
The issue, and it's very interesting that we've apparently not seen it before, appears to be caused by the fact that different /ToUnicode entries share the *same* underlying `ArrayBuffer`, which thus becomes problematic at the `const dataUint32 = new Uint32Array(data.buffer, 0, blockCounts);` line. The simplest solution thus seem to be to just *copy* the input, when it's an `ArrayBuffer`, rather than using it as-is. (Note that if we'd stringified the input, when calling `MurmurHash3_64.update`, the issue would also have been fixed. In this case, we're already creating an unique TypedArray.)
2020-10-26 16:27:33 +01:00
Jonas Jenwald
666535be47 Prevent use of optional chaining and nullish coalescing in the src/shared/ folder
Given that this code is used on the worker-thread, where SystemJS is still used during development, we need to (for now) handle this folder the same way as the `src/core/` one.
2020-10-26 13:16:01 +01:00
Jonas Jenwald
c293fc2b8f Add (some) optional chaining usage in src/display/api.js
Since we no longer use SystemJS to load the unit-tests, there's now nothing that prevents us from using optional chaining and nullish coalescing in the `src/display/` directory.
2020-10-26 11:11:48 +01:00
Jonas Jenwald
d9084c0be2 Load the fake worker, in non-PRODUCTION mode, with native async import
This removes the last SystemJS usage from both the API and the default viewer.
2020-10-26 11:11:48 +01:00
Jonas Jenwald
56fa6d414c Add a getArrayLookupTableFactory helper function and use it to re-format src/core/{glyphlist, unicode}.js
*Please note:* Once https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is implemented, and we've removed SystemJS completely, this entire patch can (and even should) be reverted.

This is similar to the existing `getLookupTableFactory` helper function, but is implemented as outlined in issue 6774.
The re-formatting of the tables were done automatically, by using find-and-replace with regular expressions.

For reasons that I don't even pretend to understand, using this particular structure for these *very* long lookup tables allow SystemJS to process the files correctly/quickly and the development viewer thus works as intended.
2020-10-26 11:08:00 +01:00
Jonas Jenwald
441d9c8cc0 Change src/core/{glyphlist, unicode}.js to use standard import/export statements
While the *built* `pdf.worker.js` file still works correctly with these changes, despite these two files being excluded by Babel[1], the development viewer does not because of issues with SystemJS[2] and/or its Babel-plugin (both of which are old).
Furthermore, note also that excluding these two files from Babel-processing isn't *generally* necessary since e.g. the `gulp mozcentral` command works anyway. The explanation is rather that it's actually the source-map generation which fails for these huge sequences when building the `pdf.worker.js` file.

However, not using standard `import`/`export` statements in all files means we also need to use SystemJS when e.e. running the unit-tests. This is very unfortunate, since SystemJS (or its old Babel-version) doesn't support modern ECMAScript features such as e.g. optional chaining and nullish coalescing.

Unfortunately it also seems that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687, which tracks the implementation of worker-modules in Firefox, has stalled since there hasn't been any updates for six months now.

To hopefully address all of the above, this patch is the first in a series that attempts to further reduce our reliance on SystemJS.

---
[1] The only difference being how the dependencies are handled, in the Webpack-bundled file.

[2] Parsing takes way too long and consumes too much memory, thus rendering the development viewer essentially unusable.
2020-10-26 11:08:00 +01:00
Jonas Jenwald
61ffa9caa9 Tweak the pdf.scripting.js bundling, to improve overall consistency
This brings the new `pdf.scripting.js` bundling more in-line with the pre-existing handling for the  `pdf.js`/`pdf.worker.js` files:
 - Add a new `src/pdf.scripting.js` file as the entry-point for the build scripts.

 - Add the version/build numbers at the top of the *built* `pdf.scripting.js` files, since all other built files include that information given that it's often helpful to be able to easily determine the *exact* version.

 - Tweak the `createScriptingBundle` in the gulp-file, since it looks like a little bit too much copy-and-paste in the variable names.
2020-10-25 16:36:56 +01:00
Tim van der Meij
b4ca3d55b8
Merge pull request #12508 from calixteman/button_fallback_font
Fallback font for buttons must be ZapfDingbats.
2020-10-24 18:56:12 +02:00
Tim van der Meij
da73537fdb
Merge pull request #12524 from Snuffleupagus/pr-12333-followup
A couple of small (viewer) tweaks of tooltip-only Annotations (PR 12333 follow-up)
2020-10-24 16:06:01 +02:00
Tim van der Meij
180f35ee91
Merge pull request #12526 from Snuffleupagus/TilingPattern-args
Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up)
2020-10-24 15:47:57 +02:00
Tim van der Meij
c493dc96fa
Merge pull request #12516 from Snuffleupagus/fieldObjects-annotation-undefined
Prevent issues, in `PDFDocument.fieldObjects`, for invalid Annotations
2020-10-24 15:42:33 +02:00
Jonas Jenwald
b478d3e7b9 Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up)
- Handle the arguments correctly in `PartialEvaluator.handleColorN`.
   For TilingPatterns with a base-ColorSpace, we're currently using the `args` when computing the color. However, as can be seen we're passing the Array as-is to the `ColorSpace.getRgb` method, which means that the `Name` is included as well.[1]
   Thankfully this hasn't, as far as I know, caused any actual bugs, but that may be more luck than anything else given how the `ColorSpace` code is implemented. This can be easily fixed though, simply by popping the `Name`-object off of the `args` Array.

 - Cache TilingPatterns using the `Name`-string, rather than the object directly.
   This is not only consistent with other caches in `PartialEvaluator`, but importantly it also ensures that the cache lookup always works correctly. Note that since `Name`-objects, similar to other primitives, uses a cache themselves a *manually* triggered `cleanup`-call could thus (theoretically) cause the `LocalTilingPatternCache` to not find an existing entry. While the likelihood of this happening is *extremely* small, it's still something that we should fix.

---
[1] The `args` Array can e.g. look like this: `[0.043, 0.09, 0.188, 0.004, /P1]`, which means that we're passing in the `Name`-object to the `ColorSpace` method.
2020-10-24 13:49:46 +02:00
Calixte Denizet
37c86b2daa Fallback font for buttons must be ZapfDingbats.
Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.
2020-10-24 12:00:03 +02:00
Calixte Denizet
85e6c67cf3 Split highlight annotation div into multiple divs
Fix for issue #12504.
Highlight annotation may have several rectangles so we must have several divs to add mouse events handlers.
2020-10-23 15:26:16 +02:00
Jonas Jenwald
9f8d9802f9 A couple of small (viewer) tweaks of tooltip-only Annotations (PR 12333 follow-up)
Ensure that these tooltip-only Annotations are handled as "internalLink"s, to ensure that they behave as expected in PresentationMode (e.g. they should still use a `pointer`-cursor).

Ensure that `PDFLinkService.getDestinationHash` won't create links with empty hashes, since those don't really make a lot of sense in general (this improves things for tooltip-only Annotations).

This PDF file can be used for testing: http://mirrors.ctan.org/macros/latex/contrib/pdfcomment/doc/pdfcomment.pdf#page=14
2020-10-23 14:31:45 +02:00
Brendan Dahl
1eaf9c961b
Merge pull request #12432 from calixteman/scripting_api
JS - Add the basic architecture to be able to execute embedded js
2020-10-22 19:57:58 -07:00
Tim van der Meij
8cf27494b3
Merge pull request #12503 from calixteman/no_quad
Invalidate an annotation with no quadPoints (when it's required)
2020-10-23 00:25:52 +02:00
Jonas Jenwald
b44a975d7c Prevent issues, in PDFDocument.fieldObjects, for invalid Annotations
For an invalid Annotation, there's one code-path where `undefined` is returned from `AnnotationFactory._create`. That'd currently, incorrectly, trigger an error during the `PDFDocument._collectFieldObjects` parsing which thus seem good to avoid.
Along these lines, the filtering in `PDFDocument.fieldObjects` is also updated to handle both `null` and `undefined` the same way.
2020-10-22 13:24:43 +02:00
Calixte Denizet
e76a96892a JS - Add the basic architecture to be able to execute embedded js 2020-10-21 19:00:56 +02:00
Calixte Denizet
d2ef878702 Invalidate an annotation with no quadPoints (when it's required)
Some pdf softwares don't remove highlight annotations but make the QuadPoints array empty.
And the Rect for the annotation can be [-32768, -32768, 32768, 32768] so it leads to have a giant div which catches all the mouse events and make the pdf unusable when there are some forms elements.
2020-10-21 13:53:19 +02:00
Jonas Jenwald
8431cfe482 Re-name and re-factor the PDFLinkService.navigateTo method
This modernizes and improves the code, by using `async`/`await` and by extracting the helper function to its own method.
To hopefully avoid confusion, given the next patch, the method is also re-named to `goToDestination` to make is slightly clearer what it actually does.
2020-10-18 14:29:59 +02:00
Calixte Denizet
c30a3a94f0 JS - Add a function in api to get the fields ids in AcroForm::CO 2020-10-17 12:56:40 +02:00
Tim van der Meij
ff2631493e
Merge pull request #12481 from calixteman/issue_12475
Get urls if any in AA::D dictionary for pushbuttons
2020-10-16 22:55:43 +02:00
Tim van der Meij
32bceae732
Merge pull request #12483 from Snuffleupagus/formInfo-hasFields
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
2020-10-16 22:40:40 +02:00
Jonas Jenwald
f956d0a96a Stop caching the *parsed* Font data on its Dict object (PR 7347 follow-up)
Given that *all* fonts are, ever since PR 7347, now cached in the "normal" `fontCache` there's actually no reason for the special `font.translated` construction. (Given how Objects in JavaScript are references, rather than raw values, the old code shouldn't have caused any significant memory overhead.)

Instead we can simply store the `cacheKey`, which is a simple string, on only the Font `Dict`s where it's needed and thus look-up all fonts using the `fontCache` instead.
2020-10-16 17:45:01 +02:00
Jonas Jenwald
29af15f37e Add more validation in the PDFDocument._hasOnlyDocumentSignatures method
If this method is ever passed invalid/unexpected data, or if during the course of parsing (since it's used recursively) such data is found, it will fail in a non-graceful way.
Hence this patch, which ensures that we don't attempt to access non-existent properties and also that errors such as the one fixed in PR 12479 wouldn't have occured.
2020-10-16 13:03:47 +02:00
Jonas Jenwald
3351d3476d Don't store complex data in PDFDocument.formInfo, and replace the fields object with a hasFields boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*

 - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
   This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
   With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.

 - Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.

 - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.

 - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.

 - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 12:47:27 +02:00
Calixte Denizet
ce3d3a6ff8 Get urls if any in AA::D dictionary for pushbuttons 2020-10-15 19:42:36 +02:00
Jonas Jenwald
bc6b47a50e Convert PartialEvaluator.translateFont to an async method
This allows us to make a slight simplification in `PartialEvaluator.loadFont`, which thus removes an old TODO-comment from the method.
Furthermore, in `PartialEvaluator.translateFont`, the CMap-handling is now limited to only *composite* fonts to avoid having to wait for a "dummy"-Promise for most fonts.
2020-10-15 09:42:58 +02:00
Tim van der Meij
a373137304
Merge pull request #12429 from calixteman/collect_js
[api-minor] Add the possibility to collect Javascript actions
2020-10-14 23:27:47 +02:00
Calixte Denizet
71ecc3129b Add the possibility to collect Javascript actions 2020-10-14 10:44:16 +02:00
Tim van der Meij
1034769ca1
Merge pull request #12477 from Snuffleupagus/SaveDocument-WorkerTask
Handle `WorkerTask`s, and various PDF document properties, correctly in the "SaveDocument" handler in `src/core/worker.js`
2020-10-13 21:11:54 +02:00
Jonas Jenwald
65132ba5d8 Handle WorkerTasks, and various PDF document properties, correctly in the "SaveDocument" handler in src/core/worker.js
- Actually register/unregister the `WorkerTask`s, used when saving each page, correctly.
   To prevent issues when terminating the Worker, we purposely wait for all running `WorkerTask`s to complete first. Hence we need to actually handle `WorkerTask`s the same way in "SaveDocument" as in the rest of this file, see e.g. "GetOperatorList" and "GetTextContent".

 - Access `PDFDocument` properties in a generally safe/consistent way.
   While the current code works fine, given how the PDF document is being loaded, it still seems like a very good idea to be *consistent* in how we access these kind of properties (since in general you need to avoid `MissingDataException` everywhere in this file).

 - Change a variable name, since there's essentially no precedent in the code-base for *local* variable names to start with an underscore.
2020-10-13 19:30:43 +02:00
Jonas Jenwald
38629c345d Remove the scope parameter from the "GetOperatorList" handler in src/core/worker.js (PR 11110 follow-up)
Support for the `scope` parameter, in `MessageHandler.on`, was removed in PR 11110 however this particular case was unused/unnecessary for years prior to that change. (From a quick look through the history, I'm not even sure if it was actually needed in the first place.)
2020-10-13 15:58:38 +02:00
Jonas Jenwald
30e8d5dea1 Add local caching of TilingPatterns in PartialEvaluator.getOperatorList (issue 2765 and 8473)
In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required.

By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also *less* asynchronous parsing required for repeated TilingPatterns.

Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results.
When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reduction in rendering times were observed when comparing `master` against this patch:
 - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`.
 - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms`
 - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms`

As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance.
Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.
2020-10-08 18:43:21 +02:00
Jani Pehkonen
935568c2f1 Fix invalid XUID entries in CFF fonts
In CFF fonts, entry `XUID` should be an array that has no more than
16 elements. In the issue, the length is 20, which causes the fonts to fail.
See Appendix B, "Implementation Limits" in PostScript Language Reference Manual
https://web.archive.org/web/20170218093716/https://www.adobe.com/products/postscript/pdfs/PLRM.pdf
Actually entries `XUID` and `UniqueID` are obsolete altogether.
https://blogs.adobe.com/CCJKType/2016/06/no-more-xuid-arrays.html
2020-10-05 17:38:01 +03:00
Jonas Jenwald
9416b14e8b Re-factor how the ESLint no-var rule is enabled in the src/ folder
This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1]
Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use.

---
[1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.
2020-10-03 20:15:29 +02:00
Tim van der Meij
48e27a1a22
Merge pull request #12437 from Snuffleupagus/src-display-no-var
Enable the ESLint `no-var` rule in the `src/display/` folder
2020-10-03 19:59:56 +02:00
Tim van der Meij
6ff1fe4ea9
Merge pull request #12333 from calixteman/tooltip
Add tooltip if any in annotations layer
2020-10-03 19:50:39 +02:00
Jonas Jenwald
2a7d1557f9 Enable the ESLint no-var rule in the src/shared/ folder
Previously this rule has been enabled in the `web/` folder, and in select files in the `src/` sub-folders.
In this case, enabling of this rule didn't actually require any further code changes.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-var
2020-10-03 08:27:45 +02:00
Jonas Jenwald
52f6016e6c Fix the remaining ESLint no-var errors in the src/display/ folder
While most of necessary changes were fixed automatically, see the previous patch, there's a number of cases that needed to be fixed manually.
2020-10-02 16:29:13 +02:00
Jonas Jenwald
e557be5a17 Re-format the src/display/ files to enforce the ESLint no-var rule
This was done automatically, using `gulp lint --fix`.
2020-10-02 16:17:28 +02:00
Jonas Jenwald
2a8983d76b Enable the ESLint no-var rule in the src/display/ folder
Previously this rule has been enabled in the `web/` folder, and in select files in the `src/` sub-folders.
Note that a number of the files in the `src/display/` folder were already enforcing the `no-var` rule, and thanks to Prettier the necessary re-writing will be (mostly) handled automatically.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-var
2020-10-02 16:16:23 +02:00
calixteman
20b12d2bda Add tooltip if any in annotations layer 2020-10-02 10:11:18 +02:00
Jonas Jenwald
bd3b15b897 Use the cidToGidMap, if it exists, when building the glyph mapping for non-embedded composite fonts (issue 12418) 2020-09-28 14:40:43 +02:00
Tim van der Meij
120c5c2261
Merge pull request #12409 from Snuffleupagus/bug-1627030
Compute the `transformOrigin` correctly, for negative values, when rendering `AnnotationElement`s (bug 1627030)
2020-09-24 23:48:21 +02:00
Calixte Denizet
5af352e65a Need to reset the streams when printing 2020-09-24 19:13:09 +02:00
Jonas Jenwald
fca53a8eb0 Compute the transformOrigin correctly, for negative values, when rendering AnnotationElements (bug 1627030)
This changes the `transformOrigin` calculations in `AnnotationElement._createContainer` and `PopupAnnotationElement.render`, to ensure that e.g. the clickable area of annotations and/or popups are both positioned correctly.

The problem occurs for *negative* values, since they're not negated correctly because of how the `transformOrigin` strings were build; see issue 12406 for a more in-depth explanation. Previously, for negative values, the `transformOrigin` strings would thus be ignored since they're not valid.
2020-09-24 10:28:29 +02:00
Jonas Jenwald
2497e8eab9 Prevent errors if the InkList property, in InkAnnotations, is missing and/or not an Array (issue 12392)
To prevent a future bug, the `Vertices` property in PolylineAnnotations are handled the same way.
2020-09-19 15:34:32 +02:00
Calixte Denizet
d51e7e86ff Use the same kind of strings for radio values 2020-09-16 18:47:25 +02:00
Tim van der Meij
558d3870d3
Merge pull request #12369 from emilio/better-cancelation-follow-up
canvas: fix restore() with existing SMask groups and re-land #12363.
2020-09-15 23:19:17 +02:00
Tim van der Meij
374aad77c4
Merge pull request #12375 from Snuffleupagus/emptyDict-set
Ensure that the empty dictionary won't be accidentally modified, and slightly improve the "SaveDocument" handler in `src/core/worker.js`
2020-09-15 23:04:57 +02:00
Calixte Denizet
16dd5403c7 Set parent of radio annotation even if there is no 'V' field 2020-09-15 14:41:57 +02:00
Jonas Jenwald
ed4e7cd8a4 A couple of small improvements in the "SaveDocument" handler in src/core/worker.js
- Check that the "Info"-entry, in the XRef-trailer, is actually a dictionary before accessing it. This is similar to the `PDFDocument.documentInfo` method and follows the general principal of validating data carefully before accessing it, given how often PDF-software may create corrupt PDF files.

 - Slightly simplify the "XFA"-lookup, since there's no point in trying to fetch something from the empty dictionary.
2020-09-15 09:57:40 +02:00
Jonas Jenwald
a531c98cd2 Ensure that the empty dictionary won't be accidentally modified
Currently there's nothing that prevents modification of the `Dict.empty` primitive, which obviously needs to be *truly* empty to prevent any future (hard to find) bugs.
2020-09-15 09:29:00 +02:00
Tim van der Meij
b0c7a74a0c
Merge pull request #12361 from Snuffleupagus/_getSaveFieldResources
Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294)
2020-09-15 00:09:31 +02:00
Tim van der Meij
9d7b1d89ca
Merge pull request #12370 from timvandermeij/annotation-reset
Implement resetting of created streams for annotations
2020-09-14 23:16:17 +02:00
Tim van der Meij
3ecd984758
Implement resetting of created streams for annotations 2020-09-14 23:08:50 +02:00
Calixte Denizet
0c8de5aaf9 Replace \n and \r by \n and \r when saving a string 2020-09-14 17:34:39 +02:00
Jonas Jenwald
c992b8e460 Ensure that all necessary /Font resources are included when saving a WidgetAnnotation-instance (issue 12294)
This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix *one specific* case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local *and* AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included.

---
[1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.
2020-09-14 15:22:40 +02:00
Emilio Cobos Álvarez
bf8b1adf73 canvas: Properly restore all the remaining items in stateStack in endDrawing.
We were correctly finishing the SMask group but not restoring all the extra
transformations applied in stateStack, so if somebody ends up drawing to the
same context after canceling mid-draw we'd get artifacts.

This re-lands #12363 and fixes Mozilla bug 1664178[1].

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1664178
2020-09-12 16:37:54 +02:00
Emilio Cobos Álvarez
3a277f3ba5 canvas: restore() should reflect that smask groups are finished when stateStack is empty.
This fixes the issue that caused #12363 to get reverted, see #12367.
When we end the SMask group and stateStack.length is zero, nothing updates
this.current to reflect it.
2020-09-12 16:37:54 +02:00
Jonas Jenwald
f43d1b316b
Revert "canvas: Properly restore all the remaining items in stateStack in endDrawing" 2020-09-12 16:15:33 +02:00
Tim van der Meij
cdac6f4e68
Merge pull request #12363 from emilio/better-cancelation
canvas: Properly restore all the remaining items in stateStack in endDrawing
2020-09-12 15:03:34 +02:00
Emilio Cobos Álvarez
ef1e9a1a3e
canvas: Properly restore all the remaining items in stateStack in endDrawing.
We were correctly finishing the SMask group but not restoring all the extra
transformations applied in stateStack, so if somebody ends up drawing to the
same context after canceling mid-draw we'd get artifacts.

This fixes Mozilla bug 1664178[1].

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1664178
2020-09-12 13:50:56 +02:00
Tim van der Meij
dfebe7b907
Merge pull request #12365 from Snuffleupagus/forbid-DecodeStream.length
Ensure that the `length` property won't be *accidentally* accessed on a `DecodeStream`-instance
2020-09-11 22:18:30 +02:00
Jonas Jenwald
a11b7341a1 Ensure that the length property won't be *accidentally* accessed on a DecodeStream-instance
For these streams, compared to `Stream` and `ChunkedStream`, there's no well defined concept of length and consequently no `length` getter.[1] However, attempting to access the non-existent `length` won't currently error, but just return `undefined`, which could thus easily lead to bugs elsewhere in the code-base.

---
[1] However, note that *all* stream implementations have an `isEmpty` getter which can be used instead.
2020-09-11 13:25:40 +02:00
Calixte Denizet
fc154590e8 Dict keys need to be escaped too when saving 2020-09-11 12:25:05 +02:00
Tim van der Meij
8cfcd7a488
Merge pull request #12360 from calixteman/12359
Reset cursor position when focus is out of text field
2020-09-10 23:11:35 +02:00
Calixte Denizet
dc4eb71ff1 PDF names need to be escaped when saving 2020-09-10 16:08:13 +02:00
Calixte Denizet
44b24fcc29 Reset cursor position when focus is out of text field 2020-09-10 10:37:13 +02:00
Tim van der Meij
f9d56320f5
Merge pull request #12349 from calixteman/followup_12344
Follow-up of pr #12344
2020-09-09 23:40:53 +02:00
Calixte Denizet
908e7ae5e4 Set the modification date to the current day when saving 2020-09-09 19:06:39 +02:00
Calixte Denizet
64a6efd95e Follow-up of pr #12344 2020-09-09 11:46:02 +02:00
Brendan Dahl
e51e9d1f33
Merge pull request #12345 from calixteman/save_btn
Don't try to save something for a button which is neither a checkbox nor a radio
2020-09-08 15:44:04 -07:00
calixteman
68b99c59ee
Save form data in XFA datasets when pdf is a mix of acroforms and xfa (#12344)
* Move display/xml_parser.js in shared to use it in worker

* Save form data in XFA datasets when pdf is a mix of acroforms and xfa

Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>
2020-09-08 15:13:52 -07:00
Calixte Denizet
7e5026dfc5 Don't try to save something for a button which is neither a checkbox nor a radio 2020-09-08 20:47:46 +02:00
Tim van der Meij
20c891542b
Merge pull request #12269 from calixteman/highlight
Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations.
2020-09-06 22:25:36 +02:00
Jonas Jenwald
babeae9448 Remove, manually implemented, DOM polyfills only necessary for IE 11 support
Please refer to the following compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/API/ChildNode/remove#Browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList/add#Browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList/remove#Browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList/toggle#Browser_compatibility

Finally, for the `pushState`/`replaceState` polyfills, please refer to PRs 10461 and 11318 for additional details.
2020-09-06 18:24:17 +02:00
Calixte Denizet
65ecd981fe Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations. 2020-09-06 15:40:15 +02:00
Tim van der Meij
50958c46f7
Merge pull request #12331 from Snuffleupagus/Promise-polyfill
[api-minor] Only support browsers/environments that have *basic* support for `Promise` natively
2020-09-06 14:19:55 +02:00
Jonas Jenwald
449c7763d5 [api-minor] Only support browsers/environments that have *basic* support for Promise natively
Based on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise#Browser_compatibility and https://caniuse.com/#feat=promises, all even remotely modern browsers already support *basic* `Promise` functionality natively.

The only reason for keeping the `Promise` polyfill (at all) is to be able to support recent additions to the specification, such as e.g. `finally` and `allSettled`.
Note that this patch will, on its own, remove support for IE 11/Edge (non-Chromium based) in both the general PDF.js library and the default viewer.
2020-09-06 13:45:56 +02:00
Jonas Jenwald
6c8f1f7d6f Run gulp lint --fix, to account for changes in Prettier version 2.1.x 2020-09-06 12:23:59 +02:00
Jonas Jenwald
784a420027 Add support, in Dict.merge, for merging of "sub"-dictionaries
This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.
2020-08-30 23:18:32 +02:00
Jonas Jenwald
66aabe3ec7 [api-minor] Add support for toggling of Optional Content in the viewer (issue 12096)
*Besides, obviously, adding viewer support:* This patch attempts to improve the general API for Optional Content Groups slightly, by adding a couple of new methods for interacting with the (more complex) data structures of `OptionalContentConfig`-instances. (Thus allowing us to mark some of the data as "private", given that it probably shouldn't be manipulated directly.)

By utilizing not just the "raw" Optional Content Groups, but the data from the `/Order` array when available, we can thus display the Layers in a proper tree-structure with collapsible headings for PDF documents that utilizes that feature.

Note that it's possible to reset all Optional Content Groups to their default visibility state, simply by double-clicking on the Layers-button in the sidebar.
(Currently that's indicated in the Layers-button tooltip, which is obviously easy to overlook, however it's probably the best we can do for now without adding more buttons, or even a dropdown-toolbar, to the sidebar.)

Also, the current Layers-button icons are a little rough around the edges, quite literally, but given that the viewer will soon have its UI modernized anyway they hopefully suffice in the meantime.

To give users *full* control of the visibility of the various Optional Content Groups, even those which according to the `/Order` array should not (by default) be toggleable in the UI, this patch will place those under a *custom* heading which:
 - Is collapsed by default, and placed at the bottom of the Layers-tree, to be a bit less obtrusive.
 - Uses a slightly different formatting, compared to the "regular" headings.
 - Is localizable.

Finally, note that the thumbnails are *purposely* always rendered with all Optional Content Groups at their default visibility state, since that seems the most useful and it's also consistent with other viewers.
To ensure that this works as intended, we'll thus disable the `PDFThumbnailView.setImage` functionality when the Optional Content Groups have been changed in the viewer. (This obviously means that we'll re-render thumbnails instead of using the rendered pages. However, this situation ought to be rare enough for this to not really be a problem.)
2020-08-30 16:28:40 +02:00
Jonas Jenwald
2393443e73 Include the /Order array, if available, when parsing the Optional Content configuration
The `/Order` array is used to improve the display of Optional Content groups in PDF viewers, and it allows a PDF document to e.g. specify that Optional Content groups should be displayed as a (collapsable) tree-structure rather than as just a list.

Note that not all available Optional Content groups must be present in the `/Order` array, and PDF viewers will often (by default) hide those toggles in the UI.
To allow us to improve the UX around toggling of Optional Content groups, in the default viewer, these hidden-by-default groups are thus appended to the parsed `/Order` array under a *custom* nesting level (with `name == null`).

Finally, the patch also slightly tweaks an `OptionalContentConfig` related JSDoc-comment in the API.
2020-08-30 16:28:40 +02:00
Tim van der Meij
06b53d770a
Merge pull request #12259 from brendandahl/cmap-fix
Fix handling of symbolic fonts and unicode cmaps.
2020-08-30 16:01:24 +02:00
Brendan Dahl
45e8a31cc0 Fix handling of symbolic fonts and unicode cmaps.
In issue 12120, the font has a 1,0 cmap and is marked symbolic which
according to the spec means we should directly use the cmap instead of
the extra steps that are defined in 9.6.6.4.

However, just fixing that caused bug 1057544 to break. The font in bug
1057544 has a 0,1 cmap (Unicode 1.1) which we were not using, but is
easy to support. We're also easily able to support some of the other
unicode cmaps, so I added those as well.

There was also a second issue with bug 1057544, the cmap doesn't have
a mapping for the "quoteright" glyph, but it is defined in the post
table. To handle this, I've moved post table as a  fallback for any
font that has an encoding.
2020-08-27 14:33:11 -07:00
Calixte Denizet
ba94f04ba3 Bug 1661226 - Push button are not rendered with renderInteractiveForms enabled 2020-08-27 10:45:14 +02:00
Tim van der Meij
0f229d537f
Inline the setup method in the parse method in src/core/document.js
Now that the `parse` method is simplified we can inline the `setup`
method in the `parse` method since it's only two lines of code. This
avoids some indirection.
2020-08-25 23:28:55 +02:00
Tim van der Meij
280207c740
Redo the form type detection logic and include unit tests
Good form type detection is important to get reliable telemetry and to
only show the fallback bar if a form cannot be filled out by the user.

PDF.js only supports AcroForm data, so XFA data is explicitly unsupported
(tracked in issue #2373). However, the previous form type detection
couldn't separate AcroForm and XFA well enough, causing form type
telemetry to be incorrect sometimes and the fallback bar to be shown for
forms that could in fact be filled out by the user.

The solution in this commit is found by studying the specification and
the form documents that are available to us. In a nutshell the rules are:

- There is XFA data if the `XFA` entry is a non-empty array or stream.
- There is AcroForm data if the `Fields` entry is a non-empty array and
  it doesn't consist of only document signatures.

The document signatures part was not handled in the old code, causing a
document with only XFA data to also be marked as having AcroForm data.
Moreover, the old code didn't check all the data types.

Now that AcroForm and XFA can be distinguished, the viewer is configured
to only show the fallback bar for documents that only have XFA data. If
a document also has AcroForm data, the viewer can use that to render the
form. We have not found documents where the XFA data was necessary in
that case.

Finally, we include unit tests to ensure that all cases are covered and
move the form type detection out of the `parse` function so that it's
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
2020-08-25 23:28:55 +02:00
Tim van der Meij
f0bf62ff54
Mark the catDict member as private in the Catalog class
Not only is `catDict` never accessed anymore outside of this file, it
should also never happen since it's internal to the catalog. If data
from it is needed elsewhere, the catalog should provide a getter for it
that can do basic data integrity checks and abstract away any
unnecessary details.
2020-08-25 23:28:55 +02:00
Tim van der Meij
f20f0bcc78
Move the AcroForm logic from the document to the catalog
The `AcroForm` entry is part of the catalog, not of the document, so its
logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, make the AcroForm member private on the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `IsAcroFormPresent`.
Only a boolean is exposed, so we now also only store the boolean on the
document instance.

Finally, the annotation code needs access to the full AcroForm
dictionary, so it's updated to fetch the data from the catalog instead
of the document that now only holds the boolean.
2020-08-25 23:28:55 +02:00
Tim van der Meij
b41a2f4d5a
Move the collection logic from the document to the catalog
The `Collection` entry is part of the catalog, not of the document, so
its logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, remove the collection member from the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `IsCollectionPresent`.
Moving this out of the `parse` function makes sure that the getter is
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
2020-08-25 23:28:55 +02:00
Tim van der Meij
935d95b462
Move the version logic from the document to the catalog
The `Version` entry is part of the catalog, not of the document, so its
logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, make the version member private on the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `PDFFormatVersion`.

Finally, clarify how the version from the header and the version from
the catalog are treated using a comment.
2020-08-25 23:28:55 +02:00
Jonas Jenwald
bd16c363ce Access the Catalog data correctly in the "GetPageIndex" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `Catalog`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:14 +02:00
Jonas Jenwald
2e6e2c3b41 Access the XRef data correctly in the "GetStats" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `XRef`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:11 +02:00
Jani Pehkonen
e7febbf0f7 Accent positioning in Type1 seac glyphs
In `display/canvas.js` the accent offsets must be multiplied by `fontSize` to make the offsets large enough. Another problem is in `core/type1_parser.js` when the Type1 command `seac` is handled. There is an error in the Adobe Type1 spec. See chapter 6 in Type1 Font Format Supplement, which provides an errata: The arguments of `seac` specify the offset of the left side bearing (LSB) points, not the offset of origins. This can be fixed in `core/type1_parser.js` by adding the difference of the LSB values.
2020-08-23 21:01:25 +03:00
Tim van der Meij
7df8aa34a5
Merge pull request #12263 from timvandermeij/acroform-fixes
Fix AcroForm printing/saving edge cases
2020-08-23 13:37:30 +02:00
Tim van der Meij
a8efc0296b
Obtain the export values for choice widgets from the normal appearance
The down appearance (`D`) is optional and not available in the document
from #12233, so the checkboxes are never saved/printed as checked
because the checked appearance is based on the export value that is
missing because the `D` entry is not available.

Instead, we should use the normal appearance (`N`) since that one is
required and therefore always available.

Finally, the /Off appearance is optional according to section 12.7.4.2.3
of the specification, so that needs to be taken into account to match
the specification and to fix reference test failures for the
`annotation-button-widget-print` test. That is a file that doesn't
specify an /Off appearance in the normal appearance dictionary.
2020-08-23 13:00:02 +02:00
Tim van der Meij
1b82ad8fff
Decode widget form values consistently
The helper method `_decodeFormValue` is used to ensure that it happens
in one place. Note that form values are field values, display values
and export values.
2020-08-23 13:00:01 +02:00
Jonas Jenwald
fa02808f76 Mark the setModified method, on AnnotationStorage, as "private" (PR 12241 follow-up)
Since it shouldn't be called manually, we can just mark it as "private".
2020-08-22 20:04:25 +02:00
Jonas Jenwald
1f5021d76a Prevent errors if PDFDocumentProxy.saveDocument is called without the annotationStorage parameter (PR 12241 follow-up)
Obviously it doesn't make sense to call that method without providing an `AnnotationStorage`-instance, however we should ensure that doing so won't cause errors.
Hence we need to check that `annotationStorage` is actually defined, before attempting to call its `resetModified` method.
2020-08-22 18:09:17 +02:00
Tim van der Meij
36e149800e
Unconditionally set the field value for choice widgets in the annotation storage
This commit makes the following improvements:

- The code is similar to the other interactive form widgets now, with a
  clear note for the only difference.
- Calling `getOrCreateValue` unconditionally ensures that choice widgets
  always have a value in the annotation storage. Previously we only
  inserted a value in the annotation storage when an option matched or
  when a selection was changed. However, this causes breakage when
  saving/printing because comboboxes, which we don't fully support yet
  but are rendered, might not have a value in storage at all. Their
  field value might not match any option since it allows the user to
  enter a custom value.
- Calling `getOrCreateValue` unconditionally ensures that forms with
  choice widgets no longer always trigger a warning when the user
  navigates away from the page. This fixes
  https://github.com/mozilla/pdf.js/pull/12241#discussion_r474279654
2020-08-22 16:01:33 +02:00
Jonas Jenwald
a8de614a9f Also enable renderInteractiveForms by default in the viewer components (PR 12201 follow-up)
Given that `renderInteractiveForms` is now enabled by default in "full" viewer, it seems reasonable to enable it by default in the viewer components as well.
Especially considering that it's simple to disable, when creating the affected components, for anyone implementing their own viewer.
2020-08-22 14:24:04 +02:00
Tim van der Meij
5fed7112a2
Use the export value instead of the display value for choice widget option selection
The export value is used when the document is saved, so it should also
be used when the document is opened to determine which choice widget
option is selected. The display value is, as the name implies, only to
be used for viewer display purposes and not for other logic.

This makes sure that in the document from #12233 the "Favourite colour"
choice widget is correctly initialized with "Red" instead of "Black"
because the field value is equal to the export value (always the case),
but not the display value (not always the case). Moreover, saving now
also correctly uses the export value and not the display value.
2020-08-22 14:11:41 +02:00
Tim van der Meij
3c790936c1
Merge pull request #12247 from timvandermeij/acroform-choice-null
Improve the field value parsing for choice widgets to handle `null` values
2020-08-21 23:17:20 +02:00
Brendan Dahl
8023175103 Support file save triggered from the Firefox integrated version.
Related to https://bugzilla.mozilla.org/show_bug.cgi?id=1659753

This allows Firefox trigger a "save" event from ctrl/cmd+s or the "Save
Page As" context menu, which in turn lets pdf.js generate a new PDF if
there is form data to save.

I also now use `sourceEventType` on downloads so Firefox can determine if
it should launch the "open with" dialog or "save as" dialog.
2020-08-20 18:05:08 -07:00
Aki Sasaki
83365a3756 confirm if leaving a modified form without saving 2020-08-20 17:23:06 -07:00
Tim van der Meij
12c20772ac
Improve the field value parsing for choice widgets to handle null values
The specification states that the field value is `null` if no item is
selected and we didn't handle this case properly. Even though this did
not break the rendering because we always convert the value to an array
and the `includes` check in the display layer would simply not match,
the field value would be `[null]` which is not expected and strange from
an API perspective.

This commit fixes that by ensuring that we return an empty array in
case the field value is `null`. The API therefore still always gives an
array for the field value, but now the code is more specific so that the
value is either an empty array or an array of strings.
2020-08-19 23:27:50 +02:00
Jonas Jenwald
1058f16605 Add (basic) support for transfer functions to Images (issue 6931, bug 1149713)
This is *similar* to the existing transfer function support for SMasks, but extended to simple image data.
Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to *at most* 4 `Uint8Array`s each with a length of 256 elements.

Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.
2020-08-17 10:34:12 +02:00
Jonas Jenwald
9d3e046a4f Don't cache /ExtGState entries that contain fonts (PR 12087 follow-up)
I completely overlooked the fact that `PartialEvaluator.handleSetFont` also updates the current `state`, which means that currently we're not actually handling font data correctly for cached /ExtGState data. (Thankfully, using /ExtGState to set a font is somewhat rare in practice.)
2020-08-17 08:17:25 +02:00
Jonas Jenwald
b26d736809 Ensure that the "DocException" message handler, in the API, will always either error or warn (depending on the build) if a valid Error isn't found
Having this present would have made debugging issues 11941 and 12209 so much quicker and easier.
2020-08-13 13:17:30 +02:00
Calixte Denizet
1a6816ba98 Add support for saving forms 2020-08-12 10:32:59 +02:00
Tim van der Meij
57c988853b
Merge pull request #12192 from Snuffleupagus/misc-AnnotationStorage-improvements
A couple of (small) tweaks of the `AnnotationStorage` (PR 12173 follow-up)
2020-08-11 23:46:13 +02:00
Brendan Dahl
7fb01f9f2a
Merge pull request #12186 from brendandahl/loca-2
Fix bad truetype loca tables.
2020-08-10 20:34:19 -07:00
Brendan Dahl
f6dff81223 Fix bad truetype loca tables.
Some fonts have loca tables that aren't sorted or use 0 as an offset to
signal a missing glyph. This fixes the bad loca tables by sorting them
and then rewriting the loca table and potentially re-ordering the glyf
table to match.

Fixes #11131 and bug 1650302.
2020-08-10 14:15:49 -07:00
Jonas Jenwald
4d351eab93 A couple of (small) tweaks of the AnnotationStorage (PR 12173 follow-up)
- Initialize the `AnnotationStorage`-instance, on `PDFDocumentProxy`, lazily.
 - Change the `AnnotationStorage` to use a `Map` internally, rather than a regular Object (simplifies the following points).
 - Let `AnnotationStorage.getAll` return `null` when there's no data stored, to avoid unnecessary parsing on the worker-thread. This ought to "just work", since the worker-thread code *should* already handle the `!annotationStorage` case everywhere.
 - Add a new `AnnotationStorage.size` getter, to be able to easily tell if there's any data stored.
2020-08-10 17:07:24 +02:00
Calixte Denizet
88b112ab0c Support comb textfields for printing 2020-08-09 14:41:26 +02:00
Tim van der Meij
b061c300b4
Merge pull request #12176 from calixteman/multiline
Support multiline textfields for printing
2020-08-09 13:37:36 +02:00
Calixte Denizet
cd8bb7293b Support multiline textfields for printing 2020-08-09 12:14:34 +02:00
Jonathan Grimes
ac723a1760 Allow loading pdf fonts into another document. 2020-08-08 02:52:32 +00:00
Takashi Tamura
4ac62d8787 Fix the type of PDFDocumentLoadingTask.destroy. 2020-08-07 16:10:19 +09:00
Tim van der Meij
8c162f57f7
Merge pull request #12175 from calixteman/textfield
Support textfield and choice widgets for printing
2020-08-07 00:20:29 +02:00
Calixte Denizet
1747d259f9 Support textfield and choice widgets for printing 2020-08-06 14:45:23 +02:00
Jonas Jenwald
16fa9dc4ea Add support for Object.fromEntries
This provides a simpler way of creating an `Object` from e.g. a `Map`, without having to manually iterate over it.
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/fromEntries
2020-08-06 14:39:51 +02:00
Jonas Jenwald
5e44b241b2 [api-minor] Fix the annotationStorage parameter in PDFPageProxy.render
While the parameter name (clearly) suggests that an `AnnotationStorage`-instance is expected, looking at the only call-sites that include the parameter (i.e. the `PDFPrintServiceFactory` instances) it actually contains just a normal Object.

Hence it seems much more reasonable to actually pass a valid `AnnotationStorage`-instance, as the name suggests, and simply have `PDFPageProxy.render` do the `annotationStorage.getAll()` call. (Since we cannot send an `AnnotationStorage`-instance as-is to the worker-thread, given the "structured clone algorithm".)
2020-08-05 23:02:30 +02:00
Takashi Tamura
a0f0ab78f3 Fix the type definition of TypedArray. 2020-08-05 17:01:08 +09:00
Tim van der Meij
56ca027c08
Improve consistency for the API documentation comments
Over time we used multiple different formats for JSDoc comments. This
commit standardizes those formats to the one we used most often.

Moreover, this removes the example in the outline endpoint documentation
since it now has a proper type definition and it didn't render correctly
in JSDoc.
2020-08-04 23:27:22 +02:00
Tim van der Meij
ba4a07ce07
Fix incorrect types in the API documentation 2020-08-04 23:19:59 +02:00
Tim van der Meij
3116216e1d
Improve the API documentation for PDFDocumentLoadingTask
This commit:
- formats the documentation block according to the standards;
- replaces the callback definitions with the `function` type (we have
  that for other definitions already and the callback type was not
  rendered correctly by JSDoc);
- synchronizes the type documentation and the class documentation;
- fixes the documentation by making it easier to read and making sure
  that all optional properties are indicated as such;
- uses the `@link` tag to indicate links to other code.

The `typestest` still passes and JSDoc now renders this class correctly.
2020-08-04 23:17:24 +02:00
Brendan Dahl
ac494a2278 Add support for optional marked content.
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.

All of the test files added exhibit different uses of optional content.

Fixes #269.

Fix test to work with optional content.

- Change the stopAtErrors test to ensure the operator list has something,
  instead of asserting the exact number of operators.
2020-08-04 09:26:55 -07:00
Tim van der Meij
e68ac05f18
Merge pull request #12160 from tamuratak/worker_options
Use typedef to define the type of GlobalWorkerOptions (PR 12102 follow-up)
2020-08-03 22:55:49 +02:00
Tim van der Meij
0b75701012
Merge pull request #12157 from tamuratak/fix_svggraphics
Fix the type of SVGGraphics (PR 12102 follow-up)
2020-08-03 22:52:10 +02:00
Tim van der Meij
adc7645a44
Merge pull request #12161 from tamuratak/exported_func
Add types to functions exported as API in src/pdf.js (PR 12102 follow-up)
2020-08-03 22:43:50 +02:00
Takashi Tamura
923ba27f1f Tweak for the type of PageViewportParameters.viewBox 2020-08-03 20:42:42 +09:00
Takashi Tamura
bc4648c0a6 Add types to functions exported as API in src/pdf.js. 2020-08-03 19:19:48 +09:00
Takashi Tamura
f6fd8e9e7f Use typedef to define the type of GlobalWorkerOptions. 2020-08-03 19:06:28 +09:00
Takashi Tamura
d72bbecee2 Fix the type of SVGGraphics. 2020-08-03 09:58:19 +09:00
Tim van der Meij
00a8b42e67
Merge pull request #12102 from ineiti/add_types_annotations
Add types annotations
2020-08-02 16:45:37 +02:00
Tim van der Meij
5a66c56eca
Merge pull request #12108 from calixteman/radio
Add support for radios printing
2020-08-02 14:47:46 +02:00
Jonas Jenwald
05baa4c89f
Revert "[api-minor] Allow loading pdf fonts into another document." 2020-08-01 12:52:39 +02:00
Tim van der Meij
173b92a873
Merge pull request #12131 from jsg2021/issue-8271
[api-minor] Allow loading pdf fonts into another document.
2020-08-01 01:13:41 +02:00
Tim van der Meij
0d20a2b7b4
Merge pull request #12144 from Snuffleupagus/uncaught-promise-AbortException
Prevent `Uncaught (in promise) AbortException` when running the unit-tests
2020-08-01 00:22:10 +02:00
Jonas Jenwald
6d192f987e Prevent Uncaught (in promise) AbortException when running the unit-tests
These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file.
While this patch prevents these kind of problems, and thus allows us to update Jasmine again, I cannot help but thinking that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't *think* that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).
2020-07-31 23:29:05 +02:00
Jonathan Grimes
9b16b8ef71
Allow loading pdf fonts into another document. 2020-07-31 11:41:48 -05:00
Jonas Jenwald
346afd1e1c [api-minor] Fix the AnnotationStorage usage properly in the viewer/tests (PR 12107 and 12143 follow-up)
*The [api-minor] label probably ought to have been added to the original PR, given the changes to the `createAnnotationLayerBuilder` signature (if nothing else).*

This patch fixes the following things:
 - Let the `AnnotationLayer.render` method create an `AnnotationStorage`-instance if none was provided, thus making the parameter *properly* optional. This not only fixes the reference tests, it also prevents issues when the viewer components are used.
 - Stop exporting `AnnotationStorage` in the official API, i.e. the `src/pdf.js` file, since it's no longer necessary given the change above. Generally speaking, unless absolutely necessary we probably shouldn't export unused things in the API.
 - Fix a number of JSDocs `typedef`s, in `src/display/` and `web/` code, to actually account for the new `annotationStorage` parameter.
 - Update `web/interfaces.js` to account for the changes in `createAnnotationLayerBuilder`.
 - Initialize the storage, in `AnnotationStorage`, using `Object.create(null)` rather than `{}` (which is the PDF.js default).
2020-07-31 16:32:46 +02:00
Calixte Denizet
538017f7a7 Add support for radios printing 2020-07-31 14:31:49 +02:00
Aki Sasaki
7bb65bab7f fix reftests after #12107
The f1040-annotations reftest started hanging after #12107. We traced
this to `TypeError: can't access property "getOrCreateValue", storage is
undefined`.

We essentially need to add `annotationStorage` to the parameters in
test/driver.js.
2020-07-30 12:25:27 -07:00
Linus Gasser
f1bbfdc16d Add typescript definitions
This PR adds typescript definitions from the JSDoc already present.
It adds a new gulp-target 'types' that calls 'tsc', the typescript
compiler, to create the definitions.

To use the definitions, users can simply do the following:

```
import {getDocument, GlobalWorkerOptions} from "pdfjs-dist";
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
GlobalWorkerOptions.workerSrc = pdfjsWorker;

const pdf = await getDocument("file:///some.pdf").promise;
```

Co-authored-by: @oBusk
Co-authored-by: @tamuratak
2020-07-30 11:10:37 +02:00
Tim van der Meij
eb4d6a0652
Merge pull request #12107 from calixteman/checkbox
Add support for checkboxes printing
2020-07-30 00:11:41 +02:00
Calixte Denizet
cb60523a15 Add support for checkboxes printing 2020-07-29 16:42:57 +02:00
Tim van der Meij
6537e64cb8
Merge pull request #12136 from Snuffleupagus/PDFFetchStreamRangeReader-AbortError
Ignore `fetch()` errors, in `PDFFetchStreamRangeReader`, once the request has been aborted
2020-07-29 00:09:51 +02:00
Jonas Jenwald
1b720a4b23 Ignore fetch() errors, in PDFFetchStreamRangeReader, once the request has been aborted
Besides making general sense, as far as I can tell, this patch should also prevent *one* source of `Uncaught (in promise) ...` exceptions.
Unfortunately `reason instanceof AbortError` doesn't work here, since `AbortError` isn't actually defined in browsers; note how even the DOM specification contains an example using the `name` property: https://dom.spec.whatwg.org/#aborting-ongoing-activities

This patch prevents the following errors from being logged in the console, when the unit-tests are running:
 - Firefox: `Uncaught (in promise) DOMException: The operation was aborted.`
 - Chrome: `Uncaught (in promise) DOMException: The user aborted a request.`
2020-07-28 17:18:49 +02:00
Jonas Jenwald
fbe90b63ec [src/core/worker.js] Remove a useless Promise handler from the pdfManagerReady function
Looking carefully at this code, you'll notice that the `loadDocument` function has no less than *three* Promise handling functions. This obviously makes no sense, since a Promise can only have one resolve and one reject handler.

Hence the final `onFailure`-case is unreachable, which only serves to add confusion when reading the code. Note that this code has been re-factored more than once over the years, but it seems as if this may even have been incorrect already in PR 3310 (and no-one have noticed for seven years :-).
2020-07-28 14:51:50 +02:00
Jonas Jenwald
835b5ffddd Only check isType3Font the first time that TranslatedFont.loadType3Data is called
If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
f3ff526019 Send/receive Type3 images the same way as other globally-cached images
There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
7c9d0d5939 Improve how Type3-fonts with dependencies are handled
While the `CharProcs` streams of Type3-fonts *usually* don't rely on dependencies, such as e.g. images, it does happen in some cases.

Currently any dependencies are simply appended to the parent operatorList, which in practice means *only* the operatorList of the *first* page where the Type3-font is being used.
However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of *all* pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously.

Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added *once* to any operatorList.)

Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.
2020-07-27 13:20:13 +02:00
Calixte Denizet
584902dbf8 Add an annotation storage in order to save annotation data in acroforms 2020-07-24 10:50:11 +02:00
Jonas Jenwald
684a7b89ac Remove unnecessary duplication in the addChildren helper function (used by the ObjectLoader)
Besides being fewer lines of code overall, this also avoids *one* `node instanceof Dict` check for both of the `Dict`/`Stream`-cases.
2020-07-17 16:32:24 +02:00
Jonas Jenwald
ea8e432c45 Add a getRawValues method, to Dict instances, to provide an easier way of getting all *raw* values
When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop).
While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the *raw* `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the *raw* `Dict` values directly.
2020-07-17 16:32:00 +02:00
Jonas Jenwald
6381b5b08f Add a size getter, to Dict instances, to provide an easier way of checking the number of entries
This removes the need to manually call `Dict.getKeys()` and check its length.
2020-07-17 16:06:11 +02:00
Tim van der Meij
e63d1ebff5
Merge pull request #12087 from Snuffleupagus/LocalGStateCache
Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)
2020-07-17 16:02:45 +02:00
Tim van der Meij
b19a1796ac
Convert RefSetCache to a proper class and to use a Map internally
Using a `Map` instead of an `Object` provides some advantages such as
cheaper ways to get the size of the cache, to find out if an entry is
contained in the cache and to iterate over the cache. Moreover, we can
clear and re-use the same `Map` object now instead of creating a new
one.
2020-07-17 13:35:29 +02:00
Tim van der Meij
a604973cc7
Merge pull request #12085 from tamuratak/fix_isnodejs
Make the detection of Node.js environments on Electron strict.
2020-07-17 13:29:59 +02:00
Jonas Jenwald
b3480842b3 Use a RefSet, rather than a plain Object, for tracking already processed nodes in PartialEvaluator.hasBlendModes 2020-07-17 09:52:36 +02:00
Jonas Jenwald
03547b5633 Change PartialEvaluator.setGState to an async method
Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object.
To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.
2020-07-15 14:27:18 +02:00
Jonas Jenwald
f20aeb9343 Slightly simplify the code in PartialEvaluator.hasBlendModes, e.g. by using for...of loops
- Replace the existing loops with `for...of` variants instead.

 - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
15fa3f8518 Remove a redundant /XObject stream dictionary objId check in PartialEvaluator.hasBlendModes (PR 6971 follow-up)
This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971).
Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
84476da26e Handle lookup errors "silently" in PartialEvaluator.hasBlendModes (PR 11680 follow-up)
Given that this method is used during what's essentially a *pre*-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there's lookup errors.

*Please note:* Any any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the *actual* OperatorList parsing.
2020-07-15 12:47:07 +02:00
Jonas Jenwald
981ff41b5f Add local caching of non-font Graphics State (ExtGState) data in PartialEvaluator.getTextContent
It turns out that `getTextContent` suffers from *similar* problems with repeated GStates as `getOperatorList`; please see the previous patch.

While only `/ExtGState` resources containing Fonts will actually be *parsed* by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though *most* of them won't affect the textContent (since they mostly contain purely graphical state).

With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
90eb579713 Add local caching of "simple" Graphics State (ExtGState) data in PartialEvaluator.getOperatorList (issue 2813)
This patch will help pathological cases the most, with issue 2813 being a particularily problematic example. While there's only *four* `/ExtGState` resources, there's a total `29062` of `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resource, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resource, we thus improve performance by:
 - Not having to fetch/validate/parse the same `/ExtGState` data over and over.
 - Handling of repeated `setGState` operators becomes *synchronous* during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

---

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is *a lot* slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a *reduction* of `~10 %` in the rendering time of the PDF document in issue 2813.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
d4d7ac1b88 Stop special-casing the (very unlikely) "no /XObject found"-scenario, when parsing OPS.paintXObject operators, in PartialEvaluator.{getOperatorList, getTextContent}
Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fallback to the existing `ignoreErrors` functionality instead.
Also, please note that the "no `/XObject` found"-scenario should be *extremely* unlikely in practice and would only occur in corrupt/broken documents.

Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data *could* contain anything and e.g. be unclonable (which would cause breaking errors).
Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definately seems like the correct approach.
2020-07-12 21:59:59 +02:00
Takashi Tamura
473ea1f1a4 Make the detection of Node.js environments on Electron strict.
The main process and its child processes should be detected as Node.js environments.
2020-07-12 10:56:17 +09:00
Tim van der Meij
7dabc5ecc8
Merge pull request #12063 from Snuffleupagus/issue-10989
Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)
2020-07-11 00:05:11 +02:00
Jonas Jenwald
d18cf47419 Remove the special handling, used when creating Indexed ColorSpaces, for the case where the lookup-data is a Stream
This special-case was added in PR 1992, however it became unnecessary with the changes in PR 4824 since all of the ColorSpace parsing is now done on the worker-thread (with only RGB-data being sent to the main-thread).
2020-07-10 17:22:55 +02:00
Jonas Jenwald
ea6a0e4435 Remove the IR (internal representation) part of the ColorSpace parsing
Originally ColorSpaces were only *partially* parsed on the worker-thread, to obtain an IR-format which was sent to the main-thread. This had the somewhat unfortunate side-effect of causing the majority of the (potentially heavy) ColorSpace parsing to happen on the main-thread.
Hence PR 4824 which, among other things, changed ColorSpaces to be *fully* parsed on the worker-thread with only RGB-data being sent to the main-thread.

While it thus originally was necessary to have `ColorSpace.{parseToIR, fromIR}` methods, to handle the worker/main-thread split, that's no longer the case and we can thus reduce all of the ColorSpace parsing to one method instead.

Currently, when parsing a ColorSpace, we call `ColorSpace.parseToIR` which parses the ColorSpace-data from the document and then creates the IR-format. We then, immediately, call `ColorSpace.fromIR` which parses the IR-format and then finally creates the actual ColorSpace.[1]
All-in-all, this leads to a fair amount of unnecessary indirection which also (in my opinion) makes the code less clear.

Obviously these changes are not really expected to have a significant effect on performance, especially with the recently added caching of ColorSpaces, however there'll now be strictly fewer function calls, less memory allocated, and overall less parsing required during ColorSpace-handling.

---
[1] For ICCBased ColorSpaces, given the validation necessary, this currently even leads to parsing an /Alternate ColorSpace *twice*.
2020-07-10 17:22:44 +02:00
Jonas Jenwald
4cc6797f17 Re-factor the idFactory functionality, used in the core/-code, and move the fontID generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.

For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.

In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 16:33:31 +02:00
Jonas Jenwald
1d66fce781 Tweak the heuristic, in src/core/jpg.js, that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10989) 2020-07-06 13:06:49 +02:00
Jonas Jenwald
c95fbb6e21 Convert the code in src/core/evaluator.js to use standard classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 16:01:04 +02:00
Jonas Jenwald
32a0b6fa73 Move some constants and helper functions out of the PartialEvaluator closure
This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope.

*Please note:* For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants where re-factored slightly.
2020-07-05 15:56:23 +02:00
Tim van der Meij
c4255fdbfd
Merge pull request #12059 from Snuffleupagus/image-class
Convert the code in `src/core/image.js` to use ES6 classes
2020-07-05 14:08:55 +02:00
Jonas Jenwald
59da1d5829 Convert the code in src/core/image.js to use ES6 classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 09:34:14 +02:00
Jonas Jenwald
85ced3fbfd Allow BaseLocalCache to, optionally, only allocate storage for caching of references (PR 12034 follow-up)
*Yet another instalment in the never-ending series of things that you think of __after__ a patch has landed.*

Since `Function`s are only cached by reference, we thus don't need to allocate storage for names in `LocalFunctionCache` instances. Obviously the effect of these changes are *really tiny*, but it seems reasonable in principle to avoid allocating data structures that are guaranteed to be unused.
2020-07-04 15:01:32 +02:00
Jonas Jenwald
ca719ecaa4 Add local caching of Functions, by reference, in the PDFFunctionFactory (issue 2541)
Note that compared other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an *entire* page.
Furthermore, similar to ColorSpaces, the parsing of individual `Function`s are generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.)

Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.)
After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its *own* `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instances could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site.

Obviously, with this patch, there's now (potentially) more `PDFFunctionFactory` instances than before when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are *not* kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible *lazily initialized*, specifically:

 - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually requires it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces).

 - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing.

To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization.
(If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would be instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.)
All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic.

With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources.

Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer.
However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction ` parsing shows that it takes *approximately* `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is *one order of magnitude faster* which does add up when the same `Function` is invoked close to 2000 times.
2020-07-04 00:55:18 +02:00
Jonas Jenwald
4a7e29865d [api-minor] Use the NodeCanvasFactory/NodeCMapReaderFactory classes as defaults in Node.js environments (issue 11900)
This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code.

As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.
2020-07-02 04:44:23 +02:00
Brendan Dahl
fe3df495cc
Merge pull request #12040 from wojtekmaj/replace-non-inclusive
Replace non-inclusive "whitelist" term with "allowlist"
2020-07-01 15:41:41 -07:00
Jonas Jenwald
fef24658e7 Adjust the heuristics used when dealing with rectangles, i.e. re operators, with zero width/height (issue 12010) 2020-07-02 00:02:49 +02:00
Wojciech Maj
78970bbbe1
Replace non-inclusive "whitelist" term with "allowlist" 2020-06-29 17:15:14 +02:00
Jonas Jenwald
28d2ada59c Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)
This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
2020-06-26 13:15:06 +02:00
Jonas Jenwald
b8e1352934 Stop passing in unnecessary parameters when parsing the Alternate entry of ICCBased ColorSpaces (PR 9659 follow-up)
With the changes made in PR 9659, `ColorSpace.fromIR` no longer takes a second `pdfFunctionFactory` parameter and there's thus one call-site that can be simplified.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
19d7976483 Improve (local) caching of parsed ColorSpaces (PR 12001 follow-up)
This patch contains the following *notable* improvements:
 - Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point).
 - Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths.
 - Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing.
 - Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).
2020-06-24 23:53:10 +02:00
Jonas Jenwald
51e87b9248 Add a proper LocalColorSpaceCache class, rather than piggybacking on the image one (PR 12001 follow-up)
This will allow caching of ColorSpaces by either `Name` *or* `Ref`, which doesn't really make sense for images, thus allowing (better) caching for ColorSpaces used with e.g. Images and Patterns.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
e22bc483a5 Re-factor ColorSpace.parse to take a parameter object, rather than a bunch of (randomly) ordered parameters
Given the number of existing parameters, this will avoid needlessly unwieldy call-sites especially with upcoming changes in later patches.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
f0708717a9 Move the fetchBuiltInCMap method to the PartialEvaluator.prototype
Defining this *inline* in the "constructor" looks slightly weird (I really don't know why I wrote it like that originally), and it can simply be changed to a regular method instead.
2020-06-24 17:29:47 +02:00
Tim van der Meij
c1cb9ee9fc
Merge pull request #12016 from Snuffleupagus/issue-8078
Tweak the `QueueOptimizer` to recognize `OPS.paintImageMaskXObject` operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
2020-06-21 19:38:27 +02:00
Jonas Jenwald
4cb0c032f3 Convert the PDFPageProxy.intentStates property from an Object to a Map
As can be seen in the code there's a handful of places where this structure needs to be iterated, something that becomes cumbersome when dealing with `Object`s. Hence, by changing this to a `Map` instead we can both simplify the code and avoid creating unnecessary closures.

Particularily the `PDFPageProxy._tryCleanup` method becomes a lot more readable, at least in my opinion.

Finally, since this property is intended to be "private" the name is adjusted to reflect that.
2020-06-21 17:02:42 +02:00
Jonas Jenwald
cabc2cc4fc Add a InternalRenderTask.completed getter and use it to simplify PDFPageProxy._destroy
This patch aims to simplify the `PDFPageProxy._destroy` method, by:
 - Replacing the unnecessary `forEach` with a "regular" `for`-loop instead.
 - Use a more appropriate variable name, since `intentState.renderTasks` contain instances of `InternalRenderTask`.
 - Move the "is rendering completed"-handling to a new `InternalRenderTask.completed` getter, to abstract away some (mostly) internal `InternalRenderTask` state.
2020-06-21 15:56:14 +02:00
Jonas Jenwald
a04a5d8325 Tweak the loop in ChunkedStreamManager.abort to clarify what's being iterated (PR 11985 follow-up)
In hindsight, using the `for (let [key, value] of myMap) { ... }`-format when we don't care about the `key` probably wasn't such a great idea. Since `Map`s have explicit support for iterating either `key`s or `value`s, we should probably use that instead here.
2020-06-21 11:29:05 +02:00
Jonas Jenwald
e18fa3fc45 Tweak the QueueOptimizer to recognize OPS.paintImageMaskXObject operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
*First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited.*
Hence I'm not sure if there's actually a very good reason for *only* considering ImageMasks where the "skew" transformation matrix elements are zero as *repeated*, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are *all identical* for the ImageMasks.
Furthermore, looking at the *group* case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements.

While this patch obviously isn't enough to *completely* fix the issue, since there should be a visible Pattern rendered as well[1], it seem (at least to me) like enough of an improvement that submitting this is justified.
With these changes the referenced PDF document will no longer hang the *entire* browser, and rendering also finishes in a *reasonable* time (< 10 seconds for me) which seem fine given the *huge* number of identical inline images present.[2]

---
[1] Temporarily changing the Pattern to a solid color *does* render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality.

[2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.
2020-06-20 12:18:48 +02:00
Tim van der Meij
8cfdfb237a
Merge pull request #12005 from Snuffleupagus/cff-class
Convert the code in `src/core/cff_parser.js` to use ES6 classes
2020-06-17 23:30:28 +02:00
Jonas Jenwald
880a0a0f59 Convert the code in src/core/cff_parser.js to use ES6 classes
This removes multiple instances of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-06-16 12:33:21 +02:00
Jonas Jenwald
fb9b574f3d Convert the code in src/core/worker.js to use ES6 classes
This removes one instance of `// eslint-disable-next-line no-shadow`, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-06-16 11:54:59 +02:00
Jonas Jenwald
87b089ba42 Lazily initialize, and cache, the regular expression used in CFFCompiler.encodeFloat
There's no particular reason for re-creating the regular expression over and over for every `encodeFloat` invocation, as far as I can tell.
2020-06-15 13:51:28 +02:00
Jonas Jenwald
517d92a121 Simplify the "is integer" checks in CFFCompiler.encodeNumber
The `isNaN` check is obviously redundant, since `NaN` is the only value that isn't equal to itself; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/NaN#Examples

The `parseFloat`/`parseInt` comparison would make sense if the `value` ever contains a String, which however is never actually the case. Besides looking through the code, I've also run the entire test-suite locally with `assert(typeof value === "number", "encodeNumber");` added at the top of the method and there were no failures.

Hence we can simplify the "is integer" check a bit in the `CFFCompiler.encodeNumber` method.
2020-06-15 13:51:20 +02:00
Jonas Jenwald
5c39de805c Add local caching of ColorSpaces, by name, in PartialEvaluator.getOperatorList (issue 2504)
By caching parsed `ColorSpace`s, we thus don't need to re-parse the same data over and over which saves CPU cycles *and* reduces peak memory usage. (Obviously persistent memory usage *may* increase a tiny bit, but since the caching is done per `PartialEvaluator.getOperatorList` invocation and given that `ColorSpace` instances generally hold very little data this shouldn't be much of an issue.)
Furthermore, by caching `ColorSpace`s we can also lookup the already parsed ones *synchronously* during the `OperatorList` building, instead of having to defer to the event loop/microtask queue since the parsing is done asynchronously (such that error handling is easier).

Possible future improvements:
 - Cache/lookup parsed `ColorSpaces` used in `Pattern`s and `Image`s.
 - Attempt to cache *local* `ColorSpace`s by reference as well, in addition to only by name, assuming that there's documents where that would be beneficial and that it's not too difficult to implement.
 - Assuming there's documents that would benefit from it, also cache repeated `ColorSpace`s *globally* as well.

Given that we've never, until now, been doing *any* caching of parsed `ColorSpace`s and that even using a simple name-only *local* cache helps tremendously in pathological cases, I purposely decided against complicating the implementation too much initially.
Also, compared to parsing of `Image`s, simply creating a `ColorSpace` instance isn't that expensive (hence I'd be somewhat surprised if adding a *global* cache would help much).

---

This patch was tested using:
 - The default `tracemonkey` PDF file, which was included mostly to show that "normal" documents aren't negatively affected by these changes.
 - The PDF file from issue 2504, i.e. https://dl-ctlg.panasonic.com/jp/manual/sd/sd_rbm1000_0.pdf, where most pages will switch *thousands* of times between a handful of `ColorSpace`s.

with the following manifest file:
```
[
    {  "id": "tracemonkey",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 100,
       "type": "eq"
    },
    {  "id": "issue2504",
       "file": "../web/pdfs/issue2504.pdf",
       "md5": "",
       "rounds": 20,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
 - Overall
```
-- Grouped By browser, pdf, stat --
browser | pdf         | stat         | Count | Baseline(ms) | Current(ms) |  +/- |     %  | Result(P<.05)
------- | ----------- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | issue2504   | Overall      |   640 |          977 |         497 | -479 | -49.08 |        faster
firefox | issue2504   | Page Request |   640 |            3 |           4 |    1 |  59.18 |
firefox | issue2504   | Rendering    |   640 |          974 |         493 | -481 | -49.37 |        faster
firefox | tracemonkey | Overall      |  1400 |          116 |         111 |   -5 |  -4.43 |
firefox | tracemonkey | Page Request |  1400 |            2 |           2 |    0 |  -2.86 |
firefox | tracemonkey | Rendering    |  1400 |          114 |         109 |   -5 |  -4.47 |
```

 - Page-specific
```
-- Grouped By browser, pdf, page, stat --
browser | pdf         | page | stat         | Count | Baseline(ms) | Current(ms) |   +/- |      %  | Result(P<.05)
------- | ----------- | ---- | ------------ | ----- | ------------ | ----------- | ----- | ------- | -------------
firefox | issue2504   | 0    | Overall      |    20 |         2295 |        1268 | -1027 |  -44.76 |        faster
firefox | issue2504   | 0    | Page Request |    20 |            6 |           7 |     1 |   15.32 |
firefox | issue2504   | 0    | Rendering    |    20 |         2288 |        1260 | -1028 |  -44.93 |        faster
firefox | issue2504   | 1    | Overall      |    20 |         3059 |        2806 |  -252 |   -8.25 |        faster
firefox | issue2504   | 1    | Page Request |    20 |           11 |          14 |     3 |   23.25 |        slower
firefox | issue2504   | 1    | Rendering    |    20 |         3047 |        2792 |  -255 |   -8.37 |        faster
firefox | issue2504   | 2    | Overall      |    20 |          411 |         295 |  -116 |  -28.20 |        faster
firefox | issue2504   | 2    | Page Request |    20 |            2 |          42 |    40 | 1897.62 |
firefox | issue2504   | 2    | Rendering    |    20 |          409 |         253 |  -156 |  -38.09 |        faster
firefox | issue2504   | 3    | Overall      |    20 |          736 |         299 |  -437 |  -59.34 |        faster
firefox | issue2504   | 3    | Page Request |    20 |            2 |           2 |     0 |    0.00 |
firefox | issue2504   | 3    | Rendering    |    20 |          734 |         297 |  -437 |  -59.49 |        faster
firefox | issue2504   | 4    | Overall      |    20 |          356 |         458 |   102 |   28.63 |
firefox | issue2504   | 4    | Page Request |    20 |            1 |           2 |     1 |   57.14 |        slower
firefox | issue2504   | 4    | Rendering    |    20 |          354 |         455 |   101 |   28.53 |
firefox | issue2504   | 5    | Overall      |    20 |         1381 |         765 |  -616 |  -44.59 |        faster
firefox | issue2504   | 5    | Page Request |    20 |            3 |           5 |     2 |   50.00 |        slower
firefox | issue2504   | 5    | Rendering    |    20 |         1378 |         760 |  -617 |  -44.81 |        faster
firefox | issue2504   | 6    | Overall      |    20 |          757 |         299 |  -459 |  -60.57 |        faster
firefox | issue2504   | 6    | Page Request |    20 |            2 |           5 |     3 |  150.00 |        slower
firefox | issue2504   | 6    | Rendering    |    20 |          755 |         294 |  -462 |  -61.11 |        faster
firefox | issue2504   | 7    | Overall      |    20 |          394 |         302 |   -92 |  -23.39 |        faster
firefox | issue2504   | 7    | Page Request |    20 |            2 |           1 |    -1 |  -34.88 |        faster
firefox | issue2504   | 7    | Rendering    |    20 |          392 |         301 |   -91 |  -23.32 |        faster
firefox | issue2504   | 8    | Overall      |    20 |         2875 |         979 | -1896 |  -65.95 |        faster
firefox | issue2504   | 8    | Page Request |    20 |            1 |           2 |     0 |   11.11 |
firefox | issue2504   | 8    | Rendering    |    20 |         2874 |         978 | -1896 |  -65.99 |        faster
firefox | issue2504   | 9    | Overall      |    20 |          700 |         332 |  -368 |  -52.60 |        faster
firefox | issue2504   | 9    | Page Request |    20 |            3 |           2 |     0 |   -4.00 |
firefox | issue2504   | 9    | Rendering    |    20 |          698 |         329 |  -368 |  -52.78 |        faster
firefox | issue2504   | 10   | Overall      |    20 |         3296 |         926 | -2370 |  -71.91 |        faster
firefox | issue2504   | 10   | Page Request |    20 |            2 |           2 |     0 |  -18.75 |
firefox | issue2504   | 10   | Rendering    |    20 |         3293 |         924 | -2370 |  -71.96 |        faster
firefox | issue2504   | 11   | Overall      |    20 |          524 |         197 |  -327 |  -62.34 |        faster
firefox | issue2504   | 11   | Page Request |    20 |            2 |           3 |     1 |   58.54 |
firefox | issue2504   | 11   | Rendering    |    20 |          522 |         194 |  -328 |  -62.81 |        faster
firefox | issue2504   | 12   | Overall      |    20 |          752 |         369 |  -384 |  -50.98 |        faster
firefox | issue2504   | 12   | Page Request |    20 |            3 |           2 |    -1 |  -36.51 |        faster
firefox | issue2504   | 12   | Rendering    |    20 |          749 |         367 |  -382 |  -51.05 |        faster
firefox | issue2504   | 13   | Overall      |    20 |          679 |         487 |  -193 |  -28.38 |        faster
firefox | issue2504   | 13   | Page Request |    20 |            4 |           2 |    -2 |  -48.68 |        faster
firefox | issue2504   | 13   | Rendering    |    20 |          676 |         485 |  -191 |  -28.28 |        faster
firefox | issue2504   | 14   | Overall      |    20 |          474 |         283 |  -191 |  -40.26 |        faster
firefox | issue2504   | 14   | Page Request |    20 |            2 |           4 |     2 |   78.57 |
firefox | issue2504   | 14   | Rendering    |    20 |          471 |         279 |  -192 |  -40.79 |        faster
firefox | issue2504   | 15   | Overall      |    20 |          860 |         618 |  -241 |  -28.05 |        faster
firefox | issue2504   | 15   | Page Request |    20 |            2 |           3 |     0 |   10.87 |
firefox | issue2504   | 15   | Rendering    |    20 |          857 |         616 |  -241 |  -28.15 |        faster
firefox | issue2504   | 16   | Overall      |    20 |          389 |         243 |  -147 |  -37.71 |        faster
firefox | issue2504   | 16   | Page Request |    20 |            2 |           2 |     0 |    2.33 |
firefox | issue2504   | 16   | Rendering    |    20 |          387 |         240 |  -147 |  -37.94 |        faster
firefox | issue2504   | 17   | Overall      |    20 |         1484 |         672 |  -812 |  -54.70 |        faster
firefox | issue2504   | 17   | Page Request |    20 |            2 |           3 |     1 |   37.21 |
firefox | issue2504   | 17   | Rendering    |    20 |         1482 |         669 |  -812 |  -54.84 |        faster
firefox | issue2504   | 18   | Overall      |    20 |          575 |         252 |  -323 |  -56.12 |        faster
firefox | issue2504   | 18   | Page Request |    20 |            2 |           2 |     0 |  -16.22 |
firefox | issue2504   | 18   | Rendering    |    20 |          573 |         251 |  -322 |  -56.24 |        faster
firefox | issue2504   | 19   | Overall      |    20 |          517 |         227 |  -290 |  -56.08 |        faster
firefox | issue2504   | 19   | Page Request |    20 |            2 |           2 |     0 |   21.62 |
firefox | issue2504   | 19   | Rendering    |    20 |          515 |         225 |  -290 |  -56.37 |        faster
firefox | issue2504   | 20   | Overall      |    20 |          668 |         670 |     2 |    0.31 |
firefox | issue2504   | 20   | Page Request |    20 |            4 |           2 |    -1 |  -34.29 |
firefox | issue2504   | 20   | Rendering    |    20 |          664 |         667 |     3 |    0.49 |
firefox | issue2504   | 21   | Overall      |    20 |          486 |         309 |  -177 |  -36.44 |        faster
firefox | issue2504   | 21   | Page Request |    20 |            2 |           2 |     0 |   16.13 |
firefox | issue2504   | 21   | Rendering    |    20 |          484 |         307 |  -177 |  -36.60 |        faster
firefox | issue2504   | 22   | Overall      |    20 |          543 |         267 |  -276 |  -50.85 |        faster
firefox | issue2504   | 22   | Page Request |    20 |            2 |           2 |     0 |   10.26 |
firefox | issue2504   | 22   | Rendering    |    20 |          541 |         265 |  -276 |  -51.07 |        faster
firefox | issue2504   | 23   | Overall      |    20 |         3246 |         871 | -2375 |  -73.17 |        faster
firefox | issue2504   | 23   | Page Request |    20 |            2 |           3 |     1 |   37.21 |
firefox | issue2504   | 23   | Rendering    |    20 |         3243 |         868 | -2376 |  -73.25 |        faster
firefox | issue2504   | 24   | Overall      |    20 |          379 |         156 |  -223 |  -58.83 |        faster
firefox | issue2504   | 24   | Page Request |    20 |            2 |           2 |     0 |   -2.86 |
firefox | issue2504   | 24   | Rendering    |    20 |          378 |         154 |  -223 |  -59.10 |        faster
firefox | issue2504   | 25   | Overall      |    20 |          176 |         127 |   -50 |  -28.19 |        faster
firefox | issue2504   | 25   | Page Request |    20 |            2 |           1 |     0 |  -15.63 |
firefox | issue2504   | 25   | Rendering    |    20 |          175 |         125 |   -49 |  -28.31 |        faster
firefox | issue2504   | 26   | Overall      |    20 |          181 |         108 |   -74 |  -40.67 |        faster
firefox | issue2504   | 26   | Page Request |    20 |            3 |           2 |    -1 |  -39.13 |        faster
firefox | issue2504   | 26   | Rendering    |    20 |          178 |         105 |   -72 |  -40.69 |        faster
firefox | issue2504   | 27   | Overall      |    20 |          208 |         104 |  -104 |  -49.92 |        faster
firefox | issue2504   | 27   | Page Request |    20 |            2 |           2 |     1 |   48.39 |
firefox | issue2504   | 27   | Rendering    |    20 |          206 |         102 |  -104 |  -50.64 |        faster
firefox | issue2504   | 28   | Overall      |    20 |          241 |         111 |  -131 |  -54.16 |        faster
firefox | issue2504   | 28   | Page Request |    20 |            2 |           2 |    -1 |  -33.33 |
firefox | issue2504   | 28   | Rendering    |    20 |          239 |         109 |  -130 |  -54.39 |        faster
firefox | issue2504   | 29   | Overall      |    20 |          321 |         196 |  -125 |  -39.05 |        faster
firefox | issue2504   | 29   | Page Request |    20 |            1 |           2 |     0 |   17.86 |
firefox | issue2504   | 29   | Rendering    |    20 |          319 |         194 |  -126 |  -39.35 |        faster
firefox | issue2504   | 30   | Overall      |    20 |          651 |         271 |  -380 |  -58.41 |        faster
firefox | issue2504   | 30   | Page Request |    20 |            1 |           2 |     1 |   50.00 |
firefox | issue2504   | 30   | Rendering    |    20 |          649 |         269 |  -381 |  -58.60 |        faster
firefox | issue2504   | 31   | Overall      |    20 |         1635 |         647 |  -988 |  -60.42 |        faster
firefox | issue2504   | 31   | Page Request |    20 |            1 |           2 |     0 |   30.43 |
firefox | issue2504   | 31   | Rendering    |    20 |         1634 |         645 |  -988 |  -60.49 |        faster
firefox | tracemonkey | 0    | Overall      |   100 |           51 |          51 |     0 |    0.02 |
firefox | tracemonkey | 0    | Page Request |   100 |            1 |           1 |     0 |   -4.76 |
firefox | tracemonkey | 0    | Rendering    |   100 |           50 |          50 |     0 |    0.12 |
firefox | tracemonkey | 1    | Overall      |   100 |           97 |          91 |    -5 |   -5.52 |        faster
firefox | tracemonkey | 1    | Page Request |   100 |            3 |           3 |     0 |   -1.32 |
firefox | tracemonkey | 1    | Rendering    |   100 |           94 |          88 |    -5 |   -5.73 |        faster
firefox | tracemonkey | 2    | Overall      |   100 |           40 |          40 |     0 |    0.50 |
firefox | tracemonkey | 2    | Page Request |   100 |            1 |           1 |     0 |    3.16 |
firefox | tracemonkey | 2    | Rendering    |   100 |           39 |          39 |     0 |    0.54 |
firefox | tracemonkey | 3    | Overall      |   100 |           62 |          62 |    -1 |   -0.94 |
firefox | tracemonkey | 3    | Page Request |   100 |            1 |           1 |     0 |   17.05 |
firefox | tracemonkey | 3    | Rendering    |   100 |           61 |          61 |    -1 |   -1.11 |
firefox | tracemonkey | 4    | Overall      |   100 |           56 |          58 |     2 |    3.41 |
firefox | tracemonkey | 4    | Page Request |   100 |            1 |           1 |     0 |   15.31 |
firefox | tracemonkey | 4    | Rendering    |   100 |           55 |          57 |     2 |    3.23 |
firefox | tracemonkey | 5    | Overall      |   100 |           73 |          71 |    -2 |   -2.28 |
firefox | tracemonkey | 5    | Page Request |   100 |            2 |           2 |     0 |   12.20 |
firefox | tracemonkey | 5    | Rendering    |   100 |           71 |          69 |    -2 |   -2.69 |
firefox | tracemonkey | 6    | Overall      |   100 |           85 |          69 |   -16 |  -18.73 |        faster
firefox | tracemonkey | 6    | Page Request |   100 |            2 |           2 |     0 |   -9.90 |
firefox | tracemonkey | 6    | Rendering    |   100 |           83 |          67 |   -16 |  -18.97 |        faster
firefox | tracemonkey | 7    | Overall      |   100 |           65 |          64 |     0 |   -0.37 |
firefox | tracemonkey | 7    | Page Request |   100 |            1 |           1 |     0 |  -11.94 |
firefox | tracemonkey | 7    | Rendering    |   100 |           63 |          63 |     0 |   -0.05 |
firefox | tracemonkey | 8    | Overall      |   100 |           53 |          54 |     1 |    2.04 |
firefox | tracemonkey | 8    | Page Request |   100 |            1 |           1 |     0 |   17.02 |
firefox | tracemonkey | 8    | Rendering    |   100 |           52 |          53 |     1 |    1.82 |
firefox | tracemonkey | 9    | Overall      |   100 |           79 |          73 |    -6 |   -7.86 |        faster
firefox | tracemonkey | 9    | Page Request |   100 |            2 |           2 |     0 |  -15.14 |
firefox | tracemonkey | 9    | Rendering    |   100 |           77 |          71 |    -6 |   -7.86 |        faster
firefox | tracemonkey | 10   | Overall      |   100 |          545 |         519 |   -27 |   -4.86 |        faster
firefox | tracemonkey | 10   | Page Request |   100 |           14 |          13 |     0 |   -3.56 |
firefox | tracemonkey | 10   | Rendering    |   100 |          532 |         506 |   -26 |   -4.90 |        faster
firefox | tracemonkey | 11   | Overall      |   100 |           42 |          41 |    -1 |   -2.50 |
firefox | tracemonkey | 11   | Page Request |   100 |            1 |           1 |     0 |  -27.42 |        faster
firefox | tracemonkey | 11   | Rendering    |   100 |           41 |          40 |    -1 |   -1.75 |
firefox | tracemonkey | 12   | Overall      |   100 |          350 |         332 |   -18 |   -5.16 |        faster
firefox | tracemonkey | 12   | Page Request |   100 |            3 |           3 |     0 |   -5.17 |
firefox | tracemonkey | 12   | Rendering    |   100 |          347 |         329 |   -18 |   -5.15 |        faster
firefox | tracemonkey | 13   | Overall      |   100 |           31 |          31 |     0 |    0.52 |
firefox | tracemonkey | 13   | Page Request |   100 |            1 |           1 |     0 |    4.95 |
firefox | tracemonkey | 13   | Rendering    |   100 |           30 |          30 |     0 |    0.20 |
```
2020-06-14 11:51:45 +02:00
Jonas Jenwald
4b51bcc733 Ensure that PDFImage.buildImage won't accidentally swallow errors, e.g. from ColorSpace parsing (issue 6707, PR 11601 follow-up)
Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occuring during image parsing/decoding as intended.
The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.
2020-06-13 15:02:37 +02:00
Jonas Jenwald
10f31bb46d Change the dependencies property, on OperatorList instances, from an Object to a Set
Since this is completely internal functionality, and furthermore limited to the worker-thread, this change should thus not have any observable effect for e.g. an API-user.
2020-06-11 16:27:13 +02:00
Jonas Jenwald
02a1d0f6c5 Remove the unused intent/pageIndex properties from OperatorList instances (PR 11069 follow-up)
Apparently I completely overlooked the fact that with the changes in PR 11069 these properties became *completely* unused, and consequently they thus ought to be removed.
2020-06-11 16:05:38 +02:00
Jonas Jenwald
00d45fce33 Update SVGGraphics to account for globally cached images (PR 11912 follow-up)
Since there's (essentially) no tests for the SVG-backend, these changes didn't make in into PR 11912 when the code in the `src/display/canvas.js` file was modified.
2020-06-10 15:31:26 +02:00
Jonas Jenwald
88fdb482b0 Move the isEmptyObj helper function from src/shared/util.js to test/unit/test_utils.js
Since this helper function is no longer used anywhere in the main code-base, but only in a couple of unit-tests, it's thus being moved to a more appropriate spot.

Finally, the implementation of `isEmptyObj` is also tweaked slightly by removing the manual loop.
2020-06-09 17:50:16 +02:00
Jonas Jenwald
159e13c4e4 Convert the ChunkedStreamManager.promisesByRequest property to a Map
Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:50:14 +02:00
Jonas Jenwald
dda7a5d1b7 Convert the ChunkedStreamManager.requestsByChunk property to a Map
Compared to regular `Object`s, `Map`s have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:50:11 +02:00
Jonas Jenwald
17e23ffb33 Convert the ChunkedStreamManager.chunksNeededByRequest property to a Map (containing Sets)
Compared to regular `Object`s, `Map`s (and `Set`s) have a number of advantageous properties: Of particular importance in this case is the built-in iteration support, and that determining if the structure is empty is easy.
2020-06-09 17:49:53 +02:00
Tim van der Meij
a4fa4554d6
Merge pull request #11977 from timvandermeij/refset
Convert the `RefSet` primitive to a proper class and use a `Set` internally
2020-06-07 23:15:35 +02:00
Tim van der Meij
4c2e056796
Convert the RefSet primitive to a proper class and use a Set internally
The `RefSet` primitive predates ES6, so that most likely explains why an
object is used internally to track the entries. However, nowadays we can
use built-in JavaScript sets for this purpose. Built-in types are often
more efficient/optimized and using it makes the code a bit more clear
since we don't have to assign `true` to keys anymore just to indicate
their presence.
2020-06-07 19:01:29 +02:00
Jonas Jenwald
466d10f6fc Remove unused methods from NetworkManager, in src/display/network.js
Both of the removed methods were added in PR 2719, however they are no longer used:
 - It appears that `hasPendingRequests` was never used at all, even from the beginning.
 - The only general PDF.js library usage of `abortAllRequests` was removed in PR 6879, which is now four years ago. (Originally the Firefox-specific network implementation, see https://searchfox.org/mozilla-central/source/browser/extensions/pdfjs/content/PdfJsNetwork.jsm, was shared with the `src/display/network.js` file and *there* this method is used. However, since all of the Firefox-specific code now lives directly in mozilla-central, that's not relevant for the removal in this patch.)
2020-06-07 16:03:32 +02:00
Tim van der Meij
c97200ff59
Merge pull request #11974 from Snuffleupagus/sendImgData
A couple of small image caching/sending improvements
2020-06-07 13:53:26 +02:00
Jonas Jenwald
df7d8c74ca Extract the actual sending of image data from the PartialEvaluator.buildPaintImageXObject method
After PRs 10727 and 11912, the code responsible for sending the decoded image data to the main-thread has now become a fair bit more involved the previously.
To reduce the amount of duplication here, the actual code responsible for sending the data is thus extracted into a new helper method instead.
2020-06-07 12:01:51 +02:00
Jonas Jenwald
aff0d56326 Remove an unnecessary RefSetCache.prototype.has() call from GlobalImageCache.getData
We can simply attempt to get the data *directly*, and instead check the result, rather than first checking if it exists.
2020-06-07 11:56:04 +02:00
Takashi Tamura
7acb112ca9 Optimization:
Avoid calling Math.pow if possible when calculating the transfer
function of the CalRGB color space since calling Math.pow is expensive.

If the value of color is larger than the threshold, 0.99554525,
the final result of the transform is larger that 254.5
since ((1 + 0.055) * 0.99554525 ** (1 / 2.4) - 0.055) * 255 === 254.50000003134699
2020-06-07 13:17:18 +09:00
Jonas Jenwald
b7272a34eb Change the loadedChunks property, on ChunkedStream instances, from an Array to a Set
In the old code the use of an Array meant that we had to *manually* track the `numChunksLoaded` property, given that simply using the Array `length` wouldn't have worked since there's no guarantee that the data is loaded in order when e.g. range requests are in use.

Tracking closely related state *separately* in this manner never seem like a good idea, and we can now instead utilize a Set to avoid that.
2020-06-05 15:03:06 +02:00
Carlos Rodríguez
802aa14a99 Jpeg encoded with RGB -instead of YCbCr- write the components index as "RGB" in ASCII to say it so
On ISO/IEC 10918-6:2013 (E), section 6.1: (http://www.itu.int/rec/T-REC-T.872-201206-I/en)

"Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment"

But common jpeg libraries consider RGB too if components index are ASCII R (0x52), G (0x47) and B (0x42): https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048

Issue #11931
2020-06-04 15:08:47 +02:00
Jonas Jenwald
64378fc366 [api-minor] Remove the deprecated PDFDocumentProxy.getOpenActionDestination method (PR 11644 follow-up)
This method has been printing a `deprecated` warning in two releases, hence it should hopefully be safe to remove now.
2020-06-02 12:28:00 +02:00
Jonas Jenwald
af815e417d Ensure that that we don't attempt to cache *inline* images in the GlobalImageCache (PR 11912 follow-up)
Since *inline* images, i.e. those defined inside of `/Contents` streams, are by their very definition page-specific it thus seem like a good idea to actually enforce that they won't accidentally end up in the `GlobalImageCache`.
2020-06-01 01:00:30 +02:00
Tim van der Meij
fe5689705d
Merge pull request #11930 from Snuffleupagus/LocalImageCache
Improve the *local* image caching in `PartialEvaluator.getOperatorList`
2020-05-28 00:12:37 +02:00
Jonas Jenwald
4d60430b1c Add comments to the export list in the src/pdf.js file (PR 11914 follow-up)
When converting this file to use standard `import`/`export` statements, I sorted the exports in the same order as the imports to simplify things.

However, looking at the list of `export`ed properties it probably doesn't hurt to add a couple of comments to clarify from where specifically the `export`s originated.
2020-05-27 13:57:25 +02:00
Jonas Jenwald
4ef547f400 Improve caching of empty /XObjects in the PartialEvaluator.getTextContent method
It turns out that `getTextContent` suffers from *similar* problems with repeated images as `getOperatorList`; please see the previous patch.

While only `/XObject` resources of the `Form`-type will actually be *parsed* in `PartialEvaluator.getTextContent`, since those are the only ones that may contain text, we're still forced to fetch repeated image resources where the name differs (but not the reference).
Obviously it's less bad in this case, since we're not actually parsing `/XObject`s of e.g. the `Image`-type. However, you still want to avoid even fetching the data whenever possible, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.

To address these issues, we can simply replace the exiting name-only caching in `PartialEvaluator.getTextContent` with a new cache backed by `LocalImageCache` instead.
2020-05-26 09:49:01 +02:00
Jonas Jenwald
d62c9181bd Improve the *local* image caching in PartialEvaluator.getOperatorList
Currently the local `imageCache`, as used in `PartialEvaluator.getOperatorList`, will miss certain cases of repeated images because the caching is *only* done by name (usually using a format such as e.g. "Im0", "Im1", ...).
However, in some PDF documents the `/XObject` dictionaries many contain hundreds (or even thousands) of distinctly named images, despite them referring to only a handful of actual image objects (via the XRef table).

With these changes we'll now cache *local* images using both name and (where applicable) reference, thus improving re-usage of images resources even further.

This patch was tested using the PDF file from [bug 857031](https://bugzilla.mozilla.org/show_bug.cgi?id=857031), i.e. https://bug857031.bmoattachments.org/attachment.cgi?id=732270, with the following manifest file:
```
[
    {  "id": "bug857031",
       "file": "../web/pdfs/bug857031.pdf",
       "md5": "",
       "rounds": 250,
       "lastPage": 1,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
firefox | 0    | Overall      |   250 |         2749 |        2656 | -93 | -3.38 |        faster
firefox | 0    | Page Request |   250 |            3 |           4 |   1 | 50.14 |        slower
firefox | 0    | Rendering    |   250 |         2746 |        2652 | -94 | -3.44 |        faster
```

While this is certainly an improvement, since we now avoid re-parsing ~1000 images on the first page, all of the image resources are small enough that the total rendering time doesn't improve that much in this particular case.

In pathological cases, such as e.g. the PDF document in issue 4958, the improvements with this patch can be very significant. Looking for example at page 2, from issue 4958, the rendering time drops from ~60 seconds with `master` to ~30 seconds with this patch (obviously still slow, but it really showcases the potential of this patch nicely).

Finally, note that there's also potential for additional improvements by re-using `LocalImageCache` instances for e.g. /XObject data of the `Form`-type. However, given that recent changes in this area I purposely didn't want to complicate *this* patch more than necessary.
2020-05-25 15:14:14 +02:00
Tim van der Meij
f14215da37
Implement fill opacity for shading patterns in the SVG back-end
In the PDF file from the issue below, the fill alpha (`ca`) is set
before drawing the circles using the `setGState` operator. Doing so
causes the global alpha to be set on the canvas' context for the canvas
back-end, but this was not handled in the SVG back-end. This patch fixes
that by taking the fill opacity into account when drawing shading
patterns in the same way as done elsewhere so it is only included if the
value is non-default.

Fixes #11812.
2020-05-24 14:25:40 +02:00
Tim van der Meij
3b615e4ca3
Merge pull request #11601 from Snuffleupagus/rm-nativeImageDecoderSupport
[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`
2020-05-23 15:33:46 +02:00
Jonas Jenwald
8af70d75aa Allow GlobalImageCache.clear to, optionally, only remove the actual data (PR 11912 follow-up)
When "Cleanup" is triggered, you obviously need to remove all globally cached data on *both* the main- and worker-threads.
However, the current the implementation of the `GlobalImageCache.clear` method also means that we lose *all* information about which images were cached and not just their data. This thus has the somewhat unfortunate side-effect of requiring images, which were previously known to be "global", to *again* having to reach `NUM_PAGES_THRESHOLD` before being cached again.

To avoid doing unnecessary parsing after "Cleanup", we can thus let `GlobalImageCache.clear` keep track of which images were cached while still removing their actual data. This should not have any significant impact on memory usage, since the only extra thing being kept is a `RefSetCache` (essentially an Object) with a couple of `Set`s containing only integers.
2020-05-23 11:30:24 +02:00
Jonas Jenwald
56ebf01ae0 Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922)
This patch was inspired by ad2b64f124/xpdf/CharCodeToUnicode.cc (L480-L484)
2020-05-22 15:23:17 +02:00
Jonas Jenwald
18e0b10d3c [api-minor] Remove the disableCreateObjectURL option from the getDocument parameters, since it's now unused in the API
With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code.

Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option.
Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
cc4cc8b11b Remove the, now unused, releaseImageResources helper function
With the changes in the previous patch, this is now dead code which should thus be removed.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seem unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification support JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser require a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
   In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, result in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-reponsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, lead to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` were unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:

 - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
   This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now looks *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
   Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
   Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
dda6626f40 Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]

Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.

In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.

Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]

*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.

---
[1] There's e.g. PDF documents that use the same image as background on all pages.

[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.

[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-21 18:13:45 +02:00
Jonas Jenwald
8d56a69e74 Reduce usage of SystemJS, in the development viewer, even further
With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687

Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded *natively* by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill.

*Please note:* As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.
2020-05-20 13:36:52 +02:00
Jonas Jenwald
e2c3312416 Convert the src/pdf.js and src/pdf.worker.js files to use standard import/export statements
As part of reducing our reliance on SystemJS in the development viewer, this patch replaces usage of `require` statements with modern standards `import`/`export` statements instead.

If we want to try and move forward with reducing usage of SystemJS, we don't have much choice but to make these kind changes (despite what prior test-results showed, however I'm no longer able to reproduce the issues locally).
2020-05-20 13:18:23 +02:00
Jonas Jenwald
d4d933538b Re-factor setPDFNetworkStreamFactory, in src/display/api.js, to also accept an asynchronous function
As part of trying to reduce the usage of SystemJS in the development viewer, this patch is a necessary step that will allow removal of some `require` statements.

Currently this uses `SystemJS.import` in non-PRODUCTION mode, but it should be possible to replace those with standard *dynamic* `import` calls in the future.
2020-05-20 13:18:18 +02:00
Jonas Jenwald
ec0ab91a2b Reduce the usage of require statements in code-paths not protected by pre-processor and/or run-time checks
This replaces some additional `require`/`exports` usage with standard `import`/`export` statements instead.
Hence another, small, part in the effort to reduce the reliance on SystemJS-specific functionality in the development viewer.
2020-05-14 15:57:49 +02:00
Jonas Jenwald
73636e052a Handle errors individually for each annotation in the _parsedAnnotations getter
While working on PR 11872, it occurred to me that it probably wouldn't be a bad idea to change the `_parsedAnnotations` getter to handle errors individually for each annotation. This way, one broken/corrupt annotation won't prevent the rest of them from being e.g. fetched through the API.
2020-05-09 12:33:39 +02:00
Jonas Jenwald
e1f340a0c2 Use the ESLint no-restricted-syntax rule to ensure that assert is always called with two arguments
Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites.
In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-05-05 13:40:05 +02:00
Tim van der Meij
491904d30a
Merge pull request #11872 from Snuffleupagus/issue-11871
Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871)
2020-05-04 22:19:27 +02:00
Brendan Dahl
b1be33c96f Add more categories of unsupported features.
Fixes #11815
2020-05-04 11:02:16 -07:00
Jonas Jenwald
4aabd063fc Gracefully handle annotation parsing errors in Page.getOperatorList (issue 11871)
This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing.
Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.
2020-05-04 17:09:48 +02:00
roccobeno
371e699905
Include the name for interactive form elements
We already rendered the name for radio buttons, but it was missing for
all other interactive form elements. This commit adds that so that
values entered in form elements can be read based on the element name.
2020-04-27 16:55:35 +02:00
Jonas Jenwald
911c33f025 Move the maybeValidDimensions check, used with JPEG images, to occur earlier (PR 11523 follow-up)
Given that the `NativeImageDecoder.{isSupported, isDecodable}` methods require both dictionary lookups *and* ColorSpace parsing, in hindsight it actually seems more reasonable to the `JpegStream.maybeValidDimensions` checks *first*.
2020-04-26 12:07:46 +02:00
Jonas Jenwald
c355f91d2e [api-minor] Immediately release the font.data property once the font been attached to the DOM (PR 11777 follow-up)
*This patch implements https://github.com/mozilla/pdf.js/pull/11777#issuecomment-609741348*

This extends the work from PR 11773 and 11777 further, by immediately releasing the `font.data` property once the font been attached to the DOM. By not unnecessarily holding onto this data on the main-thread, we'll thus reduce the memory usage of fonts even further (especially beneficial in longer documents with composite fonts).

The new behaviour is controlled by the recently added `fontExtraProperties` API option (adding a new option just for this patch didn't seem necessary), since there's one edge-case in the SVG renderer where the `font.data` property is necessary (see the `pdf2svg` example).

Note that while the default viewer does run clean-up with an idle timeout, that timeout will be reset whenever rendering occurs *or* when scrolling happens in the viewer. In practice this means that unless the user doesn't interact with the viewer in *any* way during an extended period of time, currently set to 30 seconds, the `PDFDocumentProxy.cleanup` method will never be called and font resources will thus not be cleaned-up.
2020-04-23 13:04:57 +02:00
Jonas Jenwald
cdc60402f6 [api-minor] Change PageViewport to throw when the rotation is not a multiple of 90 degrees
As evident from the code, `PageViewport` only supports[1] `rotation` values which are a multiple of 90 degrees. Besides it being somewhat difficult to imagine meaningful use-cases for a non-multiple of 90 degrees `rotation`, the code also becomes both simpler and more efficient by not having to consider arbitrary `rotation` values.

However, any invalid rotation will *silently* fallback to assume zero `rotation` which probably isn't great for e.g. `PDFPageProxy.getViewport` in the API. Hence this patch, which will now enforce that only valid `rotation` values are accepted.

---
[1] As far as I can tell, from looking through the history, nothing else has ever been supported either.
2020-04-22 15:19:13 +02:00
Jonas Jenwald
695140728a [src/core/fonts.js] Improve the validateOS2Table function
Rather than creating a new `Stream` just to validate the OS/2 TrueType table, it's simpler/better to just pass in a reference to the font data and use that instead (similar to other TrueType helper functions).
2020-04-19 11:25:25 +02:00
Jonas Jenwald
033d27fc25 [src/core/fonts.js] Replace some unnecessary Stream.getUint16() calls with Stream.skip(2) instead
There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches/computes a Uint16 value (and without using the result for anything).
2020-04-19 11:18:20 +02:00
Jonas Jenwald
4fae1ac5c4 [src/core/fonts.js] Replace some unnecessary Stream.getBytes(...) calls with Stream.skip(...) instead
There's a handful of cases in the code where the intention is simply to advance the `Stream` position, but rather than only doing that the code instead fetches the bytes in question (and without using the result for anything).
2020-04-19 11:18:15 +02:00
Tim van der Meij
44da021012
Merge pull request #11814 from tamuratak/svg_text_vertical
Support the vertical writing mode with SVG backend.
2020-04-18 00:34:39 +02:00
Tim van der Meij
7b23476e61
Merge pull request #11818 from Snuffleupagus/eslint-dot-notation
Enable the `dot-notation` ESLint rule
2020-04-18 00:19:47 +02:00
Jonas Jenwald
518d26dfb4 [src/core/jpg.js] Remove redundant marker validation at the end of the decodeScan function (PR 11805 follow-up)
With the MCU parsing changes made in PR 11805, the final marker validation is no longer necessary before the `decodeScan` function returns.
2020-04-17 15:40:02 +02:00
Jonas Jenwald
1cc3dbb694 Enable the dot-notation ESLint rule
*Please note:* These changes were done automatically, using the `gulp lint --fix` command.

This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104

The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the *built* files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a *reduction* of `23 178` bytes (or ~0.7%) for a completely mechanical change.

A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't *think* that the new formatting this patch introduces should undo any of the improvements from PR 6915.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation
2020-04-17 12:24:46 +02:00
Takashi Tamura
32f9cabf82 Support the vertical writing mode with SVG backend. 2020-04-17 09:01:51 +09:00
Tim van der Meij
96923eb2a6
Merge pull request #11805 from Snuffleupagus/issue-11794
Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)
2020-04-16 00:08:58 +02:00
Tim van der Meij
a7def05aa1
Merge pull request #11810 from Snuffleupagus/fromCodePoint-followup
A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up)
2020-04-16 00:08:16 +02:00
Jonas Jenwald
44b4a74f48 A couple of small String.fromCodePoint improvements (PR 11698 and 11769 follow-up)
- Add a reduced test-case for issue 11768, to prevent future regressions.
   (Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.)

 - Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.
2020-04-15 13:45:08 +02:00
Jonas Jenwald
06f6f8719f Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794) 2020-04-14 23:27:08 +02:00
Jonas Jenwald
26cffd03b0 [src/core/jpg.js] Remove some redundant marker validation during the MCU parsing in the decodeScan function
Some of the code in `src/core/jpg.js` is fairly old, and has with time become unnecessary when the surrounding code has been updated to handle various types of JPEG corruption.
In particular the `if (!marker || marker <= 0xff00) { ... }` branch is now dead code, since:

 - The `!marker` case can no longer happen, since we would already have broken out of the loop thanks to the `!fileMarker` branch a handful of lines above.

 - The `marker <= 0xff00` case can also no longer happen, since the `findNextFileMarker` function validate markers much more thoroughly (by checking `marker >= 0xffc0 && marker <= 0xfffe`). Hence we'd again have broken out of the loop via the `!fileMarker` branch above when no valid marker was found.
2020-04-14 23:27:08 +02:00
Jonas Jenwald
746eaf3154 [api-minor] Fix the return value of PDFDocumentProxy.getViewerPreferences when no viewer preferences are present (PR 10738 follow-up)
This patch fixes yet another instalment in the never-ending series of "what the *bleep* was I thinking", by changing the `PDFDocumentProxy.getViewerPreferences` method to return `null` by default.
Not only is this method now consistent with many other API methods, for the data not present case, but it also avoids having to e.g. loop through an object to check if it's actually empty (note the old unit-test).
2020-04-14 23:25:50 +02:00
Jonas Jenwald
426945b480 Update Prettier to version 2.0
Please note that these changes were done automatically, using `gulp lint --fix`.

Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html
In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).
2020-04-14 12:28:14 +02:00
Jonas Jenwald
ecbcde7ff3 Fail early, in modern GENERIC builds, if certain required browser functionality is missing (PR 11771 follow-up)
With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.

In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds additional checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for `Promise.allSettled`.[1] Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.

*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).

---
[1] This was a fairly recent addition to the web platform, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled#Browser_compatibility
2020-04-11 13:42:03 +02:00
Jonas Jenwald
91efde5246 Add a heuristic to scale even single-char text, when the horizontal/vertical scaling differs significantly (issue 11713)
At this point in time, compared to when the "ignore single-char" code was added, we *should* generally be doing a much better job of combining text into as few chunks as possible.
However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm *not* proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise.

Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.
2020-04-07 00:36:23 +02:00
Tim van der Meij
70c54ab9d9
Merge pull request #11746 from Snuffleupagus/issue-11740
Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
2020-04-07 00:10:12 +02:00
Jonas Jenwald
2d46230d23 [api-minor] Change Font.exportData to, by default, stop exporting properties which are completely unused on the main-thread and/or in the API (PR 11773 follow-up)
For years now, the `Font.exportData` method has (because of its previous implementation) been exporting many properties despite them being completely unused on the main-thread and/or in the API.
This is unfortunate, since among those properties there's a number of potentially very large data-structures, containing e.g. Arrays and Objects, which thus have to be first structured cloned and then stored on the main-thread.

With the changes in this patch, we'll thus by default save memory for *every* `Font` instance created (there can be a lot in longer documents). The memory savings obviously depends a lot on the actual font data, but some approximate figures are: For non-embedded fonts it can save a couple of kilobytes, for simple embedded fonts a handful of kilobytes, and for composite fonts the size of this auxiliary can even be larger than the actual font program itself.

All-in-all, there's no good reason to keep exporting these properties by default when they're unused. However, since we cannot be sure that every property is unused in custom implementations of the PDF.js library, this patch adds a new `getDocument` option (named `fontExtraProperties`) that still allows access to the following properties:

 - "cMap": An internal data structure, only used with composite fonts and never really intended to be exposed on the main-thread and/or in the API.
   Note also that the `CMap`/`IdentityCMap` classes are a lot more complex than simple Objects, but only their "internal" properties survive the structured cloning used to send data to the main-thread. Given that CMaps can often be *very* large, not exporting them can also save a fair bit of memory.

 - "defaultEncoding": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.

 - "differences": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.

 - "isSymbolicFont": An internal property, used during font parsing and building of the glyph mapping on the worker-thread.

  - "seacMap": An internal map, only potentially used with *some* Type1/CFF fonts and never intended to be exposed in the API. The existing `Font.{charToGlyph, charToGlyphs}` functionality already takes this data into account when handling text.

 - "toFontChar": The glyph map, necessary for mapping characters to glyphs in the font, which is built upon the various encoding information contained in the font dictionary and/or font program. This is not directly used on the main-thread and/or in the API.

 - "toUnicode": The unicode map, necessary for text-extraction to work correctly, which is built upon the ToUnicode/CMap information contained in the font dictionary, but not directly used on the main-thread and/or in the API.

 - "vmetrics": An array of width data used with fonts which are composite *and* vertical, but not directly used on the main-thread and/or in the API.

 - "widths": An array of width data used with most fonts, but not directly used on the main-thread and/or in the API.
2020-04-06 11:47:09 +02:00
Jonas Jenwald
8770ca3014 Make the decryptAscii helper function, in src/core/type1_parser.js, slightly more efficient
By slicing the Uint8Array directly, rather than using the prototype and a `call` invocation, the runtime of `decryptAscii` is decreased slightly (~30% based on quick logging).
The `decryptAscii` function is still less efficient than `decrypt`, however ASCII encoded Type1 font programs are sufficiently rare that it probably doesn't matter much (we've only seen *two* examples, issue 4630 and 11740).
2020-04-06 11:21:02 +02:00
Jonas Jenwald
938d519192 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.
2020-04-06 11:21:02 +02:00
Jonas Jenwald
6a8c591301 Improve detection of binary/ASCII eexec encrypted Type1 font programs in Type1Parser (issue 11740)
The PDF document, in the referenced issue, actually contains ASCII-encoded Type1 data which we currently *incorrectly* identify as binary.

According to the specification, see https://www-cdf.fnal.gov/offline/PostScript/T1_SPEC.PDF#[{%22num%22%3A203%2C%22gen%22%3A0}%2C{%22name%22%3A%22XYZ%22}%2C87%2C452%2Cnull], the current checks are insufficient to decide between binary/ASCII encoded Type1 font programs.
2020-04-06 11:21:02 +02:00
Jonas Jenwald
2619272d73 Change the signature of TranslatedFont, and convert it to a proper class
In preparation for the next patch, this changes the signature of `TranslatedFont` to take an object rather than individual parameters. This also, in my opinion, makes the call-sites easier to read since it essentially provides a small bit of documentation of the arguments.

Finally, since it was necessary to touch `TranslatedFont` anyway it seemed like a good idea to also convert it to a proper `class`.
2020-04-05 20:53:48 +02:00
Tim van der Meij
0400109b87
Merge pull request #11773 from Snuffleupagus/Font-exportData-1
[api-minor] Change `Font.exportData` to use an explicit white-list of exportable properties, and stop exporting internal/unused properties
2020-04-05 20:50:33 +02:00
Jonas Jenwald
59f54b946d Ensure that all Font instances have the vertical property set to a boolean
Given that the `vertical` property is always accessed on the main-thread, ensuring that the property is explicitly defined seems like the correct thing to do since it also avoids boolean casting elsewhere in the code-base.
2020-04-05 16:27:50 +02:00
Jonas Jenwald
c5e1fd3fde Use "standard" shadowing in the Font.spaceWidth method
With `Font.exportData` now only exporting white-listed properties, there should no longer be any reason to not use standard shadowing in the `Font.spaceWidth` method.
Furthermore, considering the amount of other changes to the code-base over the years it's not even clear to me that the special-case was necessary any more (regardless of the preceding patches).
2020-04-05 16:27:50 +02:00
Jonas Jenwald
a5e4cccf13 [api-minor] Prevent Font.exportData from exporting internal/unused properties
A number of *internal* font properties, which only make sense on the worker-thread, were previously exported. Some of these properties could also contain potentially large Arrays/Objects, which thus unnecessarily increases memory usage since we're forced to copy these to the main-thread and also store them there.

This patch stops exporting the following font properties:

 - "_shadowWidth": An internal property, which was never intended to be exported.

 - "charsCache": An internal cache, which was never intended to be exported and doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually `undefined` or a mostly empty Object as well.

 - "cidEncoding": An internal property used with (some) composite fonts.
   As can be seen in the `PartialEvaluator.translateFont` method, `cidEncoding` will only be assigned a value when the font dictionary has an "Encoding" entry which is a `Name` (and not in the `Stream` case, since those obviously cannot be cloned).
   All-in-all this property doesn't really make sense on the main-thread and/or in the API, and note also that the resulting `cMap` property is (partially) available already.

 - "fallbackToUnicode": An internal map, part of the heuristics used to improve text-selection in (some) badly generated PDF documents with simple fonts. This was never intended to be exposed on the main-thread and/or in the API.

 - "glyphCache": An internal cache, which was never intended to be exported and which doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually a mostly empty Object as well.

 - "isOpenType": An internal property, used only during font parsing on the worker-thread. In the *very* unlikely event that an API consumer actually needs that information, then `fontType` should be a (generally) much better property to use.

Finally, in the (hopefully) unlikely event that any of these properties become necessary on the main-thread, re-adding them to the white-list is easy to do.
2020-04-05 16:27:50 +02:00
Jonas Jenwald
664f7de540 Change Font.exportData to use an explicit white-list of exportable properties
This patch addresses an existing, and very long standing, TODO in the code such that it's no longer possible to send arbitrary/unnecessary font properties to the main-thread.
Furthermore, by having a white-list it's also very easy to see *exactly* which font properties are being exported.

Please note that in its current form, the list of exported properties contains *every* possible enumerable property that may exist in a `Font` instance.
In practice no single font will contain *all* of these properties, and e.g. embedded/non-embedded/Type3 fonts will all differ slightly with respect to what properties are being defined. Hence why only explicitly set properties are included in the exported data, to avoid half of them being `undefined`, which however should not be a problem for any existing consumer (since they'd already need to handle those cases).

Since a fair number of these font properties are completely *internal* functionality, and doesn't make any sense to expose on the main-thread and/or in the API, follow-up patch(es) will be required to trim down the list. (I purposely included all properties here for brevity and future documentation purposes.)
2020-04-05 16:27:48 +02:00
Jonas Jenwald
87142a635e Ensure that Font.charToGlyph won't fail because String.fromCodePoint is given an invalid code point (issue 11768)
*Please note:* This patch on its own is *not* sufficient to address the underlying problem in the referenced issue, hence why no test-case is included since the *actual* bug still needs to be fixed.

As can be seen in the specification, https://tc39.es/ecma262/#sec-string.fromcodepoint, `String.fromCodePoint` will throw a RangeError for invalid code points.

In the event that a CMap, in a composite font, contains invalid data and/or we fail to parse it correctly, it's thus possible that the glyph mapping that we build end up with entires that cause `String.fromCodePoint` to throw and thus `Font.charToGlyph` to break.
If that happens, as is the case in issue 11768, significant portions of a page/document may fail to render which seems very unfortunate.

While this patch doesn't fix the underlying problem, it's hopefully deemed useful not only for the referenced issue but also to prevent similar bugs in the future.
2020-04-03 09:49:50 +02:00
Jonas Jenwald
710704508c Fail early, in modern GENERIC builds, if certain required browser functionality is missing (issue 11762)
With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.

In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.

To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build.
(Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.)
As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code.

*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).
2020-04-01 19:42:48 +02:00
Jonas Jenwald
14c999e3ee Remove the unused sizes and encoding properties on Font instances
The `sizes` property doesn't appear to have been used ever since the code was first split into main/worker-threads, which is so many years ago that I wasn't able to easily find exactly in which PR/commit it became unused.

The `encoding` property is always assigned the `properties.baseEncoding` value, however the `PartialEvaluator` doesn't actually compute/set that value any more. Again it was difficult to determine when it became unused, but it's been that way for years.
2020-03-27 10:12:01 +01:00
Jonas Jenwald
dcb16af968 Whitelist closure related cases to address the remaining no-shadow linting errors
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled.

Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
2020-03-25 11:57:12 +01:00
Jonas Jenwald
1d2f787d6a Enable the ESLint no-shadow rule
This rule is *not* currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239

Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis.
Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow

---
[1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.
2020-03-25 11:56:05 +01:00
Tim van der Meij
475fa1f97f
Merge pull request #11744 from janpe2/cff-glyph-zero
The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
2020-03-24 23:52:21 +01:00
Tim van der Meij
292b77fe7b
Merge pull request #11707 from Snuffleupagus/issue-11694
Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)
2020-03-24 23:51:31 +01:00
Tim van der Meij
f85105379e
Merge pull request #11738 from Snuffleupagus/no-shadow-src-core
Remove variable shadowing from the JavaScript files in the `src/core/` folder
2020-03-24 23:10:37 +01:00
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
2020-03-24 15:56:50 +02:00
Jonas Jenwald
216cbca16c Remove variable shadowing from the JavaScript files in the src/core/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-23 18:28:30 +01:00
Jonas Jenwald
5eabe08c74 Exclude the setPDFNetworkStreamFactory function from the built API docs
Please note that the `setPDFNetworkStreamFactory` functionality isn't exposed in the public API, i.e. not listed among the exports in the `src/pdf.js` file, and that even if it were it wouldn't really be useful considering that none of the `PDFNetworkStream`/`PDFFetchStream`/`PDFNodeStream` classes are exported either.
2020-03-23 16:41:06 +01:00
Jonas Jenwald
c3c197d87a Remove old API methods which were previously converted to throwing (PR 11219 follow-up)
These methods were deprecated already in PDF.js version `2.1.266`, see PRs 10246 and 10369, and were converted to throw `Error`s upon invocation in PDF.js version `2.4.456`, see PR 11219.
Hence it ought to be possible to remove these methods now.
2020-03-23 16:41:03 +01:00
Jonas Jenwald
3539a17d2a Remove variable shadowing from the JavaScript files in the src/display/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-20 23:09:41 +01:00
Tim van der Meij
6ecc9fae1c
Merge pull request #11720 from Snuffleupagus/eslint-no-unsanitized
Update the `eslint-plugin-no-unsanitized` package to the latest version
2020-03-20 21:04:24 +01:00
Jonas Jenwald
62a9c26cda Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)
When JPEG images are decoded by the browser, on the main-thread, there's a handful of short-lived copies of the image data; see c3f4690bde/src/display/api.js (L2364-L2408)
That code thus becomes quite problematic for very big JPEG images, since it increases peak memory usage a lot during decoding. In the referenced issue there's a couple of JPEG images whose dimensions are `10006 x 7088` (i.e. ~68 mega-pixels), which causes the *peak* memory usage to increase by close to `1 GB` (i.e. one giga-byte) in my testing.

By letting the PDF.js JPEG decoder, rather than the browser, handle very large images the *peak* memory usage is considerably reduced and the allocated memory also seem to be reclaimed faster.

*Please note:* This will lead to movement in some existing `eq` tests.
2020-03-20 16:37:19 +01:00
Jonas Jenwald
b02be3b268 Update the eslint-plugin-no-unsanitized package to the latest version 2020-03-20 11:25:39 +01:00
Jonas Jenwald
1cd9d5a8fd Remove the unused wideChars property on Font instances
This property was added in PR 1599 (almost eight years ago), but has been unused ever since PR 3674 (six and a half years ago).
2020-03-20 10:37:32 +01:00
Tim van der Meij
228a591ca3
Merge pull request #11716 from Snuffleupagus/private-PDFPageProxy-pageIndex
[api-minor] Change the pageIndex, on `PDFPageProxy` instances, to a private property
2020-03-19 22:35:29 +01:00
Jonas Jenwald
ae2900e510 [api-minor] Change the pageIndex, on PDFPageProxy instances, to a private property
This property has never been documented and/or *intentionally* exposed through the API, instead the `PDFPageProxy.pageNumber` property is the documented/intended API to use here.
Hence pageIndex is changed to a "private" property on `PDFPageProxy` instances, and internal API functionality is also updated to *consistently* use `this._pageIndex` rather than a mix of formats.
2020-03-19 15:47:11 +01:00
Jonas Jenwald
e011be037e Enable the prefer-exponentiation-operator ESLint rule
Please see https://eslint.org/docs/rules/prefer-exponentiation-operator for additional information.
2020-03-19 12:41:25 +01:00
Tim van der Meij
1bc5cef2b5
Merge pull request #11698 from Snuffleupagus/issue-11697
Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)
2020-03-15 13:36:09 +01:00
Tim van der Meij
4dc1058ceb
Merge pull request #11553 from tamuratak/svg_texthscale
Fix the horizontal scaling of texts with SVG backend. #10988
2020-03-15 13:25:08 +01:00
Tim van der Meij
aa3e5a2b8f
Merge pull request #11644 from Snuffleupagus/openAction
[api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
2020-03-15 13:16:37 +01:00
Jonas Jenwald
15e8692eff Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in PartialEvaluator._buildSimpleFontToUnicode (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.

To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
2020-03-13 23:35:47 +01:00
Jonas Jenwald
c5f67300e9 Rename the isSpace helper function to isWhiteSpace
Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`.
Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.
2020-03-12 11:36:59 +01:00
Jonas Jenwald
e4758beaaa Move IsLittleEndianCached and IsEvalSupportedCached to src/shared/util.js
Rather than duplicating the lookup and caching in multiple files, it seems easier to simply move all of this functionality into `src/shared/util.js` instead.
This will also help avoid a bunch of ESLint errors once the `no-shadow` rule is eventually enabled.
2020-03-12 11:36:26 +01:00
Jonas Jenwald
3adbba55b2 Limit the number of warning messages printed by any one Lexer.getHexString invocation
*This patch fixes something that's annoyed me every now and then over the years, when debugging/fixing corrupt PDF documents.*

For corrupt PDF documents where `Lexer.getHexString` encounters invalid characters, there's very rarely just a handful of them. In practice it's not uncommon for there to be many hundreds, or even many thousands, invalid hex characters found.
Not only is the resulting console warning spam utterly useless in these cases, there's often enough of it that performance may even suffer; hence this patch which limits the amount of messages that any one `Lexer.getHexString` invocation may print.
2020-03-09 13:34:53 +01:00
Jonas Jenwald
65e6ea2cb2 Prevent lookup errors in PartialEvaluator.hasBlendModes from breaking all parsing/rendering of a page (issue 11678)
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
2020-03-09 12:00:12 +01:00
Tim van der Meij
1a97c142b3
Merge pull request #11523 from Snuffleupagus/issue-10880
Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)
2020-03-06 23:03:09 +01:00
Tim van der Meij
1bb25f5cb8
Merge pull request #11673 from Snuffleupagus/FontLoader-bind-more-await
Update the `FontLoader.bind` method to avoid explicitly returning `undefined`
2020-03-06 22:51:39 +01:00
Jonas Jenwald
7d4be08dad Update the FontLoader.bind method to avoid explicitly returning undefined
The only reason for the `return undefined;` lines was to appease the ESLint `consistent-return` rule, but that's not actually necessary if you make use of the fact that the method is `async` and that we can thus await the Promise rather than returning it.
2020-03-06 17:45:24 +01:00
Jonas Jenwald
160cfc4084 Slightly simplify the lookup of data in Dict.{get, getAsync, has}
Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch utilizes an *inline check* and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`).

For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2838 |        2820 | -18 | -0.65 |        faster
Firefox | Page Request |   250 |            1 |           2 |   0 | 11.92 |        slower
Firefox | Rendering    |   250 |         2837 |        2818 | -19 | -0.65 |        faster
```
2020-03-06 14:12:14 +01:00
Jonas Jenwald
01fb309a2a [api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.)

By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.
2020-03-06 13:03:00 +01:00
Tim van der Meij
c95b9b1e17
Merge pull request #11653 from Snuffleupagus/ensureStateFont
Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
2020-03-03 23:33:13 +01:00
Jani Pehkonen
71e7686950 Fix Type1 font parsing when .notdef is not at index zero
Fixes #11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.

Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
2020-03-03 21:55:51 +02:00
Jonas Jenwald
65e514e063 Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
2020-03-03 10:05:18 +01:00
Jonas Jenwald
1ad65cf405 Simplify the BaseFontLoader.isFontLoadingAPISupported getter
It's no longer necessary to special-case this getter in the `GenericFontLoader` case, since the GENERIC build hasn't been using `mozPrintCallback` for years now (furthermore Firefox 63 is really old as well).
2020-03-02 23:14:48 +01:00
Takashi Tamura
d6b67cd28a Fix the horizontal scaling of texts with SVG backend. #10988 2020-03-02 14:54:41 +09:00
Takashi Tamura
d8c9f119b0 Fix the vertical writing mode with horizontal scaling. #11555.
It is not valid to multiply textHScale when the writing mode is vertical.

See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-29 07:48:29 +09:00
Tim van der Meij
e1586016c5
Merge pull request #11577 from Snuffleupagus/Pages-tree-refs
Prevent circular references in the /Pages tree
2020-02-27 23:36:11 +01:00
Tim van der Meij
965ebe63fd
Merge pull request #11540 from tamuratak/charspacing
Fix text spacing with vertical fonts. #7687 and #11526.
2020-02-26 22:26:27 +01:00
Jonas Jenwald
c55d30a715 Use the same non-embedded Wingdings fallback for fonts named "Wingdings-Regular" too (PR 5463 follow-up, issue 11451)
This patch extends the existing heuristics, which are really the best that we can do in general for these kinds of non-embedded *and* non-standard fonts.

Furthermore, this patch also tries to improve the copy-and-paste behaviour for non-embedded Wingdings fonts by also using the `ZapfDingbatsEncoding` in this case.

*Note:* I'm not sure that adding additional tests for Wingdings fonts matters that much, given how limited our "support" for them really is.
2020-02-24 17:40:06 +01:00
Jonas Jenwald
bf09d79eea Use the ESLint no-restricted-syntax rule to prevent direct usage of new Cmd()/new Name()/new Ref()
Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects *a lot* during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base.
Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-02-22 21:15:00 +01:00
Jonas Jenwald
c3c3b8cd81 Add a heuristic, in src/core/jpg.js, to handle JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10880)
*This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else.*

To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data *and* the "actual" `scanLines` value is at least one order of magnitude smaller than expected.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
5494f7d5bc Add basic validation of the scanLines parameter in JPEG images, before delegating decoding to the browser
In some cases PDF documents can contain JPEG images that the native browser decoder cannot handle, e.g. images with DNL (Define Number of Lines) markers or images where the SOF (Start of Frame) marker contains a wildly incorrect `scanLines` parameter.
Currently, for "simple" JPEG images, we're relying on native image decoding to *fail* before falling back to the implementation in `src/core/jpg.js`. In some cases, note e.g. issue 10880, the native image decoder doesn't outright fail and thus some images may not render.

In an attempt to improve the current situation, this patch adds additional validation of the JPEG image SOF data to force the use of `src/core/jpg.js` directly in cases where the native JPEG decoder cannot be trusted to do the right thing.
The only way to implement this is unfortunately to parse the *beginning* of the JPEG image data, looking for a SOF marker. To limit the impact of this extra parsing, the result is cached on the `JpegStream` instance and this code is only run for images which passed all of the pre-existing "can the JPEG image be natively rendered and/or decoded" checks.

---

*Slightly off-topic:* Working on this *really* makes me start questioning if native rendering/decoding of JPEG images is actually a good idea.
There's certain kinds of JPEG images not supported natively, and all of the validation which is now necessary isn't "free". At this point, in the `NativeImageDecoder`, we're having to check for certain properties in the image dictionary, parse the `ColorSpace`, and finally read the actual image data to find the SOF marker.
Furthermore, we cannot just send the image to the main-thread and be done in the "JpegStream" case, but we also need to wait for rendering to complete (or fail) before continuing with other parsing.
In the "JpegDecode" case we're even having to parse part of the image on the main-thread, which seems completely at odds with the principle of doing all heavy parsing in the Worker, and there's also a couple of potentially large (temporary) allocations/copies of TypedArray data involved as well.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
6b44ae2170 Remove the unused thisArg from RefSetCache.forEach
Given that this is completely unused, and that a "normal" function call may be a *tiny* bit more efficient, there's no good reason as far as I can tell to keep it.
2020-02-21 14:23:05 +01:00
Jonas Jenwald
3c7b7be100 Prevent circular references in the /Pages tree 2020-02-19 01:49:39 +01:00
Tim van der Meij
4092aa9fbd
Merge pull request #11604 from Snuffleupagus/eslint-prefer-starts-ends-with
Enable the `unicorn/prefer-starts-ends-with` ESLint plugin rule
2020-02-16 13:17:27 +01:00