Commit Graph

2148 Commits

Author SHA1 Message Date
Jonas Jenwald
68d3a333ac Change the seenStyles object, in PartialEvaluator.getTextContent, to a Set
Given that what we actually want is only to keep track of the loadedFont-names, rather than storing any actual data, using an object isn't really necessary here. Furthermore, in the current code, we're also using `in` when checking if the data exists, which is generally less efficient than just checking for the value directly.
2021-04-05 10:34:02 +02:00
Jonas Jenwald
0eb1433c78 [api-minor] Change the format of the fontName-property, in defaultAppearanceData, on Annotation-instances (PR 12831 follow-up)
Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see ca7f546828/src/core/default_appearance.js (L35)
The reason that this is a problem can be seen in ca7f546828/src/core/primitives.js (L30-L34), since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct.

Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall.

*Please note:* I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).
2021-04-01 16:47:30 +02:00
Tim van der Meij
5a64157a2f
Merge pull request #13168 from janpe2/ttf-uni-glyphs
Use post table when Encoding has only Differences
2021-03-31 21:35:13 +02:00
Tim van der Meij
1a4af17d07
Merge pull request #13165 from Snuffleupagus/Annotation-rm-defaultAppearance-export
[api-minor] Stop exposing the *raw* `defaultAppearance`-string on Annotation-instances
2021-03-31 21:30:50 +02:00
Jani Pehkonen
0117ee5071 Use post table when Encoding has only Differences
Fixes #13107
In the issue, some TrueType glyph names have the format `uniXXXX`.
Font's `Encoding` dictionary has the entry `Differences` but no
`BaseEncoding`. `uniXXXX` names are converted to glyph indices
using font's `post` table but currently that is done only when
`BaseEncoding` exists. We must enable the conversion also when only
`Differences` exists.
2021-03-31 17:58:44 +03:00
calixteman
b3528868c1
XFA - Add support for few ui elements (#13115)
- input;
  - layout;
  - border;
  - margin;
  - color.
2021-03-31 15:42:21 +02:00
Jonas Jenwald
3df24254e3 [api-minor] Stop exposing the *raw* defaultAppearance-string on Annotation-instances
The reasons for making this change are:
 - This property is not, nor has it ever been, used anywhere in the PDF.js display-layer.
 - Related to the previous point, the format of the `defaultAppearance`-string is such that it'd be difficult to use it as-is in the display-layer anyway.
 - It (usually) contains the "raw" appearance-string, from the PDF document, which is neither parsed nor validated and could thus be bogus.
 - We now expose a `defaultAppearanceData`-property, which is first of all used in the display-layer and secondly contains actually parsed/validated data.
 - In the event that a third-party implementation needs the `defaultAppearance`-string, it could be easily constructed from the recently added `defaultAppearanceData`-property.

All-in-all, I'm thus suggesting that we stop exposing an unused and unnecessary property on all Annotation-instances.
2021-03-31 15:09:18 +02:00
Jonas Jenwald
38acde8375 Use template strings, to reduce unnecessary verbosity in a few warn(...) calls in src/core/annotation.js 2021-03-31 14:40:21 +02:00
calixteman
84d7cccb1d
JS - Handle correctly hierarchy of fields (#13133)
* JS - Handle correctly hierarchy of fields
  - it aims to fix #13132;
  - annotations can inherit their actions from the parent field;
  - there are some fields which act as a container for other fields:
    - they can be access through js so need to add them with an empty type (nothing in the spec about that but checked in Acrobat);
    - calculation order list (CO) can reference them so need make them through this.getField;
    - getArray method must return kids.
  - field values are number, string, ... depending of their type but nothing in the spec on how to know what's the type:
    - according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461
    - it seems that this "type" can be guessed from js action Format (when setting a type in Acrobat DC, the only affected thing is this action).
  - util.scand with an empty string returns the current date.
2021-03-30 08:50:35 -07:00
Calixte Denizet
9296ee6986 Skip extra objects in object stream in using offsets 2021-03-28 13:03:05 +02:00
calixteman
81c602c61c
Set CFF header to 4 when writing it because it contains 4 elements (#13149) 2021-03-26 18:23:18 +01:00
calixteman
63471bcbbe
XFA - Convert some template properties into CSS ones (#13082)
- implement few positioning properties: position, width, height, anchor;
  - implement font element;
  - implement fill element (used by font) and its children (linear, radial, ...);
  - font property is inherited from ancestor container (see https://www.pdfa.org/wp-content/uploads/2020/07/XFA-3_3.pdf#page=43) so let CSS handles that stuff;
  - in order to reduce the number of properties to set, only set non default properties and put the default in CSS;
  - set a background to some containers to be able to see them (will be removed in a future commit).
2021-03-25 13:02:39 +01:00
Tim van der Meij
8269ddbd16
Merge pull request #13105 from Snuffleupagus/BasePdfManager-parseDocBaseUrl
Improve memory usage around the `BasePdfManager.docBaseUrl` parameter (PR 7689 follow-up)
2021-03-19 23:03:20 +01:00
calixteman
24e598a895
XFA - Add a layer to display XFA forms (#13069)
- add an option to enable XFA rendering if any;
  - for now, let the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground);
  - ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one:
    - it makes easier to translate template properties to html ones;
    - it makes faster the creation of the html element in the main thread.
2021-03-19 10:11:40 +01:00
Jonas Jenwald
c4c7216171 Improve memory usage around the BasePdfManager.docBaseUrl parameter (PR 7689 follow-up)
While there is nothing *outright* wrong with the existing implementation, it can however lead to increased memory usage in one particular case (that I completely overlooked when implementing this):
For "data:"-URLs, which by definition contains the entire PDF document and can thus be arbitrarily large, we obviously want to avoid sending, storing, and/or logging the "raw" docBaseUrl in that case.

To address this, this patch makes the following changes:
 - Ignore any non-string in the `docBaseUrl` option passed to `getDocument`, since those are unsupported anyway, already on the main-thread.

 - Ignore "data:"-URLs in the `docBaseUrl` option passed to `getDocument`, to avoid having to send what could potentially be a *very* long string to the worker-thread.

 - Parse the `docBaseUrl` option *directly* in the `BasePdfManager`-constructors, on the worker-thread, to avoid having to store the "raw" docBaseUrl in the first place.
2021-03-17 15:48:24 +01:00
Jonas Jenwald
5099f1977f Support LineAnnotations with empty /Rect-entries (issue 6564)
This extends PR 13033 slightly, with a heuristic to support corrupt PDF documents where the `LineAnnotation`s have an empty /Rect-entry. Please note that while I have no idea if this is "correct", this patch at least makes us output the same /BBox as re-saving in Adobe Reader does.
2021-03-15 16:33:43 +01:00
Tim van der Meij
a2f0573a64
Enable the no-var linting rule for src/core/operator_list.js
This is mostly done using `gulp lint --fix` with a few manual changes in
the following diff:

```diff
diff --git a/src/core/operator_list.js b/src/core/operator_list.js
index 66c26fe05..cbcd12d97 100644
--- a/src/core/operator_list.js
+++ b/src/core/operator_list.js
@@ -40,7 +40,8 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
     // 'count' groups of (save, transform, paintImageMaskXObject, restore)+
     // have been found at iFirstSave.
     const iFirstPIMXO = iFirstSave + 2;
-    for (var i = 0; i < count; i++) {
+    let i;
+    for (i = 0; i < count; i++) {
       const arg = argsArray[iFirstPIMXO + 4 * i];
       const imageMask = arg.length === 1 && arg[0];
       if (
@@ -106,8 +107,8 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
       // assuming that heights of those image is too small (~1 pixel)
       // packing as much as possible by lines
       let maxX = 0;
-      let map = [],
-        maxLineHeight = 0;
+      const map = [];
+      let maxLineHeight = 0;
       let currentX = IMAGE_PADDING,
         currentY = IMAGE_PADDING;
       let q;
@@ -326,9 +327,9 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
           if (fnArray[i] !== OPS.transform) {
             return false;
           }
-          var iFirstTransform = context.iCurr - 2;
-          var firstTransformArg0 = argsArray[iFirstTransform][0];
-          var firstTransformArg3 = argsArray[iFirstTransform][3];
+          const iFirstTransform = context.iCurr - 2;
+          const firstTransformArg0 = argsArray[iFirstTransform][0];
+          const firstTransformArg3 = argsArray[iFirstTransform][3];
           if (
             argsArray[i][0] !== firstTransformArg0 ||
             argsArray[i][1] !== 0 ||
@@ -342,8 +343,8 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
           if (fnArray[i] !== OPS.paintImageXObject) {
             return false;
           }
-          var iFirstPIXO = context.iCurr - 1;
-          var firstPIXOArg0 = argsArray[iFirstPIXO][0];
+          const iFirstPIXO = context.iCurr - 1;
+          const firstPIXOArg0 = argsArray[iFirstPIXO][0];
           if (argsArray[i][0] !== firstPIXOArg0) {
             return false; // images don't match
           }
@@ -423,9 +424,9 @@ const QueueOptimizer = (function QueueOptimizerClosure() {
           if (fnArray[i] !== OPS.showText) {
             return false;
           }
-          var iFirstSetFont = context.iCurr - 3;
-          var firstSetFontArg0 = argsArray[iFirstSetFont][0];
-          var firstSetFontArg1 = argsArray[iFirstSetFont][1];
+          const iFirstSetFont = context.iCurr - 3;
+          const firstSetFontArg0 = argsArray[iFirstSetFont][0];
+          const firstSetFontArg1 = argsArray[iFirstSetFont][1];
           if (
             argsArray[i][0] !== firstSetFontArg0 ||
             argsArray[i][1] !== firstSetFontArg1
```
2021-03-14 11:49:31 +01:00
Tim van der Meij
24ff738e7b
Enable the no-var linting rule for src/core/pattern.js
This is mostly done using `gulp lint --fix` with a few manual changes in
the following diff:

```diff
diff --git a/src/core/pattern.js b/src/core/pattern.js
index 365491ed3..eedd8b686 100644
--- a/src/core/pattern.js
+++ b/src/core/pattern.js
@@ -105,7 +105,7 @@ const Pattern = (function PatternClosure() {
   return Pattern;
 })();

-var Shadings = {};
+const Shadings = {};

 // A small number to offset the first/last color stops so we can insert ones to
 // support extend. Number.MIN_VALUE is too small and breaks the extend.
@@ -597,16 +597,15 @@ Shadings.Mesh = (function MeshClosure() {
       if (!(0 <= f && f <= 3)) {
         throw new FormatError("Unknown type6 flag");
       }
-      var i, ii;
       const pi = coords.length;
-      for (i = 0, ii = f !== 0 ? 8 : 12; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 8 : 12; i < ii; i++) {
         coords.push(reader.readCoordinate());
       }
       const ci = colors.length;
-      for (i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
         colors.push(reader.readComponents());
       }
-      var tmp1, tmp2, tmp3, tmp4;
+      let tmp1, tmp2, tmp3, tmp4;
       switch (f) {
         // prettier-ignore
         case 0:
@@ -729,16 +728,15 @@ Shadings.Mesh = (function MeshClosure() {
       if (!(0 <= f && f <= 3)) {
         throw new FormatError("Unknown type7 flag");
       }
-      var i, ii;
       const pi = coords.length;
-      for (i = 0, ii = f !== 0 ? 12 : 16; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 12 : 16; i < ii; i++) {
         coords.push(reader.readCoordinate());
       }
       const ci = colors.length;
-      for (i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
+      for (let i = 0, ii = f !== 0 ? 2 : 4; i < ii; i++) {
         colors.push(reader.readComponents());
       }
-      var tmp1, tmp2, tmp3, tmp4;
+      let tmp1, tmp2, tmp3, tmp4;
       switch (f) {
         // prettier-ignore
         case 0:
@@ -897,7 +895,7 @@ Shadings.Mesh = (function MeshClosure() {
         decodeType4Shading(this, reader);
         break;
       case ShadingType.LATTICE_FORM_MESH:
-        var verticesPerRow = dict.get("VerticesPerRow") | 0;
+        const verticesPerRow = dict.get("VerticesPerRow") | 0;
         if (verticesPerRow < 2) {
           throw new FormatError("Invalid VerticesPerRow");
         }
```
2021-03-14 11:43:05 +01:00
Jonas Jenwald
5b5061afa8 Enable the ESLint no-var rule globally
A significant portion of the code-base has now been converted to use `let`/`const`, rather than `var`, hence it should be possible to simply enable the ESLint `no-var` rule globally.
This way we can ensure that new code won't accidentally use `var`, and it also removes the need to manually enable the rule in various folders.

Obviously it makes sense to continue the efforts to replace `var`, but that should probably happen on a file and/or folder basis.

Please note that this patch excludes the following code:
 - The `extensions/` folder, since that seemed easiest for now (and I don't know exactly what the support situation is for the Chromium-extension).

 - The entire `external/` folder is ignored, since most of it's currently excluded from linting.
   For the code that isn't imported from elsewhere (and should be ignored), we should probably (at some point) bring the code up to the same linting/formatting standard as the rest of the code-base.

 - Various files in the `test/` folder are ignored, as necessary, since the way that a lot of this code is loaded will require some care (or perhaps larger re-factoring) when removing `var` usage.
2021-03-13 16:12:53 +01:00
Jonas Jenwald
ab91f42a5e Convert code in src/core/type1_parser.js to use "standard" classes
All of this code predates the existence of native JS classes, however we can now clean this up a little bit.
2021-03-12 12:16:50 +01:00
Jonas Jenwald
8fc8dc020e Enable the ESLint no-var rule in the src/core/type1_parser.js file
Note that the majority of these changes were done automatically, by using `gulp lint --fix`, and the manual changes were limited to the following diff:

```diff
diff --git a/src/core/type1_parser.js b/src/core/type1_parser.js
index 192781de1..05c5fe2e5 100644
--- a/src/core/type1_parser.js
+++ b/src/core/type1_parser.js
@@ -251,7 +251,7 @@ const Type1CharString = (function Type1CharStringClosure() {
               // vhea tables reconstruction -- ignoring it.
               this.stack.pop(); // wy
               wx = this.stack.pop();
-              var sby = this.stack.pop();
+              const sby = this.stack.pop();
               sbx = this.stack.pop();
               this.lsb = sbx;
               this.width = wx;
@@ -263,8 +263,8 @@ const Type1CharString = (function Type1CharStringClosure() {
                 error = true;
                 break;
               }
-              var num2 = this.stack.pop();
-              var num1 = this.stack.pop();
+              const num2 = this.stack.pop();
+              const num1 = this.stack.pop();
               this.stack.push(num1 / num2);
               break;
             case (12 << 8) + 16: // callothersubr
@@ -273,7 +273,7 @@ const Type1CharString = (function Type1CharStringClosure() {
                 break;
               }
               subrNumber = this.stack.pop();
-              var numArgs = this.stack.pop();
+              const numArgs = this.stack.pop();
               if (subrNumber === 0 && numArgs === 3) {
                 const flexArgs = this.stack.splice(this.stack.length - 17, 17);
                 this.stack.push(
@@ -397,9 +397,9 @@ const Type1Parser = (function Type1ParserClosure() {
     if (discardNumber >= data.length) {
       return new Uint8Array(0);
     }
+    const c1 = 52845,
+      c2 = 22719;
     let r = key | 0,
-      c1 = 52845,
-      c2 = 22719,
       i,
       j;
     for (i = 0; i < discardNumber; i++) {
@@ -416,9 +416,9 @@ const Type1Parser = (function Type1ParserClosure() {
   }

   function decryptAscii(data, key, discardNumber) {
-    let r = key | 0,
-      c1 = 52845,
+    const c1 = 52845,
       c2 = 22719;
+    let r = key | 0;
     const count = data.length,
       maybeLength = count >>> 1;
     const decrypted = new Uint8Array(maybeLength);
@@ -429,7 +429,7 @@ const Type1Parser = (function Type1ParserClosure() {
         continue;
       }
       i++;
-      var digit2;
+      let digit2;
       while (i < count && !isHexDigit((digit2 = data[i]))) {
         i++;
       }
@@ -599,7 +599,7 @@ const Type1Parser = (function Type1ParserClosure() {
               if (token !== "/") {
                 continue;
               }
-              var glyph = this.getToken();
+              const glyph = this.getToken();
               length = this.readInt();
               this.getToken(); // read in 'RD' or '-|'
               data = length > 0 ? stream.getBytes(length) : new Uint8Array(0);
@@ -638,7 +638,7 @@ const Type1Parser = (function Type1ParserClosure() {
           case "OtherBlues":
           case "FamilyBlues":
           case "FamilyOtherBlues":
-            var blueArray = this.readNumberArray();
+            const blueArray = this.readNumberArray();
             // *Blue* values may contain invalid data: disables reading of
             // those values when hinting is disabled.
             if (
@@ -672,7 +672,7 @@ const Type1Parser = (function Type1ParserClosure() {
       }

       for (let i = 0; i < charstrings.length; i++) {
-        glyph = charstrings[i].glyph;
+        const glyph = charstrings[i].glyph;
         encoded = charstrings[i].encoded;
         const charString = new Type1CharString();
         const error = charString.convert(
@@ -728,12 +728,12 @@ const Type1Parser = (function Type1ParserClosure() {
         token = this.getToken();
         switch (token) {
           case "FontMatrix":
-            var matrix = this.readNumberArray();
+            const matrix = this.readNumberArray();
             properties.fontMatrix = matrix;
             break;
           case "Encoding":
-            var encodingArg = this.getToken();
-            var encoding;
+            const encodingArg = this.getToken();
+            let encoding;
             if (!/^\d+$/.test(encodingArg)) {
               // encoding name is specified
               encoding = getEncoding(encodingArg);
@@ -764,7 +764,7 @@ const Type1Parser = (function Type1ParserClosure() {
             properties.builtInEncoding = encoding;
             break;
           case "FontBBox":
-            var fontBBox = this.readNumberArray();
+            const fontBBox = this.readNumberArray();
             // adjusting ascent/descent
             properties.ascent = Math.max(fontBBox[3], fontBBox[1]);
             properties.descent = Math.min(fontBBox[1], fontBBox[3]);
```
2021-03-12 12:05:48 +01:00
Jonas Jenwald
82062f7e0d Enable the ESLint no-var rule in the src/core/cff_parser.js file
Note that the majority of these changes were done automatically, by using `gulp lint --fix`, and the manual changes were limited to the following diff:

```diff
diff --git a/src/core/cff_parser.js b/src/core/cff_parser.js
index d684c200e..2e2b811e4 100644
--- a/src/core/cff_parser.js
+++ b/src/core/cff_parser.js
@@ -555,7 +555,7 @@ const CFFParser = (function CFFParserClosure() {
           stackSize %= 2;
           validationCommand = CharstringValidationData[value];
         } else if (value === 10 || value === 29) {
-          var subrsIndex;
+          let subrsIndex;
           if (value === 10) {
             subrsIndex = localSubrIndex;
           } else {
@@ -886,15 +886,15 @@ const CFFParser = (function CFFParserClosure() {
         format = bytes[pos++];
         switch (format & 0x7f) {
           case 0:
-            var glyphsCount = bytes[pos++];
+            const glyphsCount = bytes[pos++];
             for (i = 1; i <= glyphsCount; i++) {
               encoding[bytes[pos++]] = i;
             }
             break;

           case 1:
-            var rangesCount = bytes[pos++];
-            var gid = 1;
+            const rangesCount = bytes[pos++];
+            let gid = 1;
             for (i = 0; i < rangesCount; i++) {
               const start = bytes[pos++];
               const left = bytes[pos++];
@@ -938,7 +938,7 @@ const CFFParser = (function CFFParserClosure() {
           }
           break;
         case 3:
-          var rangesCount = (bytes[pos++] << 8) | bytes[pos++];
+          const rangesCount = (bytes[pos++] << 8) | bytes[pos++];
           for (i = 0; i < rangesCount; ++i) {
             let first = (bytes[pos++] << 8) | bytes[pos++];
             if (i === 0 && first !== 0) {
@@ -1173,7 +1173,7 @@ class CFFDict {
   }
 }

-var CFFTopDict = (function CFFTopDictClosure() {
+const CFFTopDict = (function CFFTopDictClosure() {
   const layout = [
     [[12, 30], "ROS", ["sid", "sid", "num"], null],
     [[12, 20], "SyntheticBase", "num", null],
@@ -1229,7 +1229,7 @@ var CFFTopDict = (function CFFTopDictClosure() {
   return CFFTopDict;
 })();

-var CFFPrivateDict = (function CFFPrivateDictClosure() {
+const CFFPrivateDict = (function CFFPrivateDictClosure() {
   const layout = [
     [6, "BlueValues", "delta", null],
     [7, "OtherBlues", "delta", null],
@@ -1265,11 +1265,12 @@ var CFFPrivateDict = (function CFFPrivateDictClosure() {
   return CFFPrivateDict;
 })();

-var CFFCharsetPredefinedTypes = {
+const CFFCharsetPredefinedTypes = {
   ISO_ADOBE: 0,
   EXPERT: 1,
   EXPERT_SUBSET: 2,
 };
+
 class CFFCharset {
   constructor(predefined, format, charset, raw) {
     this.predefined = predefined;
@@ -1695,7 +1696,7 @@ class CFFCompiler {
             // For offsets we just insert a 32bit integer so we don't have to
             // deal with figuring out the length of the offset when it gets
             // replaced later on by the compiler.
-            var name = dict.keyToNameMap[key];
+            const name = dict.keyToNameMap[key];
             // Some offsets have the offset and the length, so just record the
             // position of the first one.
             if (!offsetTracker.isTracking(name)) {
```
2021-03-12 12:00:38 +01:00
Jonas Jenwald
f3948aeb90 Enable the ESLint no-var rule in the src/core/font_renderer.js file
Note that the majority of these changes were done automatically, by using `gulp lint --fix`, and the manual changes were limited to the following diff:

```diff
diff --git a/src/core/font_renderer.js b/src/core/font_renderer.js
index e1538c481..00f5424cd 100644
--- a/src/core/font_renderer.js
+++ b/src/core/font_renderer.js
@@ -152,9 +152,9 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
   }

   function lookupCmap(ranges, unicode) {
-    let code = unicode.codePointAt(0),
-      gid = 0;
-    let l = 0,
+    const code = unicode.codePointAt(0);
+    let gid = 0,
+      l = 0,
       r = ranges.length - 1;
     while (l < r) {
       const c = (l + r + 1) >> 1;
@@ -199,7 +199,7 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
         flags = (code[i] << 8) | code[i + 1];
         const glyphIndex = (code[i + 2] << 8) | code[i + 3];
         i += 4;
-        var arg1, arg2;
+        let arg1, arg2;
         if (flags & 0x01) {
           arg1 = ((code[i] << 24) | (code[i + 1] << 16)) >> 16;
           arg2 = ((code[i + 2] << 24) | (code[i + 3] << 16)) >> 16;
@@ -366,7 +366,7 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
       while (i < code.length) {
         let stackClean = false;
         let v = code[i++];
-        var xa, xb, ya, yb, y1, y2, y3, n, subrCode;
+        let xa, xb, ya, yb, y1, y2, y3, n, subrCode;
         switch (v) {
           case 1: // hstem
             stems += stack.length >> 1;
@@ -494,7 +494,7 @@ const FontRendererFactory = (function FontRendererFactoryClosure() {
                 bezierCurveTo(xa, y2, xb, y3, x, y);
                 break;
               case 37: // flex1
-                var x0 = x,
+                const x0 = x,
                   y0 = y;
                 xa = x + stack.shift();
                 ya = y + stack.shift();
```
2021-03-12 11:57:27 +01:00
Calixte Denizet
3243672727 XFA - Create Form DOM in merging template and data trees
- Spec: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=171;
  - support for the 2 ways of merging: consumeData and matchTemplate;
  - create additional nodes in template DOM when occur node allows it;
  - support for global values in data DOM.
2021-03-08 14:10:30 +01:00
Tim van der Meij
5828ff6cb0
Implement rendering line annotations without appearance stream 2021-02-28 18:57:58 +01:00
Tim van der Meij
d6e0b2d92e
Merge pull request #13032 from Snuffleupagus/parseDestDictionary-actionName-warn
Don't warn about actions that require scripting support in `Catalog.parseDestDictionary`
2021-02-28 14:52:06 +01:00
Jonas Jenwald
39cf4a0008 Don't warn about actions that require scripting support in Catalog.parseDestDictionary
Now that we have scripting support, warning about e.g. JavaScript actions doesn't seem necessary anymore. Especially considering that scripting-related actions are/will not be parsed by the `Catalog.parseDestDictionary` method anyway, since it's intended for handling "simple" actions.
2021-02-28 13:13:17 +01:00
Tim van der Meij
fa6cebf045
Implement rendering square/circle annotations without appearance stream 2021-02-27 19:05:12 +01:00
Jonas Jenwald
05de20071a Modernize some of the code in src/core/cmap.js by using classes and async/await
This converts a couple of our old "classes" to proper ECMAScript classes, and replaces a lot of manual Promise-wrapping with async/await instead.
2021-02-27 14:20:43 +01:00
Tim van der Meij
4e96d59fca
Use a buffer instead of string concatenation in reverseIfRtl in src/core/unicode.js
This avoids creating intermediate strings and should be slightly more
efficient.
2021-02-27 13:20:09 +01:00
Tim van der Meij
24f80f1e38
Enable the no-var linting rule in src/core/primitives.js 2021-02-27 12:51:01 +01:00
Tim van der Meij
ed33727419
Enable the no-var linting rule in src/core/glyphlist.js 2021-02-27 12:46:57 +01:00
Tim van der Meij
e051d4d029
Enable the no-var linting rule in src/core/ccitt_stream.js 2021-02-27 12:44:55 +01:00
Tim van der Meij
0897dddbbe
Enable the no-var linting rule in src/core/unicode.js 2021-02-27 12:44:50 +01:00
Tim van der Meij
cb82dda755
Enable the no-var linting rule in src/core/metrics.js 2021-02-27 12:44:45 +01:00
Tim van der Meij
55786a4880
Merge pull request #13026 from Snuffleupagus/crypto-classes
Convert code in `src/core/crypto.js` to use "normal" classes
2021-02-26 22:39:30 +01:00
Jonas Jenwald
6b4c4f80e4 Convert code in src/core/crypto.js to use "normal" classes
All of this code predates the existence of native JS classes, however we can now clean this up a bit. This patch thus let us remove some variable "shadowing" from the code.
2021-02-26 15:51:45 +01:00
Jonas Jenwald
b884757873 Inline the concatArrays function in calculatePDF20Hash
This helper function is first of all only called *twice*, and secondly it also leads to unnecessary intermediate allocations given how the `TypedArray`s are handled.
Hence we can simply inline this small function, and thus directly allocate the combined `TypedArray` instead.
2021-02-26 15:51:39 +01:00
Jonas Jenwald
9a9a5b2365 Replace the compareByteArrays functions, in src/core/crypto.js, with the isArrayEqual helper function
The `compareByteArrays` is first of all duplicated in multiple closures in the `src/core/crypto.js` file. Secondly, despite its name, it's also functionally equivalent to the now existing `isArrayEqual` helper function.

The `isArrayEqual` helper function is changed to use a standard `for`-loop, rather than `Array.prototype.every`, since that ought to be slightly more efficient given that we're now using it with (potentially) larger data.
2021-02-26 15:51:32 +01:00
Jonas Jenwald
e69e8622a9 Convert code in src/core/function.js to use "normal" classes
All of this code predates the existence of native JS classes, however we can now clean this up a bit. This patch thus let us remove some variable "shadowing" from the code.
2021-02-26 13:20:59 +01:00
calixteman
45329af926
XFA -- Add support for SOM expressions (#12983)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=87;
 - add a parser for SOM expressions;
 - add search functions to resolve those expressions;
 - search functions will be used to bind data into template.
2021-02-24 10:13:02 +01:00
Jonas Jenwald
e9038cc3d1 Send the AnnotationStorage-data to the worker-thread as a Map
Rather than converting the `AnnotationStorage`-data to an Object, before sending it to the worker-thread, we should be able to simply send the internal `Map` directly.
The "structured clone algorithm" doesn't have a problem with `Map`s, however the `LoopbackPort` used when workers are *disabled* (e.g. in Node.js environments) didn't use to support them. With PR 12997 having lifted that restriction, we should now be able to simply send the `AnnotationStorage`-data as-is rather than having to iterate through it to first create an Object.

*Please note:* The changes in `src/core/annotation.js` could have been a lot more compact if we were able to use optional chaining in the `src/core` folder. Unfortunately that's still not possible, since SystemJS is being used in the development viewer (i.g. `gulp server`) and fixing that is *still* blocked by [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687).
2021-02-18 17:13:43 +01:00
calixteman
0fa9976268
XFA - Add support for prototypes (#12979)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=225&zoom=auto,-207,784
 - add a clone method on nodes in order to be able to clone a proto;
 - support ids in template namespace;
 - prevent from cycle when applying protos.
2021-02-18 10:32:25 +01:00
Tim van der Meij
4619b1b568
Merge pull request #12997 from Snuffleupagus/metadata-worker
Move the Metadata parsing to the worker-thread
2021-02-17 20:57:46 +01:00
calixteman
b5be515375
XFA - Add a lexer/parser for FormCalc language (#12936)
- the language specifications are: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1049
 - it can be used to:
   * as a scripting language for calculation, validations, ...
   * in SOM expressions to select nodes: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=101
2021-02-17 20:28:06 +01:00
Jonas Jenwald
d366bbdf51 Move the encodeToXmlString helper function to src/core/core_utils.js
With the previous patch this function is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
b66f294f64 Move the XML-parser to the src/core/-folder
With the previous patch this functionality is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
cc3a6563ee Move the Metadata parsing to the worker-thread
The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers.
Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons).

Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the *built* `pdf.js`/`pdf.worker.js` files.

Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation.

---
[1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.
2021-02-17 13:12:01 +01:00
Calixte Denizet
0fc8267576 Avoid infinite loop when getting annotation field name
- aims to fix issue #12963;
 - use a Set to track already visited objects;
 - remove the loop limit in getInheritableProperty and use a RefSet too.
2021-02-14 19:58:19 +01:00
Jonas Jenwald
1ee747a620 Remove unneeded instanceof MissingDataException checks
The following checks are all unneeded, and could easily cause confusion when reading the code. (All of them are my fault as well, since I've sometimes added those checks without really thinking about the surrounding code.)

 - In `PartialEvaluator.hasBlendModes` there cannot be any `MissingDataException`s thrown, given that the `Page.getOperatorList` method waits for all the necessary /Resources to load first. Furthermore, note also that if an error is thrown from `PartialEvaluator.hasBlendModes` then it'd completely break rendering of that page, since any errors thrown from `Page.getOperatorList` are simply sent to the main-thread.

 - In `PartialEvaluator.handleColorN` there cannot be any `MissingDataException`s thrown, given that again the `Page.getOperatorList` method waits for all the necessary /Resources to load before operatorList parsing starts.

 - In `XRef.readXRef` there cannot be any `MissingDataException`s thrown, given that we're *explicitly* requesting (and waiting for) the entire document in `pdfManagerReady` (in `src/core/worker.js`) before re-parsing of a corrupt document starts.
2021-02-13 12:26:05 +01:00
Calixte Denizet
ea06bb0e36 [api-minor] Annotation -- Don't compute appearance when nothing has changed
* don't set a value in annotationStorage by default:
   - having an undefined when the annotation is rendered for saving/printing means nothing has changed so use normal appearance
   - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1681687
 * change the way to compute font size when this one is null in DA:
   - make fontSize proportional to line height
   - in multiline case, take into account the number of lines for text entered to adapt the font size
2021-02-12 19:27:21 +01:00
calixteman
0479deef4e
XFA -- Add other objects (#12949)
- connectionSet: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=969
 - datasets: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1038
  - signature: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1040
  - stylesheet: the same
  - xhtml: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1187
2021-02-11 12:30:37 +01:00
calixteman
3787bd41ef
XFA -- Add localset object (#12948)
- Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=943
2021-02-10 18:04:43 +01:00
Jonas Jenwald
0068dba009 [api-minor] Rename -es5 to -legacy, to reduce confusion over what's actually supported (issue 12976)
*Please note that this will also require some edits of the Wiki.*
2021-02-10 16:01:59 +01:00
Jonas Jenwald
31098c404d
Use Math.hypot, instead of Math.sqrt with manual squaring (#12973)
When the PDF.js project started `Math.hypot` didn't exist yet, and until recently we still supported browsers (IE 11) without a native `Math.hypot` implementation; please see this compatibility information: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/hypot#browser_compatibility

Furthermore, somewhat recently there were performance improvements of `Math.hypot` in Firefox; see https://bugzilla.mozilla.org/show_bug.cgi?id=1648820

Finally, this patch also replaces a couple of multiplications with the exponentiation operator.
2021-02-10 12:28:49 +01:00
Jonas Jenwald
e6fe8a7d53 Handle errors gracefully, in PartialEvaluator.translateFont, when fetching the font file (issue 9462)
The *third* page of the referenced PDF document currently fails to render completely, since one of its font files fail to load.
Since that error isn't handled, a large part of the text is thus missing which looks quite bad. By "replacing" the font data with an *empty* stream, we'll thus be able to fallback to rendering the text with a standard font (instead of using `ErrorFont`). While there's obviously no guarantee that things will look perfect, actually rendering the text at all should be an improvement in general.

Also, print a warning in `PartialEvaluator.loadFont` when the `PartialEvaluator.translateFont` method rejects, since that'd have helped debug/fix the issue faster.
2021-02-06 19:44:53 +01:00
Jonas Jenwald
d3e65f24e3 Request all data, rather than throwing, when encountering general errors in ObjectLoader._walk (issue 9462, PR 3289 follow-up)
*As far as I can tell, this has been broken ever since PR 3289 (back in 2013) without anyone noticing.*

For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately which thus has the potential to *completely* break rendering of an entire page.
In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse these errors are *intermittent*, since they can only occur if the document is still loading when the `ObjectLoader`-code runs (note the early return in `ObjectLoader.load`).

Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then.
All-in-all, the only way to make absolutely sure that we won't cause *unexpected* `MissingDataException`s somewhere else in the code-base is to fallback to fetching the *entire* document in this edge-case.
2021-02-06 14:33:50 +01:00
Brendan Dahl
a392082e30
Merge pull request #12944 from calixteman/xfa_config
XFA -- Update config object
2021-02-05 15:06:09 -08:00
Calixte Denizet
9d47e69771 XFA -- Update config object 2021-02-05 19:22:51 +01:00
Calixte Denizet
652ff57897 XFA -- Add template object
- Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=596
2021-02-03 21:05:10 +01:00
Calixte Denizet
7e0554afe2 XFA -- Add attributes and children in XFAObject
- in order to evaluate SOM expressions nodes and their attributes must be checked in the same order as in the xml;
 - add an object XFAObjectArray with a parameter max to handle multiple children with the same name.
2021-02-03 18:56:00 +01:00
Calixte Denizet
0ff5cd7eb5 XFA - Add a parser for XFA files
- the parser is base on a class extending XMLParserBase
 - it handle xml namespaces:
   * each namespace is assocated with a builder
   * builder builds nodes belonging to the namespace
   * when a node is inserted in the parent namespace compatibility is checked (if required)
 - to avoid name collision between xml names and object properties, use Symbol.
2021-02-01 13:45:31 +01:00
Tim van der Meij
e4e92d10e8
Merge pull request #12922 from Snuffleupagus/getTextContent-globalImageCache
Ignore globally cached images in `PartialEvaluator.getTextContent` (PR 11930 follow-up)
2021-01-28 23:44:10 +01:00
Tim van der Meij
8805614a03
Merge pull request #12924 from brendandahl/fix-clone
Fix font data clone error when pdfBug is enabled.
2021-01-28 23:42:12 +01:00
Jonas Jenwald
72da2aa166 Ignore globally cached images in PartialEvaluator.getTextContent (PR 11930 follow-up)
Given that we'll only cache `/XObject`s of the `Image`-type globally, we can utilize that in `PartialEvaluator.getTextContent` as well. This way, in cases such as e.g. issue 12098, we can avoid having to fetch/parse `/XObject`s that we already know to be `Image`s. This is helpful, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.

Also, skip a redundant `RefSetCache.has` check in the `GlobalImageCache.getData` method.
2021-01-28 10:19:26 +01:00
Brendan Dahl
52fb5abb0b Fix font data clone error when pdfBug is enabled.
The widths property should be an object to match what metrics returns.

In ZapfDingbats.pdf I was getting a data clone error with pdfBug enabled.
In buildCharCodeToWidth() there was an encoding with the name "at" which
is also the name of a method on an array. buildCharCodeToWidth assumes an
object is passed in, so when it checked for the "at" property, it found the
method and copied it over.

This only seemed to affect Firefox.
2021-01-27 14:38:43 -08:00
Jonas Jenwald
1ab6d2c604 Improve global image caching for small images (PR 11912 follow-up, issue 12098)
When implementing the `GlobalImageCache` functionality I was mostly worried about the effect of *very large* images, hence the maximum number of cached images were purposely kept quite low[1].
However, there's one fairly obvious problem with that approach: In documents with hundreds, or even thousands, of *small* images the `GlobalImageCache` as implemented becomes essentially pointless.

Hence this patch, where the `GlobalImageCache`-implementation is changed in the following ways:
 - We're still guaranteed to be able to cache a *minimum* number of images, set to `10` (similar as before).
 - If the *total* size of all the cached image data is below a threshold[2], we're allowed to cache additional images.

This patch thus *improve*, but doesn't completely fix, issue 12098. Note that that document is created by a *very poor* PDF generator, since every single page contains the *entire* document (with all of its /Resources) and to create the individual pages clipping is used.[3]

---
[1] Currently set to `10` images; imagine what would happen to overall memory usage if we encountered e.g. 50 images each 10 MB in size.

[2] This value was chosen, somewhat randomly, to be `40` megabytes; basically five times the [maximum individual image size per page](6249ef517d/src/display/api.js (L2483-L2484)).

[3] This surely has to be some kind of record w.r.t. how badly PDF generators can mess things up...
2021-01-26 12:00:12 +01:00
calixteman
a3f6882b06
JS -- add support for choice widget (#12826) 2021-01-25 23:40:57 +01:00
Tim van der Meij
25b84ce84c
Merge pull request #12828 from dhufnagel/feature/annotation_layer_display_fontsize
[api-minor] Set font size and color for text widget annotations
2021-01-23 16:08:07 +01:00
Jonas Jenwald
6bcb4e3ad9 Ensure that parseDefaultAppearance won't attempt to access a not yet defined variable (PR 12831 follow-up)
Note how, in the `if (this.stateManager.stateStack.length !== 0) {` branch, we're attempting to access the not yet defined variable[1] `args`. If this code-path is ever hit, an Error will be thrown and parsing will thus be aborted immediately (likely leading to e.g. rendering bugs).

Note that I found this purely by accident, since I happened to glance at the LGTM report. However, I've since found that the error is also present during the unit-test[2] and with this patch we're actually testing the *intended* thing here.

As part of fixing this, and to avoid re-introducing a similar bug in the future, we'll now instead always reset `args.length` *before* attempting to read the next operator.
Also, we can use the existing `EvaluatorPreprocessor.savedStatesDepth` getter to simplify the save/restore detection a tiny bit.

---
[1] The ESLint rule `no-use-before-define` would have helped catch this problem, but unfortunately we cannot enable that without quite a bit of refactoring all over the code-base.

[2] The unit-test was updated such that it would fail in the `master`-branch.
2021-01-23 15:33:28 +01:00
Dominik Hufnagel
c5083cda02 set font size and color on annotation layer
use the default appearance to set the font size and color of a text
annotation widget
2021-01-22 23:12:14 +01:00
Tim van der Meij
6ffb6b1c0c
Merge pull request #12885 from Snuffleupagus/worker-tweak-caching
Simplify the `PDFFunctionFactory._localFunctionCache` initialization (PR 12034 follow-up); Fix the `gStateObj` lookup in `TranslatedFont._removeType3ColorOperators` (PR 12718 follow-up)
2021-01-22 20:24:33 +01:00
Jonas Jenwald
ca1f58ea42 Use _defaultAppearanceData directly in WidgetAnnotation._getSaveFieldResources (PR 12831 follow-up)
With the changes in PR 12831, it's no longer necessary to keep track of the `fontName`-string separately since it's available through the `_defaultAppearanceData`-property as well.
2021-01-22 13:23:04 +01:00
Jonas Jenwald
8137c0547d Fix the gStateObj lookup in TranslatedFont._removeType3ColorOperators (PR 12718 follow-up)
As can be seen in 2cba290361/src/core/evaluator.js (L986) the `gStateObj` (which is actually an Array despite its name), is wrapped in Array when it's inserted into the OperatorList. Hence we obviously need to take this into account when accessing it in `TranslatedFont._removeType3ColorOperators`; this mistake happened because we don't have any test-cases for this particular code-path as far as I know.
2021-01-22 12:27:38 +01:00
Jonas Jenwald
cfaf23dee2 Simplify the PDFFunctionFactory._localFunctionCache initialization (PR 12034 follow-up)
By changing this a `shadow`ed getter, we can simply access it directly and not worry about it being initialized. I have no idea why I didn't just implement it this way in the first place.
2021-01-22 12:25:05 +01:00
Brendan Dahl
2cba290361
Merge pull request #12836 from calixteman/update_buttons
JS -- update radio/checkbox values even if there are no actions
2021-01-21 14:00:26 -08:00
calixteman
1039698697
Add a parser to get font data from the default appearance (#12831)
* Add a parser to get font data from the default appearance
 - pdfium & poppler use a special parser too to get these info.

* Update src/core/default_appearance.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-01-21 20:15:31 +01:00
Jonas Jenwald
b4eb55250e Remove redundant compatibility checks, for modern generic builds, in src/core/worker.js
With the recent additions of optional chaining and nullish coalescing to the PDF.js code-base, a couple of the checks in `src/core/worker.js` are now redundant; please see this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Nullish_coalescing_operator#browser_compatibility

In practice, for the non-translated/non-polyfilled PDF.js builds, browsers without support for optional chaining and nullish coalescing will simply throw immediately upon loading of the code.
Hence both the `globalThis` and `Promise.allSettled` checks are now unnecessary, given this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled#browser_compatibility

*Please note:* The `ReadableStream` check is however still necessary, since Node.js doesn't support that.
2021-01-20 13:09:56 +01:00
Jonas Jenwald
2600e59acb Always re-measure non-embedded ArialNarrow fonts (bug 1671312, PR 12725 follow-up)
While PR 12725 fixed bug 1671312 as reported, i.e. the "In the upper right corner "Purposes' has bad kerning."-part, it however broke other parts of the text rendering.
Note in particular the tables, e.g. on page 2 and beyond, where the glyphs are now rendered too close together. The reason for this is that the fonts in question are non-embedded ArialNarrow, which we just replace with Helvetica which obviously is not narrow. Given that the font replacement isn't a perfect fit for non-embedded ArialNarrow, we still need to re-measure the glyph widths in this case.
2021-01-14 15:51:48 +01:00
calixteman
1de1ae0be6
Merge pull request #12838 from calixteman/authors
[api-minor] Change the "dc:creator" Metadata field to an Array
2021-01-12 02:44:58 -08:00
Calixte Denizet
43d5512f5c [api-minor] Change the "dc:creator" Metadata field to an Array
- add scripting support for doc.info.authors
 - doc.info.metadata is the raw string with xml code
2021-01-11 21:34:07 +01:00
Tim van der Meij
f85b8721d1
Merge pull request #12842 from Snuffleupagus/issue-12841
Improve handling of JPEG images without an EOI marker (issue 12841)
2021-01-10 13:21:28 +01:00
Jonas Jenwald
81525fd446 Use ESLint to ensure that exports are sorted alphabetically
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports

The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
2021-01-09 20:37:51 +01:00
Jonas Jenwald
cd9422a075 Improve handling of JPEG images without an EOI marker (issue 12841)
Given that the PDF document in the issue contains the same very large JPEG image *three* times, this patch includes a test-case where only the first page has been extracted from it.
2021-01-09 20:19:39 +01:00
Calixte Denizet
7172f0a928 JS -- update radio/checkbox values even if there are no actions 2021-01-08 16:43:16 +01:00
Calixte Denizet
83119b9000 In a text widget, Font resources can be in the appearance 2021-01-08 10:13:47 +01:00
Tim van der Meij
048081fb69
Merge pull request #12824 from Snuffleupagus/preEvaluateFont-errors
Improve the handling of errors, in `PartialEvaluator.loadFont`, occuring in `PartialEvaluator.preEvaluateFont` (issue 12823)
2021-01-07 23:15:41 +01:00
Tim van der Meij
5bde4b71f8
Merge pull request #12292 from calixteman/encoding
Fix encoding issues when printing/saving a form with non-ascii characters
2021-01-07 22:56:42 +01:00
Jonas Jenwald
78c32c2697 Improve the handling of errors, in PartialEvaluator.loadFont, occuring in PartialEvaluator.preEvaluateFont (issue 12823)
Currently any errors thrown in `preEvaluateFont`, which is a *synchronous* method, will not be handled at all in the `loadFont` method and we were thus failing to return an `ErrorFont`-instance as intended here.

Also, add an *explicit* check in `PartialEvaluator.preEvaluateFont` to ensure that Type0-fonts always have a *valid* dictionary.
2021-01-07 11:38:38 +01:00
Calixte Denizet
56424967f2 Fix encoding issues when printing/saving a form with non-ascii characters 2021-01-05 17:23:18 +01:00
Tim van der Meij
ca18af6af3
Merge pull request #12774 from calixteman/doc_action_test
JS -- Add tests for print/save actions
2021-01-03 18:46:37 +01:00
Tim van der Meij
50303fc8f4
Merge pull request #12766 from Snuffleupagus/issue-11004
Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
2020-12-28 20:26:10 +01:00
Calixte Denizet
ffd4bc790c JS -- Add tests for print/save actions
* change PDFDocument::hasJSActions to return true when there are JS actions in catalog.
2020-12-24 18:51:00 +01:00
Calixte Denizet
7c3facb174 JS -- Add support for buttons
* radio buttons
 * checkboxes
2020-12-22 16:41:51 +01:00
Jonas Jenwald
cffb7af3b0 Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
Similar to other markers that we currently skip, by ignoring unsupported Coding style default (COD) options we'll at least render *something* here (although some JPEG 2000 images may look slightly wrong).
Note that if the unsupported COD options lead to additional errors, during parsing, we'll still abort parsing of the JPEG 2000 image.
2020-12-21 20:35:52 +01:00
Brendan Dahl
3ea1c43b15
Merge pull request #12751 from calixteman/da_not_a_string
Add a default DA for textfield to avoid issues when printing or saving
2020-12-21 09:44:08 -08:00
Calixte Denizet
a7c682c600 Add a default DA for textfield to avoid issues when printing or saving
* it aims to fix issue #12750
2020-12-19 23:38:45 +01:00
calixteman
e6e2809825
Merge pull request #12702 from calixteman/doc_actions
JS - Collect and execute actions at doc level
2020-12-18 21:33:32 +01:00
Calixte Denizet
1e2173f038 JS - Collect and execute actions at doc and pages level
* the goal is to execute actions like Open or OpenAction
 * can be tested with issue6106.pdf (auto-print)
 * once #12701 is merged, we can add page actions
2020-12-18 20:03:59 +01:00
Jonas Jenwald
48a76aea2b Ignore, rather than throwing on, Coding style component (COC) markers in JPEG 2000 images (issue 12752)
Similar to other markers that we currently skip, by ignoring the Coding style component (COC) marker we'll at least prevent outright errors (although some JPEG 2000 images may look slightly wrong).
2020-12-18 18:18:32 +01:00
Calixte Denizet
03814bd6a2 Don't use 'in' operator to check if key is in a Map 2020-12-16 16:00:12 +01:00
Tim van der Meij
d1848f5022
Merge pull request #12725 from brendandahl/remeasure-std
Use widths defined by font for standard fonts.
2020-12-11 20:36:19 +01:00
Jonas Jenwald
67e5db75d8 Ignore color-operators in Type3 glyphs beginning with a d1 operator (issue 12705)
Please refer to the PDF specification at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1977497 and https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3998470

This patch removes the color-operators in the evaluator, since that should be more efficient than doing it repeatedly in the main-thread when rendering the Type3 glyphs.
2020-12-11 15:49:13 +01:00
Brendan Dahl
45d9ab6e45 Use widths defined by font for standard fonts.
There doesn't seem to be anything definitive about this in
the spec, but from experimenting, it seems acrobat lets
PDFs override the widths of the standard fonts.
2020-12-10 15:30:39 -08:00
Tim van der Meij
00b4f86db3
Merge pull request #12717 from Snuffleupagus/issue-12714
Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
2020-12-10 23:06:59 +01:00
Calixte Denizet
25bf504ff5 Be sure that CalculationOrder is either null or a non-empty array 2020-12-10 16:02:11 +01:00
Jonas Jenwald
796a0d3155 Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
In the referenced PDF document, the second and third page has *corrupt* /Annots-entries which contain /Dict-data rather than the intended Arrays.
2020-12-10 11:42:00 +01:00
Tim van der Meij
012e15f7a3
Fix non-standard quadpoints orders for annotations
This change requires us to use valid quadpoints arrays in the existing
unit tests too due to the normalization.
2020-12-06 16:02:41 +01:00
Jonas Jenwald
c42029489e Run gulp lint --fix, to account for changes in Prettier version 2.2.1
Please refer to https://github.com/prettier/prettier/blob/master/CHANGELOG.md#221 for additional details.
2020-11-29 10:01:46 +01:00
Tim van der Meij
256068556d
Merge pull request #12662 from Snuffleupagus/issue-12402
Check the top-level /Pages dictionary when finding the trailer in `XRef.indexObjects` (issue 12402)
2020-11-25 21:54:41 +01:00
Jonas Jenwald
8a132f584d Check the top-level /Pages dictionary when finding the trailer in XRef.indexObjects (issue 12402)
In addition to the existing /Root and /Pages validation, also check that the /Pages-entry actually is a dictionary and that it has a valid /Count-entry.
This way we can avoid picking a trailer candidate which e.g. the `Catalog.numPages` getter will just end up rejecting, thus breaking PDF document loading completely.
2020-11-25 15:14:53 +01:00
Calixte Denizet
18b525de2e Parenthesis in names are not escaped when saving 2020-11-25 12:28:12 +01:00
Calixte Denizet
b11592a756 JS -- hidden annotations must be built in case a script show them
* in some pdf, there are actions with "event.source.hidden = ..."
 * in order to handle visibility when printing, annotationStorage is extended to store multiple properties (value, hidden, editable, ...)
2020-11-10 12:48:34 +01:00
Calixte Denizet
a5279897a7 JS -- Add listener for sandbox events only if there are some actions
* When no actions then set it to null instead of empty object
* Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.
2020-11-09 18:37:59 +01:00
Jonas Jenwald
a03b383edb Fail early, in modern GENERIC builds, if globalThis isn't available (PR 11799 follow-up, issue 12596)
It probably doesn't hurt to explicitly check for `globalThis` as well, in addition to the existing checks.
2020-11-07 19:00:33 +01:00
Tim van der Meij
99ac2d1036
Merge pull request #12583 from Snuffleupagus/nonBlendModesSet
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
2020-11-05 23:53:39 +01:00
Tim van der Meij
646f895d35
Merge pull request #12568 from calixteman/defaultvalue
[api-minor] JS -- Add default value in annotation data
2020-11-05 22:53:21 +01:00
Jonas Jenwald
082cd8fc6c Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in PartialEvaluator.hasBlendModes
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.

To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.

This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
    {  "id": "issue6961",
       "file": "../web/pdfs/issue6961.pdf",
       "md5": "a80e4357a8fda758d96c2c76f2980b03",
       "rounds": 100,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) |  +/- |     %  | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0    | Overall      |   100 |         1034 |         555 | -480 | -46.39 |        faster
firefox | 0    | Page Request |   100 |          489 |           7 | -482 | -98.67 |        faster
firefox | 0    | Rendering    |   100 |          545 |         548 |    2 |   0.45 |
firefox | 1    | Overall      |   100 |          912 |         428 | -484 | -53.06 |        faster
firefox | 1    | Page Request |   100 |          487 |           1 | -486 | -99.77 |        faster
firefox | 1    | Rendering    |   100 |          425 |         427 |    2 |   0.51 |
```

---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
2020-11-05 16:59:08 +01:00
Calixte Denizet
39f5954729 JS -- Add default value in annotation data
* these values are used when a form is resetted
2020-11-05 13:44:23 +01:00
Brendan Dahl
1de2bc4816
Merge pull request #12505 from calixteman/12504
Split highlight annotation div into multiple divs
2020-11-04 10:41:28 -08:00
Tim van der Meij
3e52098e29
Merge pull request #12555 from calixteman/color
Replace css color rgb(...) by #...
2020-11-02 23:55:39 +01:00
Calixte Denizet
9d11b51a3e Replace css color rgb(...) by #...
* it's faster to generate the color code in using a table for components
* it's very likely a way faster to parse (when setting the color in the canvas)
2020-11-02 10:25:04 +01:00
Tim van der Meij
46e60a266c
Merge pull request #12552 from Snuffleupagus/annotation-fixes
Miscellaneous (small) improvements in `src/core/annotation.js`
2020-10-31 00:41:39 +01:00
Tim van der Meij
e341e6e542
Merge pull request #12525 from brendandahl/mark-info
[api-minor] Implement API to get MarkInfo from the catalog.
2020-10-31 00:05:19 +01:00
Brendan Dahl
f5c821e9c3 [api-minor] Implement API to get MarkInfo from the catalog. 2020-10-30 10:59:45 -07:00
Jonas Jenwald
fdb6520012 Change the Catalog.openAction getter back to using an Object internally (PR 12543 follow-up)
Given that the `Map`-pattern apparently has undesirable performance characteristics, change this getter back to using an Object instead and check its size before returning it.
2020-10-30 13:27:05 +01:00
Jonas Jenwald
a1e5581a0b Let Annotation._collectActions return null when no actions are present
Rather than returning an *empty* Object[1] we should be returning `null` instead, since that's consistent with existing API-functionality.
To avoid having to *manually* track if the Object is empty, this patch also introduces a small helper function to check its size.
2020-10-30 13:23:05 +01:00
Jonas Jenwald
8540b4cc76 Stop calling Font.charsToGlyphs, in src/core/annotation.js, with unused arguments
As can be seen in `src/core/fonts.js`, this method only accepts *one* parameter, hence it's somewhat difficult to understand what the Annotation-code is actually attempting to do here.
The only possible explanation that I can imagine, is that the intention was initially to call `Font.charToGlyph` *directly* instead. However, note that that'd would not actually have been correct, since that'd ignore one level of font-caching (see `this.charsCache`). Hence the unused arguments are removed, in `src/core/annotation.js`, and the `Font.charToGlyph` method is now marked as "private" as intended.
2020-10-30 13:17:52 +01:00
Jonas Jenwald
46e94cad17 Fix *some* errors reported by the ESLint no-useless-escape rule
This patch removes unnecessary escape-sequence in (mostly) strings, as a first step, since the ones in regular expressions probably requires more careful testing (just in case).
The only exception is a regular expression in `src/core/annotation.js`, since we should have both unit- and reference-tests for this code *and* given [this information on MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#Types):
 > Inside a character set, the dot loses its special meaning and matches a literal dot.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
2020-10-29 15:40:40 +01:00
Jonas Jenwald
9fc7cdcc9d Use a Map, rather than an Object, internally in the Catalog.openAction getter (PR 11644 follow-up)
This provides a work-around to avoid having to conditionally try to initialize the `openAction`-object in multiple places.
Given that `Object.fromEntries` doesn't seem to *guarantee* that a `null` prototype is used, we thus hack around that by using `Object.assign` with `Object.create(null)`.
2020-10-28 14:43:28 +01:00
Tim van der Meij
ea4d88a330
Merge pull request #12395 from calixteman/checks
Render not displayed annotations in using normal appearance when printing
2020-10-28 00:11:10 +01:00
Calixte Denizet
6be2f84b4e Render not displayed annotations in using normal appearance when printing 2020-10-27 19:00:31 +01:00
Tim van der Meij
71a14be8e7
Merge pull request #12534 from Snuffleupagus/murmurhash-slice
Ensure that `MurmurHash3_64.update` handles `ArrayBuffer` input correctly, to avoid hash-collisions (issue 12533)
2020-10-26 23:34:03 +01:00
Jonas Jenwald
f2fa053c51 Ensure that MurmurHash3_64.update handles ArrayBuffer input correctly, to avoid hash-collisions (issue 12533)
Different fonts incorrectly end up with *identical* hashes, despite having different /ToUnicode data.
The issue, and it's very interesting that we've apparently not seen it before, appears to be caused by the fact that different /ToUnicode entries share the *same* underlying `ArrayBuffer`, which thus becomes problematic at the `const dataUint32 = new Uint32Array(data.buffer, 0, blockCounts);` line. The simplest solution thus seem to be to just *copy* the input, when it's an `ArrayBuffer`, rather than using it as-is. (Note that if we'd stringified the input, when calling `MurmurHash3_64.update`, the issue would also have been fixed. In this case, we're already creating an unique TypedArray.)
2020-10-26 16:27:33 +01:00
Jonas Jenwald
56fa6d414c Add a getArrayLookupTableFactory helper function and use it to re-format src/core/{glyphlist, unicode}.js
*Please note:* Once https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is implemented, and we've removed SystemJS completely, this entire patch can (and even should) be reverted.

This is similar to the existing `getLookupTableFactory` helper function, but is implemented as outlined in issue 6774.
The re-formatting of the tables were done automatically, by using find-and-replace with regular expressions.

For reasons that I don't even pretend to understand, using this particular structure for these *very* long lookup tables allow SystemJS to process the files correctly/quickly and the development viewer thus works as intended.
2020-10-26 11:08:00 +01:00
Jonas Jenwald
441d9c8cc0 Change src/core/{glyphlist, unicode}.js to use standard import/export statements
While the *built* `pdf.worker.js` file still works correctly with these changes, despite these two files being excluded by Babel[1], the development viewer does not because of issues with SystemJS[2] and/or its Babel-plugin (both of which are old).
Furthermore, note also that excluding these two files from Babel-processing isn't *generally* necessary since e.g. the `gulp mozcentral` command works anyway. The explanation is rather that it's actually the source-map generation which fails for these huge sequences when building the `pdf.worker.js` file.

However, not using standard `import`/`export` statements in all files means we also need to use SystemJS when e.e. running the unit-tests. This is very unfortunate, since SystemJS (or its old Babel-version) doesn't support modern ECMAScript features such as e.g. optional chaining and nullish coalescing.

Unfortunately it also seems that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687, which tracks the implementation of worker-modules in Firefox, has stalled since there hasn't been any updates for six months now.

To hopefully address all of the above, this patch is the first in a series that attempts to further reduce our reliance on SystemJS.

---
[1] The only difference being how the dependencies are handled, in the Webpack-bundled file.

[2] Parsing takes way too long and consumes too much memory, thus rendering the development viewer essentially unusable.
2020-10-26 11:08:00 +01:00
Tim van der Meij
b4ca3d55b8
Merge pull request #12508 from calixteman/button_fallback_font
Fallback font for buttons must be ZapfDingbats.
2020-10-24 18:56:12 +02:00
Tim van der Meij
180f35ee91
Merge pull request #12526 from Snuffleupagus/TilingPattern-args
Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up)
2020-10-24 15:47:57 +02:00
Tim van der Meij
c493dc96fa
Merge pull request #12516 from Snuffleupagus/fieldObjects-annotation-undefined
Prevent issues, in `PDFDocument.fieldObjects`, for invalid Annotations
2020-10-24 15:42:33 +02:00
Jonas Jenwald
b478d3e7b9 Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up)
- Handle the arguments correctly in `PartialEvaluator.handleColorN`.
   For TilingPatterns with a base-ColorSpace, we're currently using the `args` when computing the color. However, as can be seen we're passing the Array as-is to the `ColorSpace.getRgb` method, which means that the `Name` is included as well.[1]
   Thankfully this hasn't, as far as I know, caused any actual bugs, but that may be more luck than anything else given how the `ColorSpace` code is implemented. This can be easily fixed though, simply by popping the `Name`-object off of the `args` Array.

 - Cache TilingPatterns using the `Name`-string, rather than the object directly.
   This is not only consistent with other caches in `PartialEvaluator`, but importantly it also ensures that the cache lookup always works correctly. Note that since `Name`-objects, similar to other primitives, uses a cache themselves a *manually* triggered `cleanup`-call could thus (theoretically) cause the `LocalTilingPatternCache` to not find an existing entry. While the likelihood of this happening is *extremely* small, it's still something that we should fix.

---
[1] The `args` Array can e.g. look like this: `[0.043, 0.09, 0.188, 0.004, /P1]`, which means that we're passing in the `Name`-object to the `ColorSpace` method.
2020-10-24 13:49:46 +02:00
Calixte Denizet
37c86b2daa Fallback font for buttons must be ZapfDingbats.
Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.
2020-10-24 12:00:03 +02:00
Calixte Denizet
85e6c67cf3 Split highlight annotation div into multiple divs
Fix for issue #12504.
Highlight annotation may have several rectangles so we must have several divs to add mouse events handlers.
2020-10-23 15:26:16 +02:00
Jonas Jenwald
b44a975d7c Prevent issues, in PDFDocument.fieldObjects, for invalid Annotations
For an invalid Annotation, there's one code-path where `undefined` is returned from `AnnotationFactory._create`. That'd currently, incorrectly, trigger an error during the `PDFDocument._collectFieldObjects` parsing which thus seem good to avoid.
Along these lines, the filtering in `PDFDocument.fieldObjects` is also updated to handle both `null` and `undefined` the same way.
2020-10-22 13:24:43 +02:00
Calixte Denizet
d2ef878702 Invalidate an annotation with no quadPoints (when it's required)
Some pdf softwares don't remove highlight annotations but make the QuadPoints array empty.
And the Rect for the annotation can be [-32768, -32768, 32768, 32768] so it leads to have a giant div which catches all the mouse events and make the pdf unusable when there are some forms elements.
2020-10-21 13:53:19 +02:00
Calixte Denizet
c30a3a94f0 JS - Add a function in api to get the fields ids in AcroForm::CO 2020-10-17 12:56:40 +02:00
Tim van der Meij
ff2631493e
Merge pull request #12481 from calixteman/issue_12475
Get urls if any in AA::D dictionary for pushbuttons
2020-10-16 22:55:43 +02:00
Tim van der Meij
32bceae732
Merge pull request #12483 from Snuffleupagus/formInfo-hasFields
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
2020-10-16 22:40:40 +02:00
Jonas Jenwald
f956d0a96a Stop caching the *parsed* Font data on its Dict object (PR 7347 follow-up)
Given that *all* fonts are, ever since PR 7347, now cached in the "normal" `fontCache` there's actually no reason for the special `font.translated` construction. (Given how Objects in JavaScript are references, rather than raw values, the old code shouldn't have caused any significant memory overhead.)

Instead we can simply store the `cacheKey`, which is a simple string, on only the Font `Dict`s where it's needed and thus look-up all fonts using the `fontCache` instead.
2020-10-16 17:45:01 +02:00
Jonas Jenwald
29af15f37e Add more validation in the PDFDocument._hasOnlyDocumentSignatures method
If this method is ever passed invalid/unexpected data, or if during the course of parsing (since it's used recursively) such data is found, it will fail in a non-graceful way.
Hence this patch, which ensures that we don't attempt to access non-existent properties and also that errors such as the one fixed in PR 12479 wouldn't have occured.
2020-10-16 13:03:47 +02:00
Jonas Jenwald
3351d3476d Don't store complex data in PDFDocument.formInfo, and replace the fields object with a hasFields boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*

 - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
   This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
   With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.

 - Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.

 - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.

 - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.

 - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 12:47:27 +02:00
Calixte Denizet
ce3d3a6ff8 Get urls if any in AA::D dictionary for pushbuttons 2020-10-15 19:42:36 +02:00
Jonas Jenwald
bc6b47a50e Convert PartialEvaluator.translateFont to an async method
This allows us to make a slight simplification in `PartialEvaluator.loadFont`, which thus removes an old TODO-comment from the method.
Furthermore, in `PartialEvaluator.translateFont`, the CMap-handling is now limited to only *composite* fonts to avoid having to wait for a "dummy"-Promise for most fonts.
2020-10-15 09:42:58 +02:00
Tim van der Meij
a373137304
Merge pull request #12429 from calixteman/collect_js
[api-minor] Add the possibility to collect Javascript actions
2020-10-14 23:27:47 +02:00
Calixte Denizet
71ecc3129b Add the possibility to collect Javascript actions 2020-10-14 10:44:16 +02:00
Tim van der Meij
1034769ca1
Merge pull request #12477 from Snuffleupagus/SaveDocument-WorkerTask
Handle `WorkerTask`s, and various PDF document properties, correctly in the "SaveDocument" handler in `src/core/worker.js`
2020-10-13 21:11:54 +02:00
Jonas Jenwald
65132ba5d8 Handle WorkerTasks, and various PDF document properties, correctly in the "SaveDocument" handler in src/core/worker.js
- Actually register/unregister the `WorkerTask`s, used when saving each page, correctly.
   To prevent issues when terminating the Worker, we purposely wait for all running `WorkerTask`s to complete first. Hence we need to actually handle `WorkerTask`s the same way in "SaveDocument" as in the rest of this file, see e.g. "GetOperatorList" and "GetTextContent".

 - Access `PDFDocument` properties in a generally safe/consistent way.
   While the current code works fine, given how the PDF document is being loaded, it still seems like a very good idea to be *consistent* in how we access these kind of properties (since in general you need to avoid `MissingDataException` everywhere in this file).

 - Change a variable name, since there's essentially no precedent in the code-base for *local* variable names to start with an underscore.
2020-10-13 19:30:43 +02:00
Jonas Jenwald
38629c345d Remove the scope parameter from the "GetOperatorList" handler in src/core/worker.js (PR 11110 follow-up)
Support for the `scope` parameter, in `MessageHandler.on`, was removed in PR 11110 however this particular case was unused/unnecessary for years prior to that change. (From a quick look through the history, I'm not even sure if it was actually needed in the first place.)
2020-10-13 15:58:38 +02:00
Jonas Jenwald
30e8d5dea1 Add local caching of TilingPatterns in PartialEvaluator.getOperatorList (issue 2765 and 8473)
In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required.

By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also *less* asynchronous parsing required for repeated TilingPatterns.

Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results.
When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reduction in rendering times were observed when comparing `master` against this patch:
 - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`.
 - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms`
 - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms`

As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance.
Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.
2020-10-08 18:43:21 +02:00
Jani Pehkonen
935568c2f1 Fix invalid XUID entries in CFF fonts
In CFF fonts, entry `XUID` should be an array that has no more than
16 elements. In the issue, the length is 20, which causes the fonts to fail.
See Appendix B, "Implementation Limits" in PostScript Language Reference Manual
https://web.archive.org/web/20170218093716/https://www.adobe.com/products/postscript/pdfs/PLRM.pdf
Actually entries `XUID` and `UniqueID` are obsolete altogether.
https://blogs.adobe.com/CCJKType/2016/06/no-more-xuid-arrays.html
2020-10-05 17:38:01 +03:00
Jonas Jenwald
9416b14e8b Re-factor how the ESLint no-var rule is enabled in the src/ folder
This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1]
Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use.

---
[1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.
2020-10-03 20:15:29 +02:00
Tim van der Meij
6ff1fe4ea9
Merge pull request #12333 from calixteman/tooltip
Add tooltip if any in annotations layer
2020-10-03 19:50:39 +02:00
calixteman
20b12d2bda Add tooltip if any in annotations layer 2020-10-02 10:11:18 +02:00
Jonas Jenwald
bd3b15b897 Use the cidToGidMap, if it exists, when building the glyph mapping for non-embedded composite fonts (issue 12418) 2020-09-28 14:40:43 +02:00
Calixte Denizet
5af352e65a Need to reset the streams when printing 2020-09-24 19:13:09 +02:00
Jonas Jenwald
2497e8eab9 Prevent errors if the InkList property, in InkAnnotations, is missing and/or not an Array (issue 12392)
To prevent a future bug, the `Vertices` property in PolylineAnnotations are handled the same way.
2020-09-19 15:34:32 +02:00
Calixte Denizet
d51e7e86ff Use the same kind of strings for radio values 2020-09-16 18:47:25 +02:00
Tim van der Meij
374aad77c4
Merge pull request #12375 from Snuffleupagus/emptyDict-set
Ensure that the empty dictionary won't be accidentally modified, and slightly improve the "SaveDocument" handler in `src/core/worker.js`
2020-09-15 23:04:57 +02:00
Calixte Denizet
16dd5403c7 Set parent of radio annotation even if there is no 'V' field 2020-09-15 14:41:57 +02:00
Jonas Jenwald
ed4e7cd8a4 A couple of small improvements in the "SaveDocument" handler in src/core/worker.js
- Check that the "Info"-entry, in the XRef-trailer, is actually a dictionary before accessing it. This is similar to the `PDFDocument.documentInfo` method and follows the general principal of validating data carefully before accessing it, given how often PDF-software may create corrupt PDF files.

 - Slightly simplify the "XFA"-lookup, since there's no point in trying to fetch something from the empty dictionary.
2020-09-15 09:57:40 +02:00
Jonas Jenwald
a531c98cd2 Ensure that the empty dictionary won't be accidentally modified
Currently there's nothing that prevents modification of the `Dict.empty` primitive, which obviously needs to be *truly* empty to prevent any future (hard to find) bugs.
2020-09-15 09:29:00 +02:00
Tim van der Meij
b0c7a74a0c
Merge pull request #12361 from Snuffleupagus/_getSaveFieldResources
Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294)
2020-09-15 00:09:31 +02:00
Tim van der Meij
3ecd984758
Implement resetting of created streams for annotations 2020-09-14 23:08:50 +02:00
Jonas Jenwald
c992b8e460 Ensure that all necessary /Font resources are included when saving a WidgetAnnotation-instance (issue 12294)
This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix *one specific* case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local *and* AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included.

---
[1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.
2020-09-14 15:22:40 +02:00
Tim van der Meij
dfebe7b907
Merge pull request #12365 from Snuffleupagus/forbid-DecodeStream.length
Ensure that the `length` property won't be *accidentally* accessed on a `DecodeStream`-instance
2020-09-11 22:18:30 +02:00
Jonas Jenwald
a11b7341a1 Ensure that the length property won't be *accidentally* accessed on a DecodeStream-instance
For these streams, compared to `Stream` and `ChunkedStream`, there's no well defined concept of length and consequently no `length` getter.[1] However, attempting to access the non-existent `length` won't currently error, but just return `undefined`, which could thus easily lead to bugs elsewhere in the code-base.

---
[1] However, note that *all* stream implementations have an `isEmpty` getter which can be used instead.
2020-09-11 13:25:40 +02:00
Calixte Denizet
fc154590e8 Dict keys need to be escaped too when saving 2020-09-11 12:25:05 +02:00
Calixte Denizet
dc4eb71ff1 PDF names need to be escaped when saving 2020-09-10 16:08:13 +02:00
Calixte Denizet
64a6efd95e Follow-up of pr #12344 2020-09-09 11:46:02 +02:00
Brendan Dahl
e51e9d1f33
Merge pull request #12345 from calixteman/save_btn
Don't try to save something for a button which is neither a checkbox nor a radio
2020-09-08 15:44:04 -07:00
calixteman
68b99c59ee
Save form data in XFA datasets when pdf is a mix of acroforms and xfa (#12344)
* Move display/xml_parser.js in shared to use it in worker

* Save form data in XFA datasets when pdf is a mix of acroforms and xfa

Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>
2020-09-08 15:13:52 -07:00
Calixte Denizet
7e5026dfc5 Don't try to save something for a button which is neither a checkbox nor a radio 2020-09-08 20:47:46 +02:00
Tim van der Meij
20c891542b
Merge pull request #12269 from calixteman/highlight
Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations.
2020-09-06 22:25:36 +02:00
Calixte Denizet
65ecd981fe Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations. 2020-09-06 15:40:15 +02:00
Jonas Jenwald
6c8f1f7d6f Run gulp lint --fix, to account for changes in Prettier version 2.1.x 2020-09-06 12:23:59 +02:00
Jonas Jenwald
784a420027 Add support, in Dict.merge, for merging of "sub"-dictionaries
This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.
2020-08-30 23:18:32 +02:00
Jonas Jenwald
2393443e73 Include the /Order array, if available, when parsing the Optional Content configuration
The `/Order` array is used to improve the display of Optional Content groups in PDF viewers, and it allows a PDF document to e.g. specify that Optional Content groups should be displayed as a (collapsable) tree-structure rather than as just a list.

Note that not all available Optional Content groups must be present in the `/Order` array, and PDF viewers will often (by default) hide those toggles in the UI.
To allow us to improve the UX around toggling of Optional Content groups, in the default viewer, these hidden-by-default groups are thus appended to the parsed `/Order` array under a *custom* nesting level (with `name == null`).

Finally, the patch also slightly tweaks an `OptionalContentConfig` related JSDoc-comment in the API.
2020-08-30 16:28:40 +02:00
Tim van der Meij
06b53d770a
Merge pull request #12259 from brendandahl/cmap-fix
Fix handling of symbolic fonts and unicode cmaps.
2020-08-30 16:01:24 +02:00
Brendan Dahl
45e8a31cc0 Fix handling of symbolic fonts and unicode cmaps.
In issue 12120, the font has a 1,0 cmap and is marked symbolic which
according to the spec means we should directly use the cmap instead of
the extra steps that are defined in 9.6.6.4.

However, just fixing that caused bug 1057544 to break. The font in bug
1057544 has a 0,1 cmap (Unicode 1.1) which we were not using, but is
easy to support. We're also easily able to support some of the other
unicode cmaps, so I added those as well.

There was also a second issue with bug 1057544, the cmap doesn't have
a mapping for the "quoteright" glyph, but it is defined in the post
table. To handle this, I've moved post table as a  fallback for any
font that has an encoding.
2020-08-27 14:33:11 -07:00
Calixte Denizet
ba94f04ba3 Bug 1661226 - Push button are not rendered with renderInteractiveForms enabled 2020-08-27 10:45:14 +02:00
Tim van der Meij
0f229d537f
Inline the setup method in the parse method in src/core/document.js
Now that the `parse` method is simplified we can inline the `setup`
method in the `parse` method since it's only two lines of code. This
avoids some indirection.
2020-08-25 23:28:55 +02:00
Tim van der Meij
280207c740
Redo the form type detection logic and include unit tests
Good form type detection is important to get reliable telemetry and to
only show the fallback bar if a form cannot be filled out by the user.

PDF.js only supports AcroForm data, so XFA data is explicitly unsupported
(tracked in issue #2373). However, the previous form type detection
couldn't separate AcroForm and XFA well enough, causing form type
telemetry to be incorrect sometimes and the fallback bar to be shown for
forms that could in fact be filled out by the user.

The solution in this commit is found by studying the specification and
the form documents that are available to us. In a nutshell the rules are:

- There is XFA data if the `XFA` entry is a non-empty array or stream.
- There is AcroForm data if the `Fields` entry is a non-empty array and
  it doesn't consist of only document signatures.

The document signatures part was not handled in the old code, causing a
document with only XFA data to also be marked as having AcroForm data.
Moreover, the old code didn't check all the data types.

Now that AcroForm and XFA can be distinguished, the viewer is configured
to only show the fallback bar for documents that only have XFA data. If
a document also has AcroForm data, the viewer can use that to render the
form. We have not found documents where the XFA data was necessary in
that case.

Finally, we include unit tests to ensure that all cases are covered and
move the form type detection out of the `parse` function so that it's
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
2020-08-25 23:28:55 +02:00
Tim van der Meij
f0bf62ff54
Mark the catDict member as private in the Catalog class
Not only is `catDict` never accessed anymore outside of this file, it
should also never happen since it's internal to the catalog. If data
from it is needed elsewhere, the catalog should provide a getter for it
that can do basic data integrity checks and abstract away any
unnecessary details.
2020-08-25 23:28:55 +02:00
Tim van der Meij
f20f0bcc78
Move the AcroForm logic from the document to the catalog
The `AcroForm` entry is part of the catalog, not of the document, so its
logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, make the AcroForm member private on the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `IsAcroFormPresent`.
Only a boolean is exposed, so we now also only store the boolean on the
document instance.

Finally, the annotation code needs access to the full AcroForm
dictionary, so it's updated to fetch the data from the catalog instead
of the document that now only holds the boolean.
2020-08-25 23:28:55 +02:00
Tim van der Meij
b41a2f4d5a
Move the collection logic from the document to the catalog
The `Collection` entry is part of the catalog, not of the document, so
its logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, remove the collection member from the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `IsCollectionPresent`.
Moving this out of the `parse` function makes sure that the getter is
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
2020-08-25 23:28:55 +02:00
Tim van der Meij
935d95b462
Move the version logic from the document to the catalog
The `Version` entry is part of the catalog, not of the document, so its
logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, make the version member private on the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `PDFFormatVersion`.

Finally, clarify how the version from the header and the version from
the catalog are treated using a comment.
2020-08-25 23:28:55 +02:00
Jonas Jenwald
bd16c363ce Access the Catalog data correctly in the "GetPageIndex" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `Catalog`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:14 +02:00
Jonas Jenwald
2e6e2c3b41 Access the XRef data correctly in the "GetStats" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `XRef`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:11 +02:00
Jani Pehkonen
e7febbf0f7 Accent positioning in Type1 seac glyphs
In `display/canvas.js` the accent offsets must be multiplied by `fontSize` to make the offsets large enough. Another problem is in `core/type1_parser.js` when the Type1 command `seac` is handled. There is an error in the Adobe Type1 spec. See chapter 6 in Type1 Font Format Supplement, which provides an errata: The arguments of `seac` specify the offset of the left side bearing (LSB) points, not the offset of origins. This can be fixed in `core/type1_parser.js` by adding the difference of the LSB values.
2020-08-23 21:01:25 +03:00
Tim van der Meij
a8efc0296b
Obtain the export values for choice widgets from the normal appearance
The down appearance (`D`) is optional and not available in the document
from #12233, so the checkboxes are never saved/printed as checked
because the checked appearance is based on the export value that is
missing because the `D` entry is not available.

Instead, we should use the normal appearance (`N`) since that one is
required and therefore always available.

Finally, the /Off appearance is optional according to section 12.7.4.2.3
of the specification, so that needs to be taken into account to match
the specification and to fix reference test failures for the
`annotation-button-widget-print` test. That is a file that doesn't
specify an /Off appearance in the normal appearance dictionary.
2020-08-23 13:00:02 +02:00
Tim van der Meij
1b82ad8fff
Decode widget form values consistently
The helper method `_decodeFormValue` is used to ensure that it happens
in one place. Note that form values are field values, display values
and export values.
2020-08-23 13:00:01 +02:00
Tim van der Meij
12c20772ac
Improve the field value parsing for choice widgets to handle null values
The specification states that the field value is `null` if no item is
selected and we didn't handle this case properly. Even though this did
not break the rendering because we always convert the value to an array
and the `includes` check in the display layer would simply not match,
the field value would be `[null]` which is not expected and strange from
an API perspective.

This commit fixes that by ensuring that we return an empty array in
case the field value is `null`. The API therefore still always gives an
array for the field value, but now the code is more specific so that the
value is either an empty array or an array of strings.
2020-08-19 23:27:50 +02:00
Jonas Jenwald
1058f16605 Add (basic) support for transfer functions to Images (issue 6931, bug 1149713)
This is *similar* to the existing transfer function support for SMasks, but extended to simple image data.
Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to *at most* 4 `Uint8Array`s each with a length of 256 elements.

Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.
2020-08-17 10:34:12 +02:00
Jonas Jenwald
9d3e046a4f Don't cache /ExtGState entries that contain fonts (PR 12087 follow-up)
I completely overlooked the fact that `PartialEvaluator.handleSetFont` also updates the current `state`, which means that currently we're not actually handling font data correctly for cached /ExtGState data. (Thankfully, using /ExtGState to set a font is somewhat rare in practice.)
2020-08-17 08:17:25 +02:00
Calixte Denizet
1a6816ba98 Add support for saving forms 2020-08-12 10:32:59 +02:00
Brendan Dahl
7fb01f9f2a
Merge pull request #12186 from brendandahl/loca-2
Fix bad truetype loca tables.
2020-08-10 20:34:19 -07:00
Brendan Dahl
f6dff81223 Fix bad truetype loca tables.
Some fonts have loca tables that aren't sorted or use 0 as an offset to
signal a missing glyph. This fixes the bad loca tables by sorting them
and then rewriting the loca table and potentially re-ordering the glyf
table to match.

Fixes #11131 and bug 1650302.
2020-08-10 14:15:49 -07:00
Calixte Denizet
88b112ab0c Support comb textfields for printing 2020-08-09 14:41:26 +02:00
Calixte Denizet
cd8bb7293b Support multiline textfields for printing 2020-08-09 12:14:34 +02:00
Calixte Denizet
1747d259f9 Support textfield and choice widgets for printing 2020-08-06 14:45:23 +02:00
Brendan Dahl
ac494a2278 Add support for optional marked content.
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.

All of the test files added exhibit different uses of optional content.

Fixes #269.

Fix test to work with optional content.

- Change the stopAtErrors test to ensure the operator list has something,
  instead of asserting the exact number of operators.
2020-08-04 09:26:55 -07:00
Tim van der Meij
5a66c56eca
Merge pull request #12108 from calixteman/radio
Add support for radios printing
2020-08-02 14:47:46 +02:00
Jonas Jenwald
6d192f987e Prevent Uncaught (in promise) AbortException when running the unit-tests
These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file.
While this patch prevents these kind of problems, and thus allows us to update Jasmine again, I cannot help but thinking that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't *think* that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).
2020-07-31 23:29:05 +02:00
Calixte Denizet
538017f7a7 Add support for radios printing 2020-07-31 14:31:49 +02:00
Tim van der Meij
eb4d6a0652
Merge pull request #12107 from calixteman/checkbox
Add support for checkboxes printing
2020-07-30 00:11:41 +02:00
Calixte Denizet
cb60523a15 Add support for checkboxes printing 2020-07-29 16:42:57 +02:00
Jonas Jenwald
fbe90b63ec [src/core/worker.js] Remove a useless Promise handler from the pdfManagerReady function
Looking carefully at this code, you'll notice that the `loadDocument` function has no less than *three* Promise handling functions. This obviously makes no sense, since a Promise can only have one resolve and one reject handler.

Hence the final `onFailure`-case is unreachable, which only serves to add confusion when reading the code. Note that this code has been re-factored more than once over the years, but it seems as if this may even have been incorrect already in PR 3310 (and no-one have noticed for seven years :-).
2020-07-28 14:51:50 +02:00
Jonas Jenwald
835b5ffddd Only check isType3Font the first time that TranslatedFont.loadType3Data is called
If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
f3ff526019 Send/receive Type3 images the same way as other globally-cached images
There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
7c9d0d5939 Improve how Type3-fonts with dependencies are handled
While the `CharProcs` streams of Type3-fonts *usually* don't rely on dependencies, such as e.g. images, it does happen in some cases.

Currently any dependencies are simply appended to the parent operatorList, which in practice means *only* the operatorList of the *first* page where the Type3-font is being used.
However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of *all* pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously.

Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added *once* to any operatorList.)

Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.
2020-07-27 13:20:13 +02:00
Calixte Denizet
584902dbf8 Add an annotation storage in order to save annotation data in acroforms 2020-07-24 10:50:11 +02:00
Jonas Jenwald
684a7b89ac Remove unnecessary duplication in the addChildren helper function (used by the ObjectLoader)
Besides being fewer lines of code overall, this also avoids *one* `node instanceof Dict` check for both of the `Dict`/`Stream`-cases.
2020-07-17 16:32:24 +02:00
Jonas Jenwald
ea8e432c45 Add a getRawValues method, to Dict instances, to provide an easier way of getting all *raw* values
When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop).
While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the *raw* `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the *raw* `Dict` values directly.
2020-07-17 16:32:00 +02:00
Jonas Jenwald
6381b5b08f Add a size getter, to Dict instances, to provide an easier way of checking the number of entries
This removes the need to manually call `Dict.getKeys()` and check its length.
2020-07-17 16:06:11 +02:00
Tim van der Meij
e63d1ebff5
Merge pull request #12087 from Snuffleupagus/LocalGStateCache
Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)
2020-07-17 16:02:45 +02:00
Tim van der Meij
b19a1796ac
Convert RefSetCache to a proper class and to use a Map internally
Using a `Map` instead of an `Object` provides some advantages such as
cheaper ways to get the size of the cache, to find out if an entry is
contained in the cache and to iterate over the cache. Moreover, we can
clear and re-use the same `Map` object now instead of creating a new
one.
2020-07-17 13:35:29 +02:00
Jonas Jenwald
b3480842b3 Use a RefSet, rather than a plain Object, for tracking already processed nodes in PartialEvaluator.hasBlendModes 2020-07-17 09:52:36 +02:00
Jonas Jenwald
03547b5633 Change PartialEvaluator.setGState to an async method
Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object.
To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.
2020-07-15 14:27:18 +02:00
Jonas Jenwald
f20aeb9343 Slightly simplify the code in PartialEvaluator.hasBlendModes, e.g. by using for...of loops
- Replace the existing loops with `for...of` variants instead.

 - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
15fa3f8518 Remove a redundant /XObject stream dictionary objId check in PartialEvaluator.hasBlendModes (PR 6971 follow-up)
This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971).
Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
84476da26e Handle lookup errors "silently" in PartialEvaluator.hasBlendModes (PR 11680 follow-up)
Given that this method is used during what's essentially a *pre*-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there's lookup errors.

*Please note:* Any any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the *actual* OperatorList parsing.
2020-07-15 12:47:07 +02:00
Jonas Jenwald
981ff41b5f Add local caching of non-font Graphics State (ExtGState) data in PartialEvaluator.getTextContent
It turns out that `getTextContent` suffers from *similar* problems with repeated GStates as `getOperatorList`; please see the previous patch.

While only `/ExtGState` resources containing Fonts will actually be *parsed* by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though *most* of them won't affect the textContent (since they mostly contain purely graphical state).

With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
90eb579713 Add local caching of "simple" Graphics State (ExtGState) data in PartialEvaluator.getOperatorList (issue 2813)
This patch will help pathological cases the most, with issue 2813 being a particularily problematic example. While there's only *four* `/ExtGState` resources, there's a total `29062` of `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resource, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resource, we thus improve performance by:
 - Not having to fetch/validate/parse the same `/ExtGState` data over and over.
 - Handling of repeated `setGState` operators becomes *synchronous* during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

---

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is *a lot* slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a *reduction* of `~10 %` in the rendering time of the PDF document in issue 2813.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
d4d7ac1b88 Stop special-casing the (very unlikely) "no /XObject found"-scenario, when parsing OPS.paintXObject operators, in PartialEvaluator.{getOperatorList, getTextContent}
Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fallback to the existing `ignoreErrors` functionality instead.
Also, please note that the "no `/XObject` found"-scenario should be *extremely* unlikely in practice and would only occur in corrupt/broken documents.

Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data *could* contain anything and e.g. be unclonable (which would cause breaking errors).
Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definately seems like the correct approach.
2020-07-12 21:59:59 +02:00
Tim van der Meij
7dabc5ecc8
Merge pull request #12063 from Snuffleupagus/issue-10989
Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)
2020-07-11 00:05:11 +02:00
Jonas Jenwald
d18cf47419 Remove the special handling, used when creating Indexed ColorSpaces, for the case where the lookup-data is a Stream
This special-case was added in PR 1992, however it became unnecessary with the changes in PR 4824 since all of the ColorSpace parsing is now done on the worker-thread (with only RGB-data being sent to the main-thread).
2020-07-10 17:22:55 +02:00
Jonas Jenwald
ea6a0e4435 Remove the IR (internal representation) part of the ColorSpace parsing
Originally ColorSpaces were only *partially* parsed on the worker-thread, to obtain an IR-format which was sent to the main-thread. This had the somewhat unfortunate side-effect of causing the majority of the (potentially heavy) ColorSpace parsing to happen on the main-thread.
Hence PR 4824 which, among other things, changed ColorSpaces to be *fully* parsed on the worker-thread with only RGB-data being sent to the main-thread.

While it thus originally was necessary to have `ColorSpace.{parseToIR, fromIR}` methods, to handle the worker/main-thread split, that's no longer the case and we can thus reduce all of the ColorSpace parsing to one method instead.

Currently, when parsing a ColorSpace, we call `ColorSpace.parseToIR` which parses the ColorSpace-data from the document and then creates the IR-format. We then, immediately, call `ColorSpace.fromIR` which parses the IR-format and then finally creates the actual ColorSpace.[1]
All-in-all, this leads to a fair amount of unnecessary indirection which also (in my opinion) makes the code less clear.

Obviously these changes are not really expected to have a significant effect on performance, especially with the recently added caching of ColorSpaces, however there'll now be strictly fewer function calls, less memory allocated, and overall less parsing required during ColorSpace-handling.

---
[1] For ICCBased ColorSpaces, given the validation necessary, this currently even leads to parsing an /Alternate ColorSpace *twice*.
2020-07-10 17:22:44 +02:00
Jonas Jenwald
4cc6797f17 Re-factor the idFactory functionality, used in the core/-code, and move the fontID generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.

For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.

In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 16:33:31 +02:00
Jonas Jenwald
1d66fce781 Tweak the heuristic, in src/core/jpg.js, that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10989) 2020-07-06 13:06:49 +02:00
Jonas Jenwald
c95fbb6e21 Convert the code in src/core/evaluator.js to use standard classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 16:01:04 +02:00
Jonas Jenwald
32a0b6fa73 Move some constants and helper functions out of the PartialEvaluator closure
This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope.

*Please note:* For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants where re-factored slightly.
2020-07-05 15:56:23 +02:00
Tim van der Meij
c4255fdbfd
Merge pull request #12059 from Snuffleupagus/image-class
Convert the code in `src/core/image.js` to use ES6 classes
2020-07-05 14:08:55 +02:00
Jonas Jenwald
59da1d5829 Convert the code in src/core/image.js to use ES6 classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 09:34:14 +02:00
Jonas Jenwald
85ced3fbfd Allow BaseLocalCache to, optionally, only allocate storage for caching of references (PR 12034 follow-up)
*Yet another instalment in the never-ending series of things that you think of __after__ a patch has landed.*

Since `Function`s are only cached by reference, we thus don't need to allocate storage for names in `LocalFunctionCache` instances. Obviously the effect of these changes are *really tiny*, but it seems reasonable in principle to avoid allocating data structures that are guaranteed to be unused.
2020-07-04 15:01:32 +02:00
Jonas Jenwald
ca719ecaa4 Add local caching of Functions, by reference, in the PDFFunctionFactory (issue 2541)
Note that compared other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an *entire* page.
Furthermore, similar to ColorSpaces, the parsing of individual `Function`s are generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.)

Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.)
After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its *own* `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instances could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site.

Obviously, with this patch, there's now (potentially) more `PDFFunctionFactory` instances than before when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are *not* kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible *lazily initialized*, specifically:

 - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually requires it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces).

 - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing.

To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization.
(If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would be instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.)
All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic.

With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources.

Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer.
However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction ` parsing shows that it takes *approximately* `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is *one order of magnitude faster* which does add up when the same `Function` is invoked close to 2000 times.
2020-07-04 00:55:18 +02:00
Jonas Jenwald
28d2ada59c Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)
This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
2020-06-26 13:15:06 +02:00
Jonas Jenwald
b8e1352934 Stop passing in unnecessary parameters when parsing the Alternate entry of ICCBased ColorSpaces (PR 9659 follow-up)
With the changes made in PR 9659, `ColorSpace.fromIR` no longer takes a second `pdfFunctionFactory` parameter and there's thus one call-site that can be simplified.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
19d7976483 Improve (local) caching of parsed ColorSpaces (PR 12001 follow-up)
This patch contains the following *notable* improvements:
 - Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point).
 - Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths.
 - Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing.
 - Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).
2020-06-24 23:53:10 +02:00
Jonas Jenwald
51e87b9248 Add a proper LocalColorSpaceCache class, rather than piggybacking on the image one (PR 12001 follow-up)
This will allow caching of ColorSpaces by either `Name` *or* `Ref`, which doesn't really make sense for images, thus allowing (better) caching for ColorSpaces used with e.g. Images and Patterns.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
e22bc483a5 Re-factor ColorSpace.parse to take a parameter object, rather than a bunch of (randomly) ordered parameters
Given the number of existing parameters, this will avoid needlessly unwieldy call-sites especially with upcoming changes in later patches.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
f0708717a9 Move the fetchBuiltInCMap method to the PartialEvaluator.prototype
Defining this *inline* in the "constructor" looks slightly weird (I really don't know why I wrote it like that originally), and it can simply be changed to a regular method instead.
2020-06-24 17:29:47 +02:00