Commit Graph

4340 Commits

Author SHA1 Message Date
Calixte Denizet
ecd45cc9af JS - Avoid a popup to ask for updating Acrobat.
- this popup appears because js is enabled;
  - and because the pdf contains some unsupported features (e.g. XFA);
  - can be tested with: https://www.cbsa-asfc.gc.ca/publications/forms-formulaires/a10.pdf.
2021-02-24 15:52:48 +01:00
calixteman
45329af926
XFA -- Add support for SOM expressions (#12983)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=87;
 - add a parser for SOM expressions;
 - add search functions to resolve those expressions;
 - search functions will be used to bind data into template.
2021-02-24 10:13:02 +01:00
Tim van der Meij
f3aa4408a5
Merge pull request #13005 from calixteman/colors
JS - Fix setting a color on an annotation
2021-02-21 14:50:03 +01:00
Calixte Denizet
4a5f1d1b7a JS - Fix setting a color on an annotation
- strokeColor corresponds to borderColor;
 - support fillColor and textColor;
 - support colors on the different annotations;
 - fix typo in aforms (+test).
2021-02-20 15:24:37 +01:00
Jonas Jenwald
d69cf702f3 Add a this-bound method for InternalRenderTask.cancel
This is similar to the other methods, and the only reason for this not having been done originally is that the `cancel` functionality is a later addition.
2021-02-20 14:47:57 +01:00
Jonas Jenwald
e9038cc3d1 Send the AnnotationStorage-data to the worker-thread as a Map
Rather than converting the `AnnotationStorage`-data to an Object, before sending it to the worker-thread, we should be able to simply send the internal `Map` directly.
The "structured clone algorithm" doesn't have a problem with `Map`s, however the `LoopbackPort` used when workers are *disabled* (e.g. in Node.js environments) didn't use to support them. With PR 12997 having lifted that restriction, we should now be able to simply send the `AnnotationStorage`-data as-is rather than having to iterate through it to first create an Object.

*Please note:* The changes in `src/core/annotation.js` could have been a lot more compact if we were able to use optional chaining in the `src/core` folder. Unfortunately that's still not possible, since SystemJS is being used in the development viewer (i.g. `gulp server`) and fixing that is *still* blocked by [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687).
2021-02-18 17:13:43 +01:00
calixteman
0fa9976268
XFA - Add support for prototypes (#12979)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=225&zoom=auto,-207,784
 - add a clone method on nodes in order to be able to clone a proto;
 - support ids in template namespace;
 - prevent from cycle when applying protos.
2021-02-18 10:32:25 +01:00
Tim van der Meij
4619b1b568
Merge pull request #12997 from Snuffleupagus/metadata-worker
Move the Metadata parsing to the worker-thread
2021-02-17 20:57:46 +01:00
Tim van der Meij
77862bdb8e
Merge pull request #12999 from Snuffleupagus/LoopbackPort-rm-sync
[api-minor] Remove support for synchronous event dispatching in `LoopbackPort`
2021-02-17 20:39:54 +01:00
Tim van der Meij
1d3af89cae
Merge pull request #12998 from Snuffleupagus/renderInteractiveForms-defaults
Simplify the default value handling of `renderInteractiveForms` in the viewer components
2021-02-17 20:37:03 +01:00
calixteman
b5be515375
XFA - Add a lexer/parser for FormCalc language (#12936)
- the language specifications are: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1049
 - it can be used to:
   * as a scripting language for calculation, validations, ...
   * in SOM expressions to select nodes: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=101
2021-02-17 20:28:06 +01:00
Jonas Jenwald
3398070e26 [api-minor] Remove support for synchronous event dispatching in LoopbackPort
*Please note:* The `defer` parameter has been enabled by default ever since PR 9777 (in 2018), which first shipped in PDF.js release `2.0.943`.
With workers *disabled*, e.g. in Node.js environments, this has been used ever since without any problems reported[1].

The impetus for this change was that I happened to notice that *if* the `LoopbackPort` was used with synchronous event dispatching, we'd simply send that data as-is to the listeners. This created an inconsistency in the data returned from the `pdf.worker.js` file, since `postMessage` used with *actual* workers (or the `LoopbackPort` with `defer = true`) will ignore/throw when encountering unclonable data.
Originally my intention was simply to just call `cloneValue` regardless of the event dispatching used in `LoopbackPort`, however looking at the use-cases (or lack thereof) of the `LoopbackPort` it seemed reasonable to simply remove the `defer` parameter instead.

This patch is tagged "[api-minor]" since the `LoopbackPort` is still exposed in the API, although I really hope that no third-party is using this (since disabling workers leads to bad performance).

Finally, this patch changes a `forEach` loop to `for...of` and makes uses of optional changing in existing code.

---
[1] As evident by the `npm test` command run by Github Actions, and previously by Travis.
2021-02-17 16:12:29 +01:00
Jonas Jenwald
d366bbdf51 Move the encodeToXmlString helper function to src/core/core_utils.js
With the previous patch this function is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
b66f294f64 Move the XML-parser to the src/core/-folder
With the previous patch this functionality is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
cc3a6563ee Move the Metadata parsing to the worker-thread
The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers.
Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons).

Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the *built* `pdf.js`/`pdf.worker.js` files.

Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation.

---
[1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
73bf45e64b Support Map and Set, with postMessage, when workers are disabled
The `LoopbackPort` currently doesn't support `Map` and `Set`, which it should since the "structured clone algorithm" used in browsers does support both of them; please see https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types
2021-02-17 13:11:59 +01:00
Jonas Jenwald
0a28e51e40 Simplify the default value handling of renderInteractiveForms in the viewer components
I happened to look at this code, and I can't for the life of me figure out why I didn't just implement it like this patch in the first place (since the current format feels overly verbose).
2021-02-17 10:47:55 +01:00
Calixte Denizet
82f75a8ac2 JS -- Fix doc.getField and add missing field methods
- getField("foo") was wrongly returning a field named "foobar";
 - field object had few missing unimplemented methods
2021-02-17 10:42:52 +01:00
Tim van der Meij
bab059d8fd
Merge pull request #12964 from calixteman/12963
Avoid infinite loop when getting annotation field name
2021-02-16 22:36:24 +01:00
Calixte Denizet
0fc8267576 Avoid infinite loop when getting annotation field name
- aims to fix issue #12963;
 - use a Set to track already visited objects;
 - remove the loop limit in getInheritableProperty and use a RefSet too.
2021-02-14 19:58:19 +01:00
Jonas Jenwald
b26c7974fe [api-minor] Change the dc:subject Metadata field to an Array
This patch simply extends the existing handling of the `dc:creator` field, which should hopefully suffice here; please refer to https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=34
2021-02-14 17:16:40 +01:00
Tim van der Meij
c79fd71457
Merge pull request #12896 from calixteman/text_layer
Modifiy the way to compute baseline to have a better match between canvas and text layer
2021-02-13 15:12:58 +01:00
Jonas Jenwald
1ee747a620 Remove unneeded instanceof MissingDataException checks
The following checks are all unneeded, and could easily cause confusion when reading the code. (All of them are my fault as well, since I've sometimes added those checks without really thinking about the surrounding code.)

 - In `PartialEvaluator.hasBlendModes` there cannot be any `MissingDataException`s thrown, given that the `Page.getOperatorList` method waits for all the necessary /Resources to load first. Furthermore, note also that if an error is thrown from `PartialEvaluator.hasBlendModes` then it'd completely break rendering of that page, since any errors thrown from `Page.getOperatorList` are simply sent to the main-thread.

 - In `PartialEvaluator.handleColorN` there cannot be any `MissingDataException`s thrown, given that again the `Page.getOperatorList` method waits for all the necessary /Resources to load before operatorList parsing starts.

 - In `XRef.readXRef` there cannot be any `MissingDataException`s thrown, given that we're *explicitly* requesting (and waiting for) the entire document in `pdfManagerReady` (in `src/core/worker.js`) before re-parsing of a corrupt document starts.
2021-02-13 12:26:05 +01:00
Calixte Denizet
ea06bb0e36 [api-minor] Annotation -- Don't compute appearance when nothing has changed
* don't set a value in annotationStorage by default:
   - having an undefined when the annotation is rendered for saving/printing means nothing has changed so use normal appearance
   - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1681687
 * change the way to compute font size when this one is null in DA:
   - make fontSize proportional to line height
   - in multiline case, take into account the number of lines for text entered to adapt the font size
2021-02-12 19:27:21 +01:00
Calixte Denizet
b4421b076a Modifiy the way to compute baseline to have a better match between canvas and text layer
- use ascent of the fallback font instead of the one from pdf to position spans
 - use TextMetrics.fontBoundingBoxAscent if available or
 - use a basic heuristic to guess ascent in drawing char on a canvas
 - compute ascent as a ratio of font height
2021-02-12 11:28:02 +01:00
dhufnagel
fc925827b2
fix initial state of checkboxes in display layer (#12904)
consider the export value when multiple checkboxes have the same name
2021-02-12 11:22:54 +01:00
Jonas Jenwald
133158e4d5 Update the year in the license_header files 2021-02-11 17:52:26 +01:00
calixteman
0479deef4e
XFA -- Add other objects (#12949)
- connectionSet: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=969
 - datasets: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1038
  - signature: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1040
  - stylesheet: the same
  - xhtml: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1187
2021-02-11 12:30:37 +01:00
calixteman
3787bd41ef
XFA -- Add localset object (#12948)
- Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=943
2021-02-10 18:04:43 +01:00
Jonas Jenwald
0068dba009 [api-minor] Rename -es5 to -legacy, to reduce confusion over what's actually supported (issue 12976)
*Please note that this will also require some edits of the Wiki.*
2021-02-10 16:01:59 +01:00
Jonas Jenwald
31098c404d
Use Math.hypot, instead of Math.sqrt with manual squaring (#12973)
When the PDF.js project started `Math.hypot` didn't exist yet, and until recently we still supported browsers (IE 11) without a native `Math.hypot` implementation; please see this compatibility information: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/hypot#browser_compatibility

Furthermore, somewhat recently there were performance improvements of `Math.hypot` in Firefox; see https://bugzilla.mozilla.org/show_bug.cgi?id=1648820

Finally, this patch also replaces a couple of multiplications with the exponentiation operator.
2021-02-10 12:28:49 +01:00
Tim Nguyen
2ca886baee Use DOM hidden property instead of attribute methods 2021-02-08 00:21:49 +01:00
Jonas Jenwald
e6fe8a7d53 Handle errors gracefully, in PartialEvaluator.translateFont, when fetching the font file (issue 9462)
The *third* page of the referenced PDF document currently fails to render completely, since one of its font files fail to load.
Since that error isn't handled, a large part of the text is thus missing which looks quite bad. By "replacing" the font data with an *empty* stream, we'll thus be able to fallback to rendering the text with a standard font (instead of using `ErrorFont`). While there's obviously no guarantee that things will look perfect, actually rendering the text at all should be an improvement in general.

Also, print a warning in `PartialEvaluator.loadFont` when the `PartialEvaluator.translateFont` method rejects, since that'd have helped debug/fix the issue faster.
2021-02-06 19:44:53 +01:00
Jonas Jenwald
d3e65f24e3 Request all data, rather than throwing, when encountering general errors in ObjectLoader._walk (issue 9462, PR 3289 follow-up)
*As far as I can tell, this has been broken ever since PR 3289 (back in 2013) without anyone noticing.*

For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately which thus has the potential to *completely* break rendering of an entire page.
In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse these errors are *intermittent*, since they can only occur if the document is still loading when the `ObjectLoader`-code runs (note the early return in `ObjectLoader.load`).

Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then.
All-in-all, the only way to make absolutely sure that we won't cause *unexpected* `MissingDataException`s somewhere else in the code-base is to fallback to fetching the *entire* document in this edge-case.
2021-02-06 14:33:50 +01:00
Brendan Dahl
a392082e30
Merge pull request #12944 from calixteman/xfa_config
XFA -- Update config object
2021-02-05 15:06:09 -08:00
Calixte Denizet
9d47e69771 XFA -- Update config object 2021-02-05 19:22:51 +01:00
Calixte Denizet
652ff57897 XFA -- Add template object
- Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=596
2021-02-03 21:05:10 +01:00
Calixte Denizet
7e0554afe2 XFA -- Add attributes and children in XFAObject
- in order to evaluate SOM expressions nodes and their attributes must be checked in the same order as in the xml;
 - add an object XFAObjectArray with a parameter max to handle multiple children with the same name.
2021-02-03 18:56:00 +01:00
Calixte Denizet
0ff5cd7eb5 XFA - Add a parser for XFA files
- the parser is base on a class extending XMLParserBase
 - it handle xml namespaces:
   * each namespace is assocated with a builder
   * builder builds nodes belonging to the namespace
   * when a node is inserted in the parent namespace compatibility is checked (if required)
 - to avoid name collision between xml names and object properties, use Symbol.
2021-02-01 13:45:31 +01:00
Jonas Jenwald
fec8c4c43f Access this._onUnsupportedFeature directly in FontFaceObject.getPathGenerator
Given that `FontFaceObject` is not exposed in the public API, but only accessed internally, there's no need to assume that a `FontFaceObject`-instance is ever initialized without `onUnsupportedFeature` being provided. This is also consistent with the `BaseFontLoader` implementation.
2021-01-29 16:48:55 +01:00
Tim van der Meij
e4e92d10e8
Merge pull request #12922 from Snuffleupagus/getTextContent-globalImageCache
Ignore globally cached images in `PartialEvaluator.getTextContent` (PR 11930 follow-up)
2021-01-28 23:44:10 +01:00
Tim van der Meij
8805614a03
Merge pull request #12924 from brendandahl/fix-clone
Fix font data clone error when pdfBug is enabled.
2021-01-28 23:42:12 +01:00
Jonas Jenwald
72da2aa166 Ignore globally cached images in PartialEvaluator.getTextContent (PR 11930 follow-up)
Given that we'll only cache `/XObject`s of the `Image`-type globally, we can utilize that in `PartialEvaluator.getTextContent` as well. This way, in cases such as e.g. issue 12098, we can avoid having to fetch/parse `/XObject`s that we already know to be `Image`s. This is helpful, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general.

Also, skip a redundant `RefSetCache.has` check in the `GlobalImageCache.getData` method.
2021-01-28 10:19:26 +01:00
Brendan Dahl
52fb5abb0b Fix font data clone error when pdfBug is enabled.
The widths property should be an object to match what metrics returns.

In ZapfDingbats.pdf I was getting a data clone error with pdfBug enabled.
In buildCharCodeToWidth() there was an encoding with the name "at" which
is also the name of a method on an array. buildCharCodeToWidth assumes an
object is passed in, so when it checked for the "at" property, it found the
method and copied it over.

This only seemed to affect Firefox.
2021-01-27 14:38:43 -08:00
Tim van der Meij
d52e5b0505
Merge pull request #12903 from Snuffleupagus/GlobalImageCache-byteSize
Improve global image caching for small images (PR 11912 follow-up, issue 12098)
2021-01-27 22:20:58 +01:00
Tim van der Meij
286271152f
Merge pull request #12910 from calixteman/bidi
Add back dir property in spans in text layer
2021-01-27 22:09:00 +01:00
calixteman
465697eb10
JS - QuickJS sandbox initialization must be the last evaluated string (#12914)
- aims to fix issue #12912
2021-01-26 14:56:01 +01:00
Jonas Jenwald
1ab6d2c604 Improve global image caching for small images (PR 11912 follow-up, issue 12098)
When implementing the `GlobalImageCache` functionality I was mostly worried about the effect of *very large* images, hence the maximum number of cached images were purposely kept quite low[1].
However, there's one fairly obvious problem with that approach: In documents with hundreds, or even thousands, of *small* images the `GlobalImageCache` as implemented becomes essentially pointless.

Hence this patch, where the `GlobalImageCache`-implementation is changed in the following ways:
 - We're still guaranteed to be able to cache a *minimum* number of images, set to `10` (similar as before).
 - If the *total* size of all the cached image data is below a threshold[2], we're allowed to cache additional images.

This patch thus *improve*, but doesn't completely fix, issue 12098. Note that that document is created by a *very poor* PDF generator, since every single page contains the *entire* document (with all of its /Resources) and to create the individual pages clipping is used.[3]

---
[1] Currently set to `10` images; imagine what would happen to overall memory usage if we encountered e.g. 50 images each 10 MB in size.

[2] This value was chosen, somewhat randomly, to be `40` megabytes; basically five times the [maximum individual image size per page](6249ef517d/src/display/api.js (L2483-L2484)).

[3] This surely has to be some kind of record w.r.t. how badly PDF generators can mess things up...
2021-01-26 12:00:12 +01:00
Calixte Denizet
539256c351 Add back dir property in spans in text layer
- aims to fix #12909
2021-01-26 12:00:05 +01:00
calixteman
a3f6882b06
JS -- add support for choice widget (#12826) 2021-01-25 23:40:57 +01:00