Commit Graph

210 Commits

Author SHA1 Message Date
Calixte Denizet
4b0538d07a Don't save anything in XFA entry if no XFA! (bug 1732344)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1732344
  - rename some variables to have a more clear code;
  - and last but no least, add a unit test to test saving.
2021-09-23 19:51:23 +02:00
Calixte Denizet
2fc10727c5 XFA - Only warn about the wrong xfa type when there is an xfa thing 2021-09-18 15:44:05 +02:00
Jonas Jenwald
45ddb12f61 Remove no-op onPull/onCancel streamSink callbacks from the "GetTextContent"-handler
The `MessageHandler`-implementation already handles either of these callbacks being undefined, hence there's no particular reason (as far as I can tell) to add no-op functions here.

Also, in a couple of `MessageHandler`-methods, utilize an already existing local variable more.
2021-09-09 00:01:10 +02:00
Calixte Denizet
77b9657e57 XFA - Overwrite AcroForm dictionary when saving if no datasets in XFA (bug 1720179)
- aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179
  - in some pdfs the XFA array in AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows to overwrite the AcroForm dictionary with an updated XFA array when doing an incremental update.
2021-09-03 17:04:03 +02:00
Jonas Jenwald
a7f0301f21 [Regression] Re-factor the *internal* includeAnnotationStorage handling, since it's currently subtly wrong
*This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867.*
As far as I can tell, this *subtle* bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago).

The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus *wrongly* return a cached operatorList.

In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API.
However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in *subtle* (and difficult to understand) rendering bugs.

In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead.
Given that operatorList caching in the API doesn't have to be perfect[1], but only have to avoid *false* cache-hits, we can simplify things significantly be only keeping track of the last time that the `AnnotationStorage`-data was modified.

*Please note:* While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2]
Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3]

---
[1] Note how we're already not caching operatorLists for pages with *huge* images, in order to save memory, hence there's no guarantee that operatorLists will always be cached.

[2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation.

[3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.
2021-08-18 10:09:03 +02:00
Jonas Jenwald
107efdb178 [Regression] Re-factor the *internal* renderInteractiveForms handling, since it's currently subtly wrong
The value of the `renderInteractiveForms` parameter, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls that *only* change the `renderInteractiveForms` parameter can thus return an incorrect operatorList.

As far as I can tell, this *subtle* bug has existed ever since `renderInteractiveForms`-support was first added in PR 7633 (which is almost five years ago).
With the previous patch, fixing this is now really simple by "encoding" the `renderInteractiveForms` parameter in the *internal* renderingIntent handling.
2021-08-06 00:40:43 +02:00
Calixte Denizet
5cdee80c8e XFA - An image can be a stream in the pdf (bug 1718521) - hrefs can be found in catalog > Names > XFAImages 2021-07-05 14:06:23 +02:00
calixteman
783cbc1793
Revert "XFA - An image can be a stream in the pdf (bug 1718521)" 2021-07-05 12:47:14 +02:00
calixteman
b370d4714f
Merge pull request #13654 from calixteman/images
XFA - An image can be a stream in the pdf (bug 1718521)
2021-07-05 12:04:34 +02:00
Jonas Jenwald
661c60ecc9 [api-minor] Support accessing both the original and modified PDF fingerprint
The PDF.js API has only ever supported accessing the original file ID, however the second one that (should) exist in *modified* documents have thus far been completely inaccessible through the API.
That seems like a simple oversight, caused e.g. by the viewer not needing it, since it really shouldn't hurt to provide API-users with the ability to check if a PDF document has been modified since its creation.[1]

Please refer to https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G13.2261661 for additional information.

For an example of how to update existing code to use the new API, please see the changes in the `web/app.js` file included in this patch.

*Please note:* While I'm not sure if we'll ever be able to remove the old `PDFDocumentProxy.fingerprint` getter, given that it's existed since "forever", that probably isn't a big deal given that it's now limited to only `GENERIC`-builds.

---
[1] Although this obviously depends on the PDF software following the specification, by updating the second file ID as intended.
2021-07-03 13:56:33 +02:00
Calixte Denizet
f16828be49 XFA - An image can be a stream in the pdf (bug 1718521)
- hrefs can be found in catalog > Names > XFAImages
2021-07-02 20:34:10 +02:00
Calixte Denizet
429ffdcd2f XFA - Save filled data in the pdf when downloading the file (Bug 1716288)
- when binding (after parsing) we get a map between some template nodes and some data nodes;
  - so set user data in input handlers in using data node uids in the annotation storage;
  - to save the form, just put the value we have in the storage in the correct data nodes, serialize the xml as a string and then write the string at the end of the pdf using src/core/writer.js;
  - fix few bugs around data bindings:
    - the "Off" issue in Bug 1716980.
2021-06-25 18:57:01 +02:00
Calixte Denizet
8eeb7ab4a3 XFA - Add the possibily to layout and measure text
- some containers doesn't always have their 2 dimensions and those dimensions re based on contents;
  - so in order to measure text, we must get the glyph widths (for the xfa fonts) before starting the layout;
  - implement a word-wrap algorithm;
  - handle font change during text layout.
2021-06-17 14:17:02 +02:00
Jonas Jenwald
d995f90183 Fetch binary CMap data in the worker-thread, when useWorkerFetch is set
This patch uses the new option added in PR 12726 to *also* allow fetching binary CMap data directly in the worker-thread in browsers.
Given that these changes remove the need to transfer data between threads for the default (browser) use-case, we can also revert the changes in PR 11118 since that simplifies the overall implementation.
2021-06-08 21:51:07 +02:00
Brendan Dahl
4c1dd47e65 Include and use the 14 standard fonts files. 2021-06-07 11:10:11 -07:00
Calixte Denizet
45c3f00a27 XFA - Move the fake HTML representation of XFA from the worker to the main thread
- the only goal of this patch is to be able to get synchronously the fake html when printing from firefox:
    - in order to print we need to inject some html in beforeprint callback but we cannot block in waiting for all the pages.
  - from a memory point of view: it doesn't change anything since the fake HTML is deleted in the worker;
  - this way we don't break any assumptions.
2021-05-25 19:33:07 +02:00
Tim van der Meij
16efd09c9f
Enable the no-var linting rule in src/core/worker.js
This is done automatically with `gulp lint --fix` and the following
manual changes:

```diff
diff --git a/src/core/worker.js b/src/core/worker.js
index aec9c1d39..f88691622 100644
--- a/src/core/worker.js
+++ b/src/core/worker.js
@@ -300,7 +300,7 @@ class WorkerMessageHandler {
         cachedChunks = [];
       };
       const readPromise = new Promise(function (resolve, reject) {
-        var readChunk = function ({ value, done }) {
+        const readChunk = function ({ value, done }) {
           try {
             ensureNotTerminated();
             if (done) {
```
2021-04-25 17:40:00 +02:00
Jonas Jenwald
da22146b95 Replace a bunch of Array.prototype.forEach() cases with for...of loops instead
Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)
2021-04-24 13:00:19 +02:00
Jonas Jenwald
57a1ea840f
Ensure that saveDocument works if there's no /ID-entry in the PDF document (issue 13279) (#13280)
First of all, while it should be very unlikely that the /ID-entry is an *indirect* object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an *indirect* object the existing code in `src/core/writer.js` would already fail.
Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`.

*Drive-by change:* In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).
2021-04-22 12:08:56 +02:00
Jonas Jenwald
f560fe6875 A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.

 - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)

 - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
   Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)

 - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 10:34:22 +02:00
Brendan Dahl
ac3fa1e3d7
Merge pull request #13146 from calixteman/xfa_fonts
XFA -- Load fonts permanently from the pdf
2021-04-16 12:55:12 -07:00
Calixte Denizet
7e9579045f XFA -- Load fonts permanently from the pdf
- Different fonts can be used in xfa and some of them are embedded in the pdf.
  - Load all the fonts in window.document.

Update src/core/document.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Update src/core/worker.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-04-15 17:57:42 +02:00
Jonas Jenwald
54ef4370a2 Ensure that the data is loaded, in the "GetPageJSActions" message handler
Similar to all other data accesses, note e.g. the "GetDocJSActions" handler just above, we need to ensure that a `MissingDataException` isn't propagated to the main-thread if this data is accessed while the PDF document is still loading.
2021-04-12 13:54:37 +02:00
Jonas Jenwald
9360c7cbdc Avoid unnecessary parsing, in Page.GetStructTree, when no structTree is available (PR 13221 follow-up)
It's obviously (a bit) more efficient to return early in `Page.getStructTree`, rather than trying to first "parse" an *empty* structTree-root.

*Somehow I didn't think of this yesterday, but this feels like a much better solution overall; sorry about the churn here!*
2021-04-12 08:54:21 +02:00
Jonas Jenwald
0d2dd6c2fe Remove the unused "GetIsPureXfa" message handler in the worker (PR 13069 follow-up)
Looking at the API, there's no code which actually sends this message. Most likely it's a left-over from a previous version of PR 13069, since the `isPureXfa` parameter is being included in the "GetDoc" message.
2021-04-12 08:52:27 +02:00
Brendan Dahl
fc9501a637 Add support for basic structure tree for accessibility.
When a PDF is "marked" we now generate a separate DOM that represents
the structure tree from the PDF.  This DOM is inserted into the <canvas>
element and allows screen readers to walk the tree and have more
information about headings, images, links, etc. To link the structure
tree DOM (which is empty) to the text layer aria-owns is used. This
required modifying the text layer creation so that marked items are
now tracked.
2021-04-09 09:56:28 -07:00
calixteman
24e598a895
XFA - Add a layer to display XFA forms (#13069)
- add an option to enable XFA rendering if any;
  - for now, let the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground);
  - ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one:
    - it makes easier to translate template properties to html ones;
    - it makes faster the creation of the html element in the main thread.
2021-03-19 10:11:40 +01:00
Jonas Jenwald
0068dba009 [api-minor] Rename -es5 to -legacy, to reduce confusion over what's actually supported (issue 12976)
*Please note that this will also require some edits of the Wiki.*
2021-02-10 16:01:59 +01:00
Jonas Jenwald
b4eb55250e Remove redundant compatibility checks, for modern generic builds, in src/core/worker.js
With the recent additions of optional chaining and nullish coalescing to the PDF.js code-base, a couple of the checks in `src/core/worker.js` are now redundant; please see this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Nullish_coalescing_operator#browser_compatibility

In practice, for the non-translated/non-polyfilled PDF.js builds, browsers without support for optional chaining and nullish coalescing will simply throw immediately upon loading of the code.
Hence both the `globalThis` and `Promise.allSettled` checks are now unnecessary, given this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled#browser_compatibility

*Please note:* The `ReadableStream` check is however still necessary, since Node.js doesn't support that.
2021-01-20 13:09:56 +01:00
Jonas Jenwald
81525fd446 Use ESLint to ensure that exports are sorted alphabetically
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports

The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
2021-01-09 20:37:51 +01:00
Calixte Denizet
1e2173f038 JS - Collect and execute actions at doc and pages level
* the goal is to execute actions like Open or OpenAction
 * can be tested with issue6106.pdf (auto-print)
 * once #12701 is merged, we can add page actions
2020-12-18 20:03:59 +01:00
Jonas Jenwald
c42029489e Run gulp lint --fix, to account for changes in Prettier version 2.2.1
Please refer to https://github.com/prettier/prettier/blob/master/CHANGELOG.md#221 for additional details.
2020-11-29 10:01:46 +01:00
Calixte Denizet
a5279897a7 JS -- Add listener for sandbox events only if there are some actions
* When no actions then set it to null instead of empty object
* Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.
2020-11-09 18:37:59 +01:00
Jonas Jenwald
a03b383edb Fail early, in modern GENERIC builds, if globalThis isn't available (PR 11799 follow-up, issue 12596)
It probably doesn't hurt to explicitly check for `globalThis` as well, in addition to the existing checks.
2020-11-07 19:00:33 +01:00
Brendan Dahl
f5c821e9c3 [api-minor] Implement API to get MarkInfo from the catalog. 2020-10-30 10:59:45 -07:00
Calixte Denizet
c30a3a94f0 JS - Add a function in api to get the fields ids in AcroForm::CO 2020-10-17 12:56:40 +02:00
Tim van der Meij
a373137304
Merge pull request #12429 from calixteman/collect_js
[api-minor] Add the possibility to collect Javascript actions
2020-10-14 23:27:47 +02:00
Calixte Denizet
71ecc3129b Add the possibility to collect Javascript actions 2020-10-14 10:44:16 +02:00
Tim van der Meij
1034769ca1
Merge pull request #12477 from Snuffleupagus/SaveDocument-WorkerTask
Handle `WorkerTask`s, and various PDF document properties, correctly in the "SaveDocument" handler in `src/core/worker.js`
2020-10-13 21:11:54 +02:00
Jonas Jenwald
65132ba5d8 Handle WorkerTasks, and various PDF document properties, correctly in the "SaveDocument" handler in src/core/worker.js
- Actually register/unregister the `WorkerTask`s, used when saving each page, correctly.
   To prevent issues when terminating the Worker, we purposely wait for all running `WorkerTask`s to complete first. Hence we need to actually handle `WorkerTask`s the same way in "SaveDocument" as in the rest of this file, see e.g. "GetOperatorList" and "GetTextContent".

 - Access `PDFDocument` properties in a generally safe/consistent way.
   While the current code works fine, given how the PDF document is being loaded, it still seems like a very good idea to be *consistent* in how we access these kind of properties (since in general you need to avoid `MissingDataException` everywhere in this file).

 - Change a variable name, since there's essentially no precedent in the code-base for *local* variable names to start with an underscore.
2020-10-13 19:30:43 +02:00
Jonas Jenwald
38629c345d Remove the scope parameter from the "GetOperatorList" handler in src/core/worker.js (PR 11110 follow-up)
Support for the `scope` parameter, in `MessageHandler.on`, was removed in PR 11110 however this particular case was unused/unnecessary for years prior to that change. (From a quick look through the history, I'm not even sure if it was actually needed in the first place.)
2020-10-13 15:58:38 +02:00
Jonas Jenwald
9416b14e8b Re-factor how the ESLint no-var rule is enabled in the src/ folder
This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1]
Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use.

---
[1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.
2020-10-03 20:15:29 +02:00
Jonas Jenwald
ed4e7cd8a4 A couple of small improvements in the "SaveDocument" handler in src/core/worker.js
- Check that the "Info"-entry, in the XRef-trailer, is actually a dictionary before accessing it. This is similar to the `PDFDocument.documentInfo` method and follows the general principal of validating data carefully before accessing it, given how often PDF-software may create corrupt PDF files.

 - Slightly simplify the "XFA"-lookup, since there's no point in trying to fetch something from the empty dictionary.
2020-09-15 09:57:40 +02:00
Calixte Denizet
64a6efd95e Follow-up of pr #12344 2020-09-09 11:46:02 +02:00
calixteman
68b99c59ee
Save form data in XFA datasets when pdf is a mix of acroforms and xfa (#12344)
* Move display/xml_parser.js in shared to use it in worker

* Save form data in XFA datasets when pdf is a mix of acroforms and xfa

Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>
2020-09-08 15:13:52 -07:00
Jonas Jenwald
bd16c363ce Access the Catalog data correctly in the "GetPageIndex" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `Catalog`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:14 +02:00
Jonas Jenwald
2e6e2c3b41 Access the XRef data correctly in the "GetStats" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `XRef`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:11 +02:00
Calixte Denizet
1a6816ba98 Add support for saving forms 2020-08-12 10:32:59 +02:00
Brendan Dahl
ac494a2278 Add support for optional marked content.
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.

All of the test files added exhibit different uses of optional content.

Fixes #269.

Fix test to work with optional content.

- Change the stopAtErrors test to ensure the operator list has something,
  instead of asserting the exact number of operators.
2020-08-04 09:26:55 -07:00
Jonas Jenwald
fbe90b63ec [src/core/worker.js] Remove a useless Promise handler from the pdfManagerReady function
Looking carefully at this code, you'll notice that the `loadDocument` function has no less than *three* Promise handling functions. This obviously makes no sense, since a Promise can only have one resolve and one reject handler.

Hence the final `onFailure`-case is unreachable, which only serves to add confusion when reading the code. Note that this code has been re-factored more than once over the years, but it seems as if this may even have been incorrect already in PR 3310 (and no-one have noticed for seven years :-).
2020-07-28 14:51:50 +02:00