Commit Graph

1925 Commits

Author SHA1 Message Date
Jonas Jenwald
8137c0547d Fix the gStateObj lookup in TranslatedFont._removeType3ColorOperators (PR 12718 follow-up)
As can be seen in 2cba290361/src/core/evaluator.js (L986) the `gStateObj` (which is actually an Array despite its name), is wrapped in Array when it's inserted into the OperatorList. Hence we obviously need to take this into account when accessing it in `TranslatedFont._removeType3ColorOperators`; this mistake happened because we don't have any test-cases for this particular code-path as far as I know.
2021-01-22 12:27:38 +01:00
Jonas Jenwald
cfaf23dee2 Simplify the PDFFunctionFactory._localFunctionCache initialization (PR 12034 follow-up)
By changing this a `shadow`ed getter, we can simply access it directly and not worry about it being initialized. I have no idea why I didn't just implement it this way in the first place.
2021-01-22 12:25:05 +01:00
Brendan Dahl
2cba290361
Merge pull request #12836 from calixteman/update_buttons
JS -- update radio/checkbox values even if there are no actions
2021-01-21 14:00:26 -08:00
calixteman
1039698697
Add a parser to get font data from the default appearance (#12831)
* Add a parser to get font data from the default appearance
 - pdfium & poppler use a special parser too to get these info.

* Update src/core/default_appearance.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-01-21 20:15:31 +01:00
Jonas Jenwald
b4eb55250e Remove redundant compatibility checks, for modern generic builds, in src/core/worker.js
With the recent additions of optional chaining and nullish coalescing to the PDF.js code-base, a couple of the checks in `src/core/worker.js` are now redundant; please see this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Nullish_coalescing_operator#browser_compatibility

In practice, for the non-translated/non-polyfilled PDF.js builds, browsers without support for optional chaining and nullish coalescing will simply throw immediately upon loading of the code.
Hence both the `globalThis` and `Promise.allSettled` checks are now unnecessary, given this compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled#browser_compatibility

*Please note:* The `ReadableStream` check is however still necessary, since Node.js doesn't support that.
2021-01-20 13:09:56 +01:00
Jonas Jenwald
2600e59acb Always re-measure non-embedded ArialNarrow fonts (bug 1671312, PR 12725 follow-up)
While PR 12725 fixed bug 1671312 as reported, i.e. the "In the upper right corner "Purposes' has bad kerning."-part, it however broke other parts of the text rendering.
Note in particular the tables, e.g. on page 2 and beyond, where the glyphs are now rendered too close together. The reason for this is that the fonts in question are non-embedded ArialNarrow, which we just replace with Helvetica which obviously is not narrow. Given that the font replacement isn't a perfect fit for non-embedded ArialNarrow, we still need to re-measure the glyph widths in this case.
2021-01-14 15:51:48 +01:00
calixteman
1de1ae0be6
Merge pull request #12838 from calixteman/authors
[api-minor] Change the "dc:creator" Metadata field to an Array
2021-01-12 02:44:58 -08:00
Calixte Denizet
43d5512f5c [api-minor] Change the "dc:creator" Metadata field to an Array
- add scripting support for doc.info.authors
 - doc.info.metadata is the raw string with xml code
2021-01-11 21:34:07 +01:00
Tim van der Meij
f85b8721d1
Merge pull request #12842 from Snuffleupagus/issue-12841
Improve handling of JPEG images without an EOI marker (issue 12841)
2021-01-10 13:21:28 +01:00
Jonas Jenwald
81525fd446 Use ESLint to ensure that exports are sorted alphabetically
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports

The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
2021-01-09 20:37:51 +01:00
Jonas Jenwald
cd9422a075 Improve handling of JPEG images without an EOI marker (issue 12841)
Given that the PDF document in the issue contains the same very large JPEG image *three* times, this patch includes a test-case where only the first page has been extracted from it.
2021-01-09 20:19:39 +01:00
Calixte Denizet
7172f0a928 JS -- update radio/checkbox values even if there are no actions 2021-01-08 16:43:16 +01:00
Calixte Denizet
83119b9000 In a text widget, Font resources can be in the appearance 2021-01-08 10:13:47 +01:00
Tim van der Meij
048081fb69
Merge pull request #12824 from Snuffleupagus/preEvaluateFont-errors
Improve the handling of errors, in `PartialEvaluator.loadFont`, occuring in `PartialEvaluator.preEvaluateFont` (issue 12823)
2021-01-07 23:15:41 +01:00
Tim van der Meij
5bde4b71f8
Merge pull request #12292 from calixteman/encoding
Fix encoding issues when printing/saving a form with non-ascii characters
2021-01-07 22:56:42 +01:00
Jonas Jenwald
78c32c2697 Improve the handling of errors, in PartialEvaluator.loadFont, occuring in PartialEvaluator.preEvaluateFont (issue 12823)
Currently any errors thrown in `preEvaluateFont`, which is a *synchronous* method, will not be handled at all in the `loadFont` method and we were thus failing to return an `ErrorFont`-instance as intended here.

Also, add an *explicit* check in `PartialEvaluator.preEvaluateFont` to ensure that Type0-fonts always have a *valid* dictionary.
2021-01-07 11:38:38 +01:00
Calixte Denizet
56424967f2 Fix encoding issues when printing/saving a form with non-ascii characters 2021-01-05 17:23:18 +01:00
Tim van der Meij
ca18af6af3
Merge pull request #12774 from calixteman/doc_action_test
JS -- Add tests for print/save actions
2021-01-03 18:46:37 +01:00
Tim van der Meij
50303fc8f4
Merge pull request #12766 from Snuffleupagus/issue-11004
Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
2020-12-28 20:26:10 +01:00
Calixte Denizet
ffd4bc790c JS -- Add tests for print/save actions
* change PDFDocument::hasJSActions to return true when there are JS actions in catalog.
2020-12-24 18:51:00 +01:00
Calixte Denizet
7c3facb174 JS -- Add support for buttons
* radio buttons
 * checkboxes
2020-12-22 16:41:51 +01:00
Jonas Jenwald
cffb7af3b0 Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
Similar to other markers that we currently skip, by ignoring unsupported Coding style default (COD) options we'll at least render *something* here (although some JPEG 2000 images may look slightly wrong).
Note that if the unsupported COD options lead to additional errors, during parsing, we'll still abort parsing of the JPEG 2000 image.
2020-12-21 20:35:52 +01:00
Brendan Dahl
3ea1c43b15
Merge pull request #12751 from calixteman/da_not_a_string
Add a default DA for textfield to avoid issues when printing or saving
2020-12-21 09:44:08 -08:00
Calixte Denizet
a7c682c600 Add a default DA for textfield to avoid issues when printing or saving
* it aims to fix issue #12750
2020-12-19 23:38:45 +01:00
calixteman
e6e2809825
Merge pull request #12702 from calixteman/doc_actions
JS - Collect and execute actions at doc level
2020-12-18 21:33:32 +01:00
Calixte Denizet
1e2173f038 JS - Collect and execute actions at doc and pages level
* the goal is to execute actions like Open or OpenAction
 * can be tested with issue6106.pdf (auto-print)
 * once #12701 is merged, we can add page actions
2020-12-18 20:03:59 +01:00
Jonas Jenwald
48a76aea2b Ignore, rather than throwing on, Coding style component (COC) markers in JPEG 2000 images (issue 12752)
Similar to other markers that we currently skip, by ignoring the Coding style component (COC) marker we'll at least prevent outright errors (although some JPEG 2000 images may look slightly wrong).
2020-12-18 18:18:32 +01:00
Calixte Denizet
03814bd6a2 Don't use 'in' operator to check if key is in a Map 2020-12-16 16:00:12 +01:00
Tim van der Meij
d1848f5022
Merge pull request #12725 from brendandahl/remeasure-std
Use widths defined by font for standard fonts.
2020-12-11 20:36:19 +01:00
Jonas Jenwald
67e5db75d8 Ignore color-operators in Type3 glyphs beginning with a d1 operator (issue 12705)
Please refer to the PDF specification at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1977497 and https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3998470

This patch removes the color-operators in the evaluator, since that should be more efficient than doing it repeatedly in the main-thread when rendering the Type3 glyphs.
2020-12-11 15:49:13 +01:00
Brendan Dahl
45d9ab6e45 Use widths defined by font for standard fonts.
There doesn't seem to be anything definitive about this in
the spec, but from experimenting, it seems acrobat lets
PDFs override the widths of the standard fonts.
2020-12-10 15:30:39 -08:00
Tim van der Meij
00b4f86db3
Merge pull request #12717 from Snuffleupagus/issue-12714
Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
2020-12-10 23:06:59 +01:00
Calixte Denizet
25bf504ff5 Be sure that CalculationOrder is either null or a non-empty array 2020-12-10 16:02:11 +01:00
Jonas Jenwald
796a0d3155 Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
In the referenced PDF document, the second and third page has *corrupt* /Annots-entries which contain /Dict-data rather than the intended Arrays.
2020-12-10 11:42:00 +01:00
Tim van der Meij
012e15f7a3
Fix non-standard quadpoints orders for annotations
This change requires us to use valid quadpoints arrays in the existing
unit tests too due to the normalization.
2020-12-06 16:02:41 +01:00
Jonas Jenwald
c42029489e Run gulp lint --fix, to account for changes in Prettier version 2.2.1
Please refer to https://github.com/prettier/prettier/blob/master/CHANGELOG.md#221 for additional details.
2020-11-29 10:01:46 +01:00
Tim van der Meij
256068556d
Merge pull request #12662 from Snuffleupagus/issue-12402
Check the top-level /Pages dictionary when finding the trailer in `XRef.indexObjects` (issue 12402)
2020-11-25 21:54:41 +01:00
Jonas Jenwald
8a132f584d Check the top-level /Pages dictionary when finding the trailer in XRef.indexObjects (issue 12402)
In addition to the existing /Root and /Pages validation, also check that the /Pages-entry actually is a dictionary and that it has a valid /Count-entry.
This way we can avoid picking a trailer candidate which e.g. the `Catalog.numPages` getter will just end up rejecting, thus breaking PDF document loading completely.
2020-11-25 15:14:53 +01:00
Calixte Denizet
18b525de2e Parenthesis in names are not escaped when saving 2020-11-25 12:28:12 +01:00
Calixte Denizet
b11592a756 JS -- hidden annotations must be built in case a script show them
* in some pdf, there are actions with "event.source.hidden = ..."
 * in order to handle visibility when printing, annotationStorage is extended to store multiple properties (value, hidden, editable, ...)
2020-11-10 12:48:34 +01:00
Calixte Denizet
a5279897a7 JS -- Add listener for sandbox events only if there are some actions
* When no actions then set it to null instead of empty object
* Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.
2020-11-09 18:37:59 +01:00
Jonas Jenwald
a03b383edb Fail early, in modern GENERIC builds, if globalThis isn't available (PR 11799 follow-up, issue 12596)
It probably doesn't hurt to explicitly check for `globalThis` as well, in addition to the existing checks.
2020-11-07 19:00:33 +01:00
Tim van der Meij
99ac2d1036
Merge pull request #12583 from Snuffleupagus/nonBlendModesSet
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
2020-11-05 23:53:39 +01:00
Tim van der Meij
646f895d35
Merge pull request #12568 from calixteman/defaultvalue
[api-minor] JS -- Add default value in annotation data
2020-11-05 22:53:21 +01:00
Jonas Jenwald
082cd8fc6c Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in PartialEvaluator.hasBlendModes
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.

To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.

This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
    {  "id": "issue6961",
       "file": "../web/pdfs/issue6961.pdf",
       "md5": "a80e4357a8fda758d96c2c76f2980b03",
       "rounds": 100,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) |  +/- |     %  | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0    | Overall      |   100 |         1034 |         555 | -480 | -46.39 |        faster
firefox | 0    | Page Request |   100 |          489 |           7 | -482 | -98.67 |        faster
firefox | 0    | Rendering    |   100 |          545 |         548 |    2 |   0.45 |
firefox | 1    | Overall      |   100 |          912 |         428 | -484 | -53.06 |        faster
firefox | 1    | Page Request |   100 |          487 |           1 | -486 | -99.77 |        faster
firefox | 1    | Rendering    |   100 |          425 |         427 |    2 |   0.51 |
```

---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
2020-11-05 16:59:08 +01:00
Calixte Denizet
39f5954729 JS -- Add default value in annotation data
* these values are used when a form is resetted
2020-11-05 13:44:23 +01:00
Brendan Dahl
1de2bc4816
Merge pull request #12505 from calixteman/12504
Split highlight annotation div into multiple divs
2020-11-04 10:41:28 -08:00
Tim van der Meij
3e52098e29
Merge pull request #12555 from calixteman/color
Replace css color rgb(...) by #...
2020-11-02 23:55:39 +01:00
Calixte Denizet
9d11b51a3e Replace css color rgb(...) by #...
* it's faster to generate the color code in using a table for components
* it's very likely a way faster to parse (when setting the color in the canvas)
2020-11-02 10:25:04 +01:00
Tim van der Meij
46e60a266c
Merge pull request #12552 from Snuffleupagus/annotation-fixes
Miscellaneous (small) improvements in `src/core/annotation.js`
2020-10-31 00:41:39 +01:00
Tim van der Meij
e341e6e542
Merge pull request #12525 from brendandahl/mark-info
[api-minor] Implement API to get MarkInfo from the catalog.
2020-10-31 00:05:19 +01:00
Brendan Dahl
f5c821e9c3 [api-minor] Implement API to get MarkInfo from the catalog. 2020-10-30 10:59:45 -07:00
Jonas Jenwald
fdb6520012 Change the Catalog.openAction getter back to using an Object internally (PR 12543 follow-up)
Given that the `Map`-pattern apparently has undesirable performance characteristics, change this getter back to using an Object instead and check its size before returning it.
2020-10-30 13:27:05 +01:00
Jonas Jenwald
a1e5581a0b Let Annotation._collectActions return null when no actions are present
Rather than returning an *empty* Object[1] we should be returning `null` instead, since that's consistent with existing API-functionality.
To avoid having to *manually* track if the Object is empty, this patch also introduces a small helper function to check its size.
2020-10-30 13:23:05 +01:00
Jonas Jenwald
8540b4cc76 Stop calling Font.charsToGlyphs, in src/core/annotation.js, with unused arguments
As can be seen in `src/core/fonts.js`, this method only accepts *one* parameter, hence it's somewhat difficult to understand what the Annotation-code is actually attempting to do here.
The only possible explanation that I can imagine, is that the intention was initially to call `Font.charToGlyph` *directly* instead. However, note that that'd would not actually have been correct, since that'd ignore one level of font-caching (see `this.charsCache`). Hence the unused arguments are removed, in `src/core/annotation.js`, and the `Font.charToGlyph` method is now marked as "private" as intended.
2020-10-30 13:17:52 +01:00
Jonas Jenwald
46e94cad17 Fix *some* errors reported by the ESLint no-useless-escape rule
This patch removes unnecessary escape-sequence in (mostly) strings, as a first step, since the ones in regular expressions probably requires more careful testing (just in case).
The only exception is a regular expression in `src/core/annotation.js`, since we should have both unit- and reference-tests for this code *and* given [this information on MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#Types):
 > Inside a character set, the dot loses its special meaning and matches a literal dot.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
2020-10-29 15:40:40 +01:00
Jonas Jenwald
9fc7cdcc9d Use a Map, rather than an Object, internally in the Catalog.openAction getter (PR 11644 follow-up)
This provides a work-around to avoid having to conditionally try to initialize the `openAction`-object in multiple places.
Given that `Object.fromEntries` doesn't seem to *guarantee* that a `null` prototype is used, we thus hack around that by using `Object.assign` with `Object.create(null)`.
2020-10-28 14:43:28 +01:00
Tim van der Meij
ea4d88a330
Merge pull request #12395 from calixteman/checks
Render not displayed annotations in using normal appearance when printing
2020-10-28 00:11:10 +01:00
Calixte Denizet
6be2f84b4e Render not displayed annotations in using normal appearance when printing 2020-10-27 19:00:31 +01:00
Tim van der Meij
71a14be8e7
Merge pull request #12534 from Snuffleupagus/murmurhash-slice
Ensure that `MurmurHash3_64.update` handles `ArrayBuffer` input correctly, to avoid hash-collisions (issue 12533)
2020-10-26 23:34:03 +01:00
Jonas Jenwald
f2fa053c51 Ensure that MurmurHash3_64.update handles ArrayBuffer input correctly, to avoid hash-collisions (issue 12533)
Different fonts incorrectly end up with *identical* hashes, despite having different /ToUnicode data.
The issue, and it's very interesting that we've apparently not seen it before, appears to be caused by the fact that different /ToUnicode entries share the *same* underlying `ArrayBuffer`, which thus becomes problematic at the `const dataUint32 = new Uint32Array(data.buffer, 0, blockCounts);` line. The simplest solution thus seem to be to just *copy* the input, when it's an `ArrayBuffer`, rather than using it as-is. (Note that if we'd stringified the input, when calling `MurmurHash3_64.update`, the issue would also have been fixed. In this case, we're already creating an unique TypedArray.)
2020-10-26 16:27:33 +01:00
Jonas Jenwald
56fa6d414c Add a getArrayLookupTableFactory helper function and use it to re-format src/core/{glyphlist, unicode}.js
*Please note:* Once https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is implemented, and we've removed SystemJS completely, this entire patch can (and even should) be reverted.

This is similar to the existing `getLookupTableFactory` helper function, but is implemented as outlined in issue 6774.
The re-formatting of the tables were done automatically, by using find-and-replace with regular expressions.

For reasons that I don't even pretend to understand, using this particular structure for these *very* long lookup tables allow SystemJS to process the files correctly/quickly and the development viewer thus works as intended.
2020-10-26 11:08:00 +01:00
Jonas Jenwald
441d9c8cc0 Change src/core/{glyphlist, unicode}.js to use standard import/export statements
While the *built* `pdf.worker.js` file still works correctly with these changes, despite these two files being excluded by Babel[1], the development viewer does not because of issues with SystemJS[2] and/or its Babel-plugin (both of which are old).
Furthermore, note also that excluding these two files from Babel-processing isn't *generally* necessary since e.g. the `gulp mozcentral` command works anyway. The explanation is rather that it's actually the source-map generation which fails for these huge sequences when building the `pdf.worker.js` file.

However, not using standard `import`/`export` statements in all files means we also need to use SystemJS when e.e. running the unit-tests. This is very unfortunate, since SystemJS (or its old Babel-version) doesn't support modern ECMAScript features such as e.g. optional chaining and nullish coalescing.

Unfortunately it also seems that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687, which tracks the implementation of worker-modules in Firefox, has stalled since there hasn't been any updates for six months now.

To hopefully address all of the above, this patch is the first in a series that attempts to further reduce our reliance on SystemJS.

---
[1] The only difference being how the dependencies are handled, in the Webpack-bundled file.

[2] Parsing takes way too long and consumes too much memory, thus rendering the development viewer essentially unusable.
2020-10-26 11:08:00 +01:00
Tim van der Meij
b4ca3d55b8
Merge pull request #12508 from calixteman/button_fallback_font
Fallback font for buttons must be ZapfDingbats.
2020-10-24 18:56:12 +02:00
Tim van der Meij
180f35ee91
Merge pull request #12526 from Snuffleupagus/TilingPattern-args
Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up)
2020-10-24 15:47:57 +02:00
Tim van der Meij
c493dc96fa
Merge pull request #12516 from Snuffleupagus/fieldObjects-annotation-undefined
Prevent issues, in `PDFDocument.fieldObjects`, for invalid Annotations
2020-10-24 15:42:33 +02:00
Jonas Jenwald
b478d3e7b9 Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up)
- Handle the arguments correctly in `PartialEvaluator.handleColorN`.
   For TilingPatterns with a base-ColorSpace, we're currently using the `args` when computing the color. However, as can be seen we're passing the Array as-is to the `ColorSpace.getRgb` method, which means that the `Name` is included as well.[1]
   Thankfully this hasn't, as far as I know, caused any actual bugs, but that may be more luck than anything else given how the `ColorSpace` code is implemented. This can be easily fixed though, simply by popping the `Name`-object off of the `args` Array.

 - Cache TilingPatterns using the `Name`-string, rather than the object directly.
   This is not only consistent with other caches in `PartialEvaluator`, but importantly it also ensures that the cache lookup always works correctly. Note that since `Name`-objects, similar to other primitives, uses a cache themselves a *manually* triggered `cleanup`-call could thus (theoretically) cause the `LocalTilingPatternCache` to not find an existing entry. While the likelihood of this happening is *extremely* small, it's still something that we should fix.

---
[1] The `args` Array can e.g. look like this: `[0.043, 0.09, 0.188, 0.004, /P1]`, which means that we're passing in the `Name`-object to the `ColorSpace` method.
2020-10-24 13:49:46 +02:00
Calixte Denizet
37c86b2daa Fallback font for buttons must be ZapfDingbats.
Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.
2020-10-24 12:00:03 +02:00
Calixte Denizet
85e6c67cf3 Split highlight annotation div into multiple divs
Fix for issue #12504.
Highlight annotation may have several rectangles so we must have several divs to add mouse events handlers.
2020-10-23 15:26:16 +02:00
Jonas Jenwald
b44a975d7c Prevent issues, in PDFDocument.fieldObjects, for invalid Annotations
For an invalid Annotation, there's one code-path where `undefined` is returned from `AnnotationFactory._create`. That'd currently, incorrectly, trigger an error during the `PDFDocument._collectFieldObjects` parsing which thus seem good to avoid.
Along these lines, the filtering in `PDFDocument.fieldObjects` is also updated to handle both `null` and `undefined` the same way.
2020-10-22 13:24:43 +02:00
Calixte Denizet
d2ef878702 Invalidate an annotation with no quadPoints (when it's required)
Some pdf softwares don't remove highlight annotations but make the QuadPoints array empty.
And the Rect for the annotation can be [-32768, -32768, 32768, 32768] so it leads to have a giant div which catches all the mouse events and make the pdf unusable when there are some forms elements.
2020-10-21 13:53:19 +02:00
Calixte Denizet
c30a3a94f0 JS - Add a function in api to get the fields ids in AcroForm::CO 2020-10-17 12:56:40 +02:00
Tim van der Meij
ff2631493e
Merge pull request #12481 from calixteman/issue_12475
Get urls if any in AA::D dictionary for pushbuttons
2020-10-16 22:55:43 +02:00
Tim van der Meij
32bceae732
Merge pull request #12483 from Snuffleupagus/formInfo-hasFields
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
2020-10-16 22:40:40 +02:00
Jonas Jenwald
f956d0a96a Stop caching the *parsed* Font data on its Dict object (PR 7347 follow-up)
Given that *all* fonts are, ever since PR 7347, now cached in the "normal" `fontCache` there's actually no reason for the special `font.translated` construction. (Given how Objects in JavaScript are references, rather than raw values, the old code shouldn't have caused any significant memory overhead.)

Instead we can simply store the `cacheKey`, which is a simple string, on only the Font `Dict`s where it's needed and thus look-up all fonts using the `fontCache` instead.
2020-10-16 17:45:01 +02:00
Jonas Jenwald
29af15f37e Add more validation in the PDFDocument._hasOnlyDocumentSignatures method
If this method is ever passed invalid/unexpected data, or if during the course of parsing (since it's used recursively) such data is found, it will fail in a non-graceful way.
Hence this patch, which ensures that we don't attempt to access non-existent properties and also that errors such as the one fixed in PR 12479 wouldn't have occured.
2020-10-16 13:03:47 +02:00
Jonas Jenwald
3351d3476d Don't store complex data in PDFDocument.formInfo, and replace the fields object with a hasFields boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*

 - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
   This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
   With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.

 - Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.

 - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.

 - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.

 - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 12:47:27 +02:00
Calixte Denizet
ce3d3a6ff8 Get urls if any in AA::D dictionary for pushbuttons 2020-10-15 19:42:36 +02:00
Jonas Jenwald
bc6b47a50e Convert PartialEvaluator.translateFont to an async method
This allows us to make a slight simplification in `PartialEvaluator.loadFont`, which thus removes an old TODO-comment from the method.
Furthermore, in `PartialEvaluator.translateFont`, the CMap-handling is now limited to only *composite* fonts to avoid having to wait for a "dummy"-Promise for most fonts.
2020-10-15 09:42:58 +02:00
Tim van der Meij
a373137304
Merge pull request #12429 from calixteman/collect_js
[api-minor] Add the possibility to collect Javascript actions
2020-10-14 23:27:47 +02:00
Calixte Denizet
71ecc3129b Add the possibility to collect Javascript actions 2020-10-14 10:44:16 +02:00
Tim van der Meij
1034769ca1
Merge pull request #12477 from Snuffleupagus/SaveDocument-WorkerTask
Handle `WorkerTask`s, and various PDF document properties, correctly in the "SaveDocument" handler in `src/core/worker.js`
2020-10-13 21:11:54 +02:00
Jonas Jenwald
65132ba5d8 Handle WorkerTasks, and various PDF document properties, correctly in the "SaveDocument" handler in src/core/worker.js
- Actually register/unregister the `WorkerTask`s, used when saving each page, correctly.
   To prevent issues when terminating the Worker, we purposely wait for all running `WorkerTask`s to complete first. Hence we need to actually handle `WorkerTask`s the same way in "SaveDocument" as in the rest of this file, see e.g. "GetOperatorList" and "GetTextContent".

 - Access `PDFDocument` properties in a generally safe/consistent way.
   While the current code works fine, given how the PDF document is being loaded, it still seems like a very good idea to be *consistent* in how we access these kind of properties (since in general you need to avoid `MissingDataException` everywhere in this file).

 - Change a variable name, since there's essentially no precedent in the code-base for *local* variable names to start with an underscore.
2020-10-13 19:30:43 +02:00
Jonas Jenwald
38629c345d Remove the scope parameter from the "GetOperatorList" handler in src/core/worker.js (PR 11110 follow-up)
Support for the `scope` parameter, in `MessageHandler.on`, was removed in PR 11110 however this particular case was unused/unnecessary for years prior to that change. (From a quick look through the history, I'm not even sure if it was actually needed in the first place.)
2020-10-13 15:58:38 +02:00
Jonas Jenwald
30e8d5dea1 Add local caching of TilingPatterns in PartialEvaluator.getOperatorList (issue 2765 and 8473)
In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required.

By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also *less* asynchronous parsing required for repeated TilingPatterns.

Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results.
When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reduction in rendering times were observed when comparing `master` against this patch:
 - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`.
 - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms`
 - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms`

As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance.
Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.
2020-10-08 18:43:21 +02:00
Jani Pehkonen
935568c2f1 Fix invalid XUID entries in CFF fonts
In CFF fonts, entry `XUID` should be an array that has no more than
16 elements. In the issue, the length is 20, which causes the fonts to fail.
See Appendix B, "Implementation Limits" in PostScript Language Reference Manual
https://web.archive.org/web/20170218093716/https://www.adobe.com/products/postscript/pdfs/PLRM.pdf
Actually entries `XUID` and `UniqueID` are obsolete altogether.
https://blogs.adobe.com/CCJKType/2016/06/no-more-xuid-arrays.html
2020-10-05 17:38:01 +03:00
Jonas Jenwald
9416b14e8b Re-factor how the ESLint no-var rule is enabled in the src/ folder
This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1]
Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use.

---
[1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.
2020-10-03 20:15:29 +02:00
Tim van der Meij
6ff1fe4ea9
Merge pull request #12333 from calixteman/tooltip
Add tooltip if any in annotations layer
2020-10-03 19:50:39 +02:00
calixteman
20b12d2bda Add tooltip if any in annotations layer 2020-10-02 10:11:18 +02:00
Jonas Jenwald
bd3b15b897 Use the cidToGidMap, if it exists, when building the glyph mapping for non-embedded composite fonts (issue 12418) 2020-09-28 14:40:43 +02:00
Calixte Denizet
5af352e65a Need to reset the streams when printing 2020-09-24 19:13:09 +02:00
Jonas Jenwald
2497e8eab9 Prevent errors if the InkList property, in InkAnnotations, is missing and/or not an Array (issue 12392)
To prevent a future bug, the `Vertices` property in PolylineAnnotations are handled the same way.
2020-09-19 15:34:32 +02:00
Calixte Denizet
d51e7e86ff Use the same kind of strings for radio values 2020-09-16 18:47:25 +02:00
Tim van der Meij
374aad77c4
Merge pull request #12375 from Snuffleupagus/emptyDict-set
Ensure that the empty dictionary won't be accidentally modified, and slightly improve the "SaveDocument" handler in `src/core/worker.js`
2020-09-15 23:04:57 +02:00
Calixte Denizet
16dd5403c7 Set parent of radio annotation even if there is no 'V' field 2020-09-15 14:41:57 +02:00
Jonas Jenwald
ed4e7cd8a4 A couple of small improvements in the "SaveDocument" handler in src/core/worker.js
- Check that the "Info"-entry, in the XRef-trailer, is actually a dictionary before accessing it. This is similar to the `PDFDocument.documentInfo` method and follows the general principal of validating data carefully before accessing it, given how often PDF-software may create corrupt PDF files.

 - Slightly simplify the "XFA"-lookup, since there's no point in trying to fetch something from the empty dictionary.
2020-09-15 09:57:40 +02:00
Jonas Jenwald
a531c98cd2 Ensure that the empty dictionary won't be accidentally modified
Currently there's nothing that prevents modification of the `Dict.empty` primitive, which obviously needs to be *truly* empty to prevent any future (hard to find) bugs.
2020-09-15 09:29:00 +02:00
Tim van der Meij
b0c7a74a0c
Merge pull request #12361 from Snuffleupagus/_getSaveFieldResources
Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294)
2020-09-15 00:09:31 +02:00
Tim van der Meij
3ecd984758
Implement resetting of created streams for annotations 2020-09-14 23:08:50 +02:00
Jonas Jenwald
c992b8e460 Ensure that all necessary /Font resources are included when saving a WidgetAnnotation-instance (issue 12294)
This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix *one specific* case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local *and* AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included.

---
[1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.
2020-09-14 15:22:40 +02:00