Commit Graph

4206 Commits

Author SHA1 Message Date
Jonas Jenwald
bd3b15b897 Use the cidToGidMap, if it exists, when building the glyph mapping for non-embedded composite fonts (issue 12418) 2020-09-28 14:40:43 +02:00
Tim van der Meij
120c5c2261
Merge pull request #12409 from Snuffleupagus/bug-1627030
Compute the `transformOrigin` correctly, for negative values, when rendering `AnnotationElement`s (bug 1627030)
2020-09-24 23:48:21 +02:00
Calixte Denizet
5af352e65a Need to reset the streams when printing 2020-09-24 19:13:09 +02:00
Jonas Jenwald
fca53a8eb0 Compute the transformOrigin correctly, for negative values, when rendering AnnotationElements (bug 1627030)
This changes the `transformOrigin` calculations in `AnnotationElement._createContainer` and `PopupAnnotationElement.render`, to ensure that e.g. the clickable area of annotations and/or popups are both positioned correctly.

The problem occurs for *negative* values, since they're not negated correctly because of how the `transformOrigin` strings were build; see issue 12406 for a more in-depth explanation. Previously, for negative values, the `transformOrigin` strings would thus be ignored since they're not valid.
2020-09-24 10:28:29 +02:00
Jonas Jenwald
2497e8eab9 Prevent errors if the InkList property, in InkAnnotations, is missing and/or not an Array (issue 12392)
To prevent a future bug, the `Vertices` property in PolylineAnnotations are handled the same way.
2020-09-19 15:34:32 +02:00
Calixte Denizet
d51e7e86ff Use the same kind of strings for radio values 2020-09-16 18:47:25 +02:00
Tim van der Meij
558d3870d3
Merge pull request #12369 from emilio/better-cancelation-follow-up
canvas: fix restore() with existing SMask groups and re-land #12363.
2020-09-15 23:19:17 +02:00
Tim van der Meij
374aad77c4
Merge pull request #12375 from Snuffleupagus/emptyDict-set
Ensure that the empty dictionary won't be accidentally modified, and slightly improve the "SaveDocument" handler in `src/core/worker.js`
2020-09-15 23:04:57 +02:00
Calixte Denizet
16dd5403c7 Set parent of radio annotation even if there is no 'V' field 2020-09-15 14:41:57 +02:00
Jonas Jenwald
ed4e7cd8a4 A couple of small improvements in the "SaveDocument" handler in src/core/worker.js
- Check that the "Info"-entry, in the XRef-trailer, is actually a dictionary before accessing it. This is similar to the `PDFDocument.documentInfo` method and follows the general principal of validating data carefully before accessing it, given how often PDF-software may create corrupt PDF files.

 - Slightly simplify the "XFA"-lookup, since there's no point in trying to fetch something from the empty dictionary.
2020-09-15 09:57:40 +02:00
Jonas Jenwald
a531c98cd2 Ensure that the empty dictionary won't be accidentally modified
Currently there's nothing that prevents modification of the `Dict.empty` primitive, which obviously needs to be *truly* empty to prevent any future (hard to find) bugs.
2020-09-15 09:29:00 +02:00
Tim van der Meij
b0c7a74a0c
Merge pull request #12361 from Snuffleupagus/_getSaveFieldResources
Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294)
2020-09-15 00:09:31 +02:00
Tim van der Meij
9d7b1d89ca
Merge pull request #12370 from timvandermeij/annotation-reset
Implement resetting of created streams for annotations
2020-09-14 23:16:17 +02:00
Tim van der Meij
3ecd984758
Implement resetting of created streams for annotations 2020-09-14 23:08:50 +02:00
Calixte Denizet
0c8de5aaf9 Replace \n and \r by \n and \r when saving a string 2020-09-14 17:34:39 +02:00
Jonas Jenwald
c992b8e460 Ensure that all necessary /Font resources are included when saving a WidgetAnnotation-instance (issue 12294)
This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix *one specific* case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local *and* AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included.

---
[1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.
2020-09-14 15:22:40 +02:00
Emilio Cobos Álvarez
bf8b1adf73 canvas: Properly restore all the remaining items in stateStack in endDrawing.
We were correctly finishing the SMask group but not restoring all the extra
transformations applied in stateStack, so if somebody ends up drawing to the
same context after canceling mid-draw we'd get artifacts.

This re-lands #12363 and fixes Mozilla bug 1664178[1].

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1664178
2020-09-12 16:37:54 +02:00
Emilio Cobos Álvarez
3a277f3ba5 canvas: restore() should reflect that smask groups are finished when stateStack is empty.
This fixes the issue that caused #12363 to get reverted, see #12367.
When we end the SMask group and stateStack.length is zero, nothing updates
this.current to reflect it.
2020-09-12 16:37:54 +02:00
Jonas Jenwald
f43d1b316b
Revert "canvas: Properly restore all the remaining items in stateStack in endDrawing" 2020-09-12 16:15:33 +02:00
Tim van der Meij
cdac6f4e68
Merge pull request #12363 from emilio/better-cancelation
canvas: Properly restore all the remaining items in stateStack in endDrawing
2020-09-12 15:03:34 +02:00
Emilio Cobos Álvarez
ef1e9a1a3e
canvas: Properly restore all the remaining items in stateStack in endDrawing.
We were correctly finishing the SMask group but not restoring all the extra
transformations applied in stateStack, so if somebody ends up drawing to the
same context after canceling mid-draw we'd get artifacts.

This fixes Mozilla bug 1664178[1].

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1664178
2020-09-12 13:50:56 +02:00
Tim van der Meij
dfebe7b907
Merge pull request #12365 from Snuffleupagus/forbid-DecodeStream.length
Ensure that the `length` property won't be *accidentally* accessed on a `DecodeStream`-instance
2020-09-11 22:18:30 +02:00
Jonas Jenwald
a11b7341a1 Ensure that the length property won't be *accidentally* accessed on a DecodeStream-instance
For these streams, compared to `Stream` and `ChunkedStream`, there's no well defined concept of length and consequently no `length` getter.[1] However, attempting to access the non-existent `length` won't currently error, but just return `undefined`, which could thus easily lead to bugs elsewhere in the code-base.

---
[1] However, note that *all* stream implementations have an `isEmpty` getter which can be used instead.
2020-09-11 13:25:40 +02:00
Calixte Denizet
fc154590e8 Dict keys need to be escaped too when saving 2020-09-11 12:25:05 +02:00
Tim van der Meij
8cfcd7a488
Merge pull request #12360 from calixteman/12359
Reset cursor position when focus is out of text field
2020-09-10 23:11:35 +02:00
Calixte Denizet
dc4eb71ff1 PDF names need to be escaped when saving 2020-09-10 16:08:13 +02:00
Calixte Denizet
44b24fcc29 Reset cursor position when focus is out of text field 2020-09-10 10:37:13 +02:00
Tim van der Meij
f9d56320f5
Merge pull request #12349 from calixteman/followup_12344
Follow-up of pr #12344
2020-09-09 23:40:53 +02:00
Calixte Denizet
908e7ae5e4 Set the modification date to the current day when saving 2020-09-09 19:06:39 +02:00
Calixte Denizet
64a6efd95e Follow-up of pr #12344 2020-09-09 11:46:02 +02:00
Brendan Dahl
e51e9d1f33
Merge pull request #12345 from calixteman/save_btn
Don't try to save something for a button which is neither a checkbox nor a radio
2020-09-08 15:44:04 -07:00
calixteman
68b99c59ee
Save form data in XFA datasets when pdf is a mix of acroforms and xfa (#12344)
* Move display/xml_parser.js in shared to use it in worker

* Save form data in XFA datasets when pdf is a mix of acroforms and xfa

Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>
2020-09-08 15:13:52 -07:00
Calixte Denizet
7e5026dfc5 Don't try to save something for a button which is neither a checkbox nor a radio 2020-09-08 20:47:46 +02:00
Tim van der Meij
20c891542b
Merge pull request #12269 from calixteman/highlight
Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations.
2020-09-06 22:25:36 +02:00
Jonas Jenwald
babeae9448 Remove, manually implemented, DOM polyfills only necessary for IE 11 support
Please refer to the following compatibility information:
 - https://developer.mozilla.org/en-US/docs/Web/API/ChildNode/remove#Browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList/add#Browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList/remove#Browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList/toggle#Browser_compatibility

Finally, for the `pushState`/`replaceState` polyfills, please refer to PRs 10461 and 11318 for additional details.
2020-09-06 18:24:17 +02:00
Calixte Denizet
65ecd981fe Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations. 2020-09-06 15:40:15 +02:00
Tim van der Meij
50958c46f7
Merge pull request #12331 from Snuffleupagus/Promise-polyfill
[api-minor] Only support browsers/environments that have *basic* support for `Promise` natively
2020-09-06 14:19:55 +02:00
Jonas Jenwald
449c7763d5 [api-minor] Only support browsers/environments that have *basic* support for Promise natively
Based on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise#Browser_compatibility and https://caniuse.com/#feat=promises, all even remotely modern browsers already support *basic* `Promise` functionality natively.

The only reason for keeping the `Promise` polyfill (at all) is to be able to support recent additions to the specification, such as e.g. `finally` and `allSettled`.
Note that this patch will, on its own, remove support for IE 11/Edge (non-Chromium based) in both the general PDF.js library and the default viewer.
2020-09-06 13:45:56 +02:00
Jonas Jenwald
6c8f1f7d6f Run gulp lint --fix, to account for changes in Prettier version 2.1.x 2020-09-06 12:23:59 +02:00
Jonas Jenwald
784a420027 Add support, in Dict.merge, for merging of "sub"-dictionaries
This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.
2020-08-30 23:18:32 +02:00
Jonas Jenwald
66aabe3ec7 [api-minor] Add support for toggling of Optional Content in the viewer (issue 12096)
*Besides, obviously, adding viewer support:* This patch attempts to improve the general API for Optional Content Groups slightly, by adding a couple of new methods for interacting with the (more complex) data structures of `OptionalContentConfig`-instances. (Thus allowing us to mark some of the data as "private", given that it probably shouldn't be manipulated directly.)

By utilizing not just the "raw" Optional Content Groups, but the data from the `/Order` array when available, we can thus display the Layers in a proper tree-structure with collapsible headings for PDF documents that utilizes that feature.

Note that it's possible to reset all Optional Content Groups to their default visibility state, simply by double-clicking on the Layers-button in the sidebar.
(Currently that's indicated in the Layers-button tooltip, which is obviously easy to overlook, however it's probably the best we can do for now without adding more buttons, or even a dropdown-toolbar, to the sidebar.)

Also, the current Layers-button icons are a little rough around the edges, quite literally, but given that the viewer will soon have its UI modernized anyway they hopefully suffice in the meantime.

To give users *full* control of the visibility of the various Optional Content Groups, even those which according to the `/Order` array should not (by default) be toggleable in the UI, this patch will place those under a *custom* heading which:
 - Is collapsed by default, and placed at the bottom of the Layers-tree, to be a bit less obtrusive.
 - Uses a slightly different formatting, compared to the "regular" headings.
 - Is localizable.

Finally, note that the thumbnails are *purposely* always rendered with all Optional Content Groups at their default visibility state, since that seems the most useful and it's also consistent with other viewers.
To ensure that this works as intended, we'll thus disable the `PDFThumbnailView.setImage` functionality when the Optional Content Groups have been changed in the viewer. (This obviously means that we'll re-render thumbnails instead of using the rendered pages. However, this situation ought to be rare enough for this to not really be a problem.)
2020-08-30 16:28:40 +02:00
Jonas Jenwald
2393443e73 Include the /Order array, if available, when parsing the Optional Content configuration
The `/Order` array is used to improve the display of Optional Content groups in PDF viewers, and it allows a PDF document to e.g. specify that Optional Content groups should be displayed as a (collapsable) tree-structure rather than as just a list.

Note that not all available Optional Content groups must be present in the `/Order` array, and PDF viewers will often (by default) hide those toggles in the UI.
To allow us to improve the UX around toggling of Optional Content groups, in the default viewer, these hidden-by-default groups are thus appended to the parsed `/Order` array under a *custom* nesting level (with `name == null`).

Finally, the patch also slightly tweaks an `OptionalContentConfig` related JSDoc-comment in the API.
2020-08-30 16:28:40 +02:00
Tim van der Meij
06b53d770a
Merge pull request #12259 from brendandahl/cmap-fix
Fix handling of symbolic fonts and unicode cmaps.
2020-08-30 16:01:24 +02:00
Brendan Dahl
45e8a31cc0 Fix handling of symbolic fonts and unicode cmaps.
In issue 12120, the font has a 1,0 cmap and is marked symbolic which
according to the spec means we should directly use the cmap instead of
the extra steps that are defined in 9.6.6.4.

However, just fixing that caused bug 1057544 to break. The font in bug
1057544 has a 0,1 cmap (Unicode 1.1) which we were not using, but is
easy to support. We're also easily able to support some of the other
unicode cmaps, so I added those as well.

There was also a second issue with bug 1057544, the cmap doesn't have
a mapping for the "quoteright" glyph, but it is defined in the post
table. To handle this, I've moved post table as a  fallback for any
font that has an encoding.
2020-08-27 14:33:11 -07:00
Calixte Denizet
ba94f04ba3 Bug 1661226 - Push button are not rendered with renderInteractiveForms enabled 2020-08-27 10:45:14 +02:00
Tim van der Meij
0f229d537f
Inline the setup method in the parse method in src/core/document.js
Now that the `parse` method is simplified we can inline the `setup`
method in the `parse` method since it's only two lines of code. This
avoids some indirection.
2020-08-25 23:28:55 +02:00
Tim van der Meij
280207c740
Redo the form type detection logic and include unit tests
Good form type detection is important to get reliable telemetry and to
only show the fallback bar if a form cannot be filled out by the user.

PDF.js only supports AcroForm data, so XFA data is explicitly unsupported
(tracked in issue #2373). However, the previous form type detection
couldn't separate AcroForm and XFA well enough, causing form type
telemetry to be incorrect sometimes and the fallback bar to be shown for
forms that could in fact be filled out by the user.

The solution in this commit is found by studying the specification and
the form documents that are available to us. In a nutshell the rules are:

- There is XFA data if the `XFA` entry is a non-empty array or stream.
- There is AcroForm data if the `Fields` entry is a non-empty array and
  it doesn't consist of only document signatures.

The document signatures part was not handled in the old code, causing a
document with only XFA data to also be marked as having AcroForm data.
Moreover, the old code didn't check all the data types.

Now that AcroForm and XFA can be distinguished, the viewer is configured
to only show the fallback bar for documents that only have XFA data. If
a document also has AcroForm data, the viewer can use that to render the
form. We have not found documents where the XFA data was necessary in
that case.

Finally, we include unit tests to ensure that all cases are covered and
move the form type detection out of the `parse` function so that it's
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
2020-08-25 23:28:55 +02:00
Tim van der Meij
f0bf62ff54
Mark the catDict member as private in the Catalog class
Not only is `catDict` never accessed anymore outside of this file, it
should also never happen since it's internal to the catalog. If data
from it is needed elsewhere, the catalog should provide a getter for it
that can do basic data integrity checks and abstract away any
unnecessary details.
2020-08-25 23:28:55 +02:00
Tim van der Meij
f20f0bcc78
Move the AcroForm logic from the document to the catalog
The `AcroForm` entry is part of the catalog, not of the document, so its
logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, make the AcroForm member private on the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `IsAcroFormPresent`.
Only a boolean is exposed, so we now also only store the boolean on the
document instance.

Finally, the annotation code needs access to the full AcroForm
dictionary, so it's updated to fetch the data from the catalog instead
of the document that now only holds the boolean.
2020-08-25 23:28:55 +02:00
Tim van der Meij
b41a2f4d5a
Move the collection logic from the document to the catalog
The `Collection` entry is part of the catalog, not of the document, so
its logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, remove the collection member from the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `IsCollectionPresent`.
Moving this out of the `parse` function makes sure that the getter is
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
2020-08-25 23:28:55 +02:00
Tim van der Meij
935d95b462
Move the version logic from the document to the catalog
The `Version` entry is part of the catalog, not of the document, so its
logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, make the version member private on the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `PDFFormatVersion`.

Finally, clarify how the version from the header and the version from
the catalog are treated using a comment.
2020-08-25 23:28:55 +02:00
Jonas Jenwald
bd16c363ce Access the Catalog data correctly in the "GetPageIndex" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `Catalog`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:14 +02:00
Jonas Jenwald
2e6e2c3b41 Access the XRef data correctly in the "GetStats" handler in src/core/worker.js
Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just *assume* that the `XRef`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.
2020-08-25 12:14:11 +02:00
Jani Pehkonen
e7febbf0f7 Accent positioning in Type1 seac glyphs
In `display/canvas.js` the accent offsets must be multiplied by `fontSize` to make the offsets large enough. Another problem is in `core/type1_parser.js` when the Type1 command `seac` is handled. There is an error in the Adobe Type1 spec. See chapter 6 in Type1 Font Format Supplement, which provides an errata: The arguments of `seac` specify the offset of the left side bearing (LSB) points, not the offset of origins. This can be fixed in `core/type1_parser.js` by adding the difference of the LSB values.
2020-08-23 21:01:25 +03:00
Tim van der Meij
7df8aa34a5
Merge pull request #12263 from timvandermeij/acroform-fixes
Fix AcroForm printing/saving edge cases
2020-08-23 13:37:30 +02:00
Tim van der Meij
a8efc0296b
Obtain the export values for choice widgets from the normal appearance
The down appearance (`D`) is optional and not available in the document
from #12233, so the checkboxes are never saved/printed as checked
because the checked appearance is based on the export value that is
missing because the `D` entry is not available.

Instead, we should use the normal appearance (`N`) since that one is
required and therefore always available.

Finally, the /Off appearance is optional according to section 12.7.4.2.3
of the specification, so that needs to be taken into account to match
the specification and to fix reference test failures for the
`annotation-button-widget-print` test. That is a file that doesn't
specify an /Off appearance in the normal appearance dictionary.
2020-08-23 13:00:02 +02:00
Tim van der Meij
1b82ad8fff
Decode widget form values consistently
The helper method `_decodeFormValue` is used to ensure that it happens
in one place. Note that form values are field values, display values
and export values.
2020-08-23 13:00:01 +02:00
Jonas Jenwald
fa02808f76 Mark the setModified method, on AnnotationStorage, as "private" (PR 12241 follow-up)
Since it shouldn't be called manually, we can just mark it as "private".
2020-08-22 20:04:25 +02:00
Jonas Jenwald
1f5021d76a Prevent errors if PDFDocumentProxy.saveDocument is called without the annotationStorage parameter (PR 12241 follow-up)
Obviously it doesn't make sense to call that method without providing an `AnnotationStorage`-instance, however we should ensure that doing so won't cause errors.
Hence we need to check that `annotationStorage` is actually defined, before attempting to call its `resetModified` method.
2020-08-22 18:09:17 +02:00
Tim van der Meij
36e149800e
Unconditionally set the field value for choice widgets in the annotation storage
This commit makes the following improvements:

- The code is similar to the other interactive form widgets now, with a
  clear note for the only difference.
- Calling `getOrCreateValue` unconditionally ensures that choice widgets
  always have a value in the annotation storage. Previously we only
  inserted a value in the annotation storage when an option matched or
  when a selection was changed. However, this causes breakage when
  saving/printing because comboboxes, which we don't fully support yet
  but are rendered, might not have a value in storage at all. Their
  field value might not match any option since it allows the user to
  enter a custom value.
- Calling `getOrCreateValue` unconditionally ensures that forms with
  choice widgets no longer always trigger a warning when the user
  navigates away from the page. This fixes
  https://github.com/mozilla/pdf.js/pull/12241#discussion_r474279654
2020-08-22 16:01:33 +02:00
Jonas Jenwald
a8de614a9f Also enable renderInteractiveForms by default in the viewer components (PR 12201 follow-up)
Given that `renderInteractiveForms` is now enabled by default in "full" viewer, it seems reasonable to enable it by default in the viewer components as well.
Especially considering that it's simple to disable, when creating the affected components, for anyone implementing their own viewer.
2020-08-22 14:24:04 +02:00
Tim van der Meij
5fed7112a2
Use the export value instead of the display value for choice widget option selection
The export value is used when the document is saved, so it should also
be used when the document is opened to determine which choice widget
option is selected. The display value is, as the name implies, only to
be used for viewer display purposes and not for other logic.

This makes sure that in the document from #12233 the "Favourite colour"
choice widget is correctly initialized with "Red" instead of "Black"
because the field value is equal to the export value (always the case),
but not the display value (not always the case). Moreover, saving now
also correctly uses the export value and not the display value.
2020-08-22 14:11:41 +02:00
Tim van der Meij
3c790936c1
Merge pull request #12247 from timvandermeij/acroform-choice-null
Improve the field value parsing for choice widgets to handle `null` values
2020-08-21 23:17:20 +02:00
Brendan Dahl
8023175103 Support file save triggered from the Firefox integrated version.
Related to https://bugzilla.mozilla.org/show_bug.cgi?id=1659753

This allows Firefox trigger a "save" event from ctrl/cmd+s or the "Save
Page As" context menu, which in turn lets pdf.js generate a new PDF if
there is form data to save.

I also now use `sourceEventType` on downloads so Firefox can determine if
it should launch the "open with" dialog or "save as" dialog.
2020-08-20 18:05:08 -07:00
Aki Sasaki
83365a3756 confirm if leaving a modified form without saving 2020-08-20 17:23:06 -07:00
Tim van der Meij
12c20772ac
Improve the field value parsing for choice widgets to handle null values
The specification states that the field value is `null` if no item is
selected and we didn't handle this case properly. Even though this did
not break the rendering because we always convert the value to an array
and the `includes` check in the display layer would simply not match,
the field value would be `[null]` which is not expected and strange from
an API perspective.

This commit fixes that by ensuring that we return an empty array in
case the field value is `null`. The API therefore still always gives an
array for the field value, but now the code is more specific so that the
value is either an empty array or an array of strings.
2020-08-19 23:27:50 +02:00
Jonas Jenwald
1058f16605 Add (basic) support for transfer functions to Images (issue 6931, bug 1149713)
This is *similar* to the existing transfer function support for SMasks, but extended to simple image data.
Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to *at most* 4 `Uint8Array`s each with a length of 256 elements.

Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.
2020-08-17 10:34:12 +02:00
Jonas Jenwald
9d3e046a4f Don't cache /ExtGState entries that contain fonts (PR 12087 follow-up)
I completely overlooked the fact that `PartialEvaluator.handleSetFont` also updates the current `state`, which means that currently we're not actually handling font data correctly for cached /ExtGState data. (Thankfully, using /ExtGState to set a font is somewhat rare in practice.)
2020-08-17 08:17:25 +02:00
Jonas Jenwald
b26d736809 Ensure that the "DocException" message handler, in the API, will always either error or warn (depending on the build) if a valid Error isn't found
Having this present would have made debugging issues 11941 and 12209 so much quicker and easier.
2020-08-13 13:17:30 +02:00
Calixte Denizet
1a6816ba98 Add support for saving forms 2020-08-12 10:32:59 +02:00
Tim van der Meij
57c988853b
Merge pull request #12192 from Snuffleupagus/misc-AnnotationStorage-improvements
A couple of (small) tweaks of the `AnnotationStorage` (PR 12173 follow-up)
2020-08-11 23:46:13 +02:00
Brendan Dahl
7fb01f9f2a
Merge pull request #12186 from brendandahl/loca-2
Fix bad truetype loca tables.
2020-08-10 20:34:19 -07:00
Brendan Dahl
f6dff81223 Fix bad truetype loca tables.
Some fonts have loca tables that aren't sorted or use 0 as an offset to
signal a missing glyph. This fixes the bad loca tables by sorting them
and then rewriting the loca table and potentially re-ordering the glyf
table to match.

Fixes #11131 and bug 1650302.
2020-08-10 14:15:49 -07:00
Jonas Jenwald
4d351eab93 A couple of (small) tweaks of the AnnotationStorage (PR 12173 follow-up)
- Initialize the `AnnotationStorage`-instance, on `PDFDocumentProxy`, lazily.
 - Change the `AnnotationStorage` to use a `Map` internally, rather than a regular Object (simplifies the following points).
 - Let `AnnotationStorage.getAll` return `null` when there's no data stored, to avoid unnecessary parsing on the worker-thread. This ought to "just work", since the worker-thread code *should* already handle the `!annotationStorage` case everywhere.
 - Add a new `AnnotationStorage.size` getter, to be able to easily tell if there's any data stored.
2020-08-10 17:07:24 +02:00
Calixte Denizet
88b112ab0c Support comb textfields for printing 2020-08-09 14:41:26 +02:00
Tim van der Meij
b061c300b4
Merge pull request #12176 from calixteman/multiline
Support multiline textfields for printing
2020-08-09 13:37:36 +02:00
Calixte Denizet
cd8bb7293b Support multiline textfields for printing 2020-08-09 12:14:34 +02:00
Jonathan Grimes
ac723a1760 Allow loading pdf fonts into another document. 2020-08-08 02:52:32 +00:00
Takashi Tamura
4ac62d8787 Fix the type of PDFDocumentLoadingTask.destroy. 2020-08-07 16:10:19 +09:00
Tim van der Meij
8c162f57f7
Merge pull request #12175 from calixteman/textfield
Support textfield and choice widgets for printing
2020-08-07 00:20:29 +02:00
Calixte Denizet
1747d259f9 Support textfield and choice widgets for printing 2020-08-06 14:45:23 +02:00
Jonas Jenwald
16fa9dc4ea Add support for Object.fromEntries
This provides a simpler way of creating an `Object` from e.g. a `Map`, without having to manually iterate over it.
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/fromEntries
2020-08-06 14:39:51 +02:00
Jonas Jenwald
5e44b241b2 [api-minor] Fix the annotationStorage parameter in PDFPageProxy.render
While the parameter name (clearly) suggests that an `AnnotationStorage`-instance is expected, looking at the only call-sites that include the parameter (i.e. the `PDFPrintServiceFactory` instances) it actually contains just a normal Object.

Hence it seems much more reasonable to actually pass a valid `AnnotationStorage`-instance, as the name suggests, and simply have `PDFPageProxy.render` do the `annotationStorage.getAll()` call. (Since we cannot send an `AnnotationStorage`-instance as-is to the worker-thread, given the "structured clone algorithm".)
2020-08-05 23:02:30 +02:00
Takashi Tamura
a0f0ab78f3 Fix the type definition of TypedArray. 2020-08-05 17:01:08 +09:00
Tim van der Meij
56ca027c08
Improve consistency for the API documentation comments
Over time we used multiple different formats for JSDoc comments. This
commit standardizes those formats to the one we used most often.

Moreover, this removes the example in the outline endpoint documentation
since it now has a proper type definition and it didn't render correctly
in JSDoc.
2020-08-04 23:27:22 +02:00
Tim van der Meij
ba4a07ce07
Fix incorrect types in the API documentation 2020-08-04 23:19:59 +02:00
Tim van der Meij
3116216e1d
Improve the API documentation for PDFDocumentLoadingTask
This commit:
- formats the documentation block according to the standards;
- replaces the callback definitions with the `function` type (we have
  that for other definitions already and the callback type was not
  rendered correctly by JSDoc);
- synchronizes the type documentation and the class documentation;
- fixes the documentation by making it easier to read and making sure
  that all optional properties are indicated as such;
- uses the `@link` tag to indicate links to other code.

The `typestest` still passes and JSDoc now renders this class correctly.
2020-08-04 23:17:24 +02:00
Brendan Dahl
ac494a2278 Add support for optional marked content.
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.

All of the test files added exhibit different uses of optional content.

Fixes #269.

Fix test to work with optional content.

- Change the stopAtErrors test to ensure the operator list has something,
  instead of asserting the exact number of operators.
2020-08-04 09:26:55 -07:00
Tim van der Meij
e68ac05f18
Merge pull request #12160 from tamuratak/worker_options
Use typedef to define the type of GlobalWorkerOptions (PR 12102 follow-up)
2020-08-03 22:55:49 +02:00
Tim van der Meij
0b75701012
Merge pull request #12157 from tamuratak/fix_svggraphics
Fix the type of SVGGraphics (PR 12102 follow-up)
2020-08-03 22:52:10 +02:00
Tim van der Meij
adc7645a44
Merge pull request #12161 from tamuratak/exported_func
Add types to functions exported as API in src/pdf.js (PR 12102 follow-up)
2020-08-03 22:43:50 +02:00
Takashi Tamura
923ba27f1f Tweak for the type of PageViewportParameters.viewBox 2020-08-03 20:42:42 +09:00
Takashi Tamura
bc4648c0a6 Add types to functions exported as API in src/pdf.js. 2020-08-03 19:19:48 +09:00
Takashi Tamura
f6fd8e9e7f Use typedef to define the type of GlobalWorkerOptions. 2020-08-03 19:06:28 +09:00
Takashi Tamura
d72bbecee2 Fix the type of SVGGraphics. 2020-08-03 09:58:19 +09:00
Tim van der Meij
00a8b42e67
Merge pull request #12102 from ineiti/add_types_annotations
Add types annotations
2020-08-02 16:45:37 +02:00
Tim van der Meij
5a66c56eca
Merge pull request #12108 from calixteman/radio
Add support for radios printing
2020-08-02 14:47:46 +02:00
Jonas Jenwald
05baa4c89f
Revert "[api-minor] Allow loading pdf fonts into another document." 2020-08-01 12:52:39 +02:00
Tim van der Meij
173b92a873
Merge pull request #12131 from jsg2021/issue-8271
[api-minor] Allow loading pdf fonts into another document.
2020-08-01 01:13:41 +02:00
Tim van der Meij
0d20a2b7b4
Merge pull request #12144 from Snuffleupagus/uncaught-promise-AbortException
Prevent `Uncaught (in promise) AbortException` when running the unit-tests
2020-08-01 00:22:10 +02:00
Jonas Jenwald
6d192f987e Prevent Uncaught (in promise) AbortException when running the unit-tests
These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file.
While this patch prevents these kind of problems, and thus allows us to update Jasmine again, I cannot help but thinking that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't *think* that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).
2020-07-31 23:29:05 +02:00
Jonathan Grimes
9b16b8ef71
Allow loading pdf fonts into another document. 2020-07-31 11:41:48 -05:00
Jonas Jenwald
346afd1e1c [api-minor] Fix the AnnotationStorage usage properly in the viewer/tests (PR 12107 and 12143 follow-up)
*The [api-minor] label probably ought to have been added to the original PR, given the changes to the `createAnnotationLayerBuilder` signature (if nothing else).*

This patch fixes the following things:
 - Let the `AnnotationLayer.render` method create an `AnnotationStorage`-instance if none was provided, thus making the parameter *properly* optional. This not only fixes the reference tests, it also prevents issues when the viewer components are used.
 - Stop exporting `AnnotationStorage` in the official API, i.e. the `src/pdf.js` file, since it's no longer necessary given the change above. Generally speaking, unless absolutely necessary we probably shouldn't export unused things in the API.
 - Fix a number of JSDocs `typedef`s, in `src/display/` and `web/` code, to actually account for the new `annotationStorage` parameter.
 - Update `web/interfaces.js` to account for the changes in `createAnnotationLayerBuilder`.
 - Initialize the storage, in `AnnotationStorage`, using `Object.create(null)` rather than `{}` (which is the PDF.js default).
2020-07-31 16:32:46 +02:00
Calixte Denizet
538017f7a7 Add support for radios printing 2020-07-31 14:31:49 +02:00
Aki Sasaki
7bb65bab7f fix reftests after #12107
The f1040-annotations reftest started hanging after #12107. We traced
this to `TypeError: can't access property "getOrCreateValue", storage is
undefined`.

We essentially need to add `annotationStorage` to the parameters in
test/driver.js.
2020-07-30 12:25:27 -07:00
Linus Gasser
f1bbfdc16d Add typescript definitions
This PR adds typescript definitions from the JSDoc already present.
It adds a new gulp-target 'types' that calls 'tsc', the typescript
compiler, to create the definitions.

To use the definitions, users can simply do the following:

```
import {getDocument, GlobalWorkerOptions} from "pdfjs-dist";
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
GlobalWorkerOptions.workerSrc = pdfjsWorker;

const pdf = await getDocument("file:///some.pdf").promise;
```

Co-authored-by: @oBusk
Co-authored-by: @tamuratak
2020-07-30 11:10:37 +02:00
Tim van der Meij
eb4d6a0652
Merge pull request #12107 from calixteman/checkbox
Add support for checkboxes printing
2020-07-30 00:11:41 +02:00
Calixte Denizet
cb60523a15 Add support for checkboxes printing 2020-07-29 16:42:57 +02:00
Tim van der Meij
6537e64cb8
Merge pull request #12136 from Snuffleupagus/PDFFetchStreamRangeReader-AbortError
Ignore `fetch()` errors, in `PDFFetchStreamRangeReader`, once the request has been aborted
2020-07-29 00:09:51 +02:00
Jonas Jenwald
1b720a4b23 Ignore fetch() errors, in PDFFetchStreamRangeReader, once the request has been aborted
Besides making general sense, as far as I can tell, this patch should also prevent *one* source of `Uncaught (in promise) ...` exceptions.
Unfortunately `reason instanceof AbortError` doesn't work here, since `AbortError` isn't actually defined in browsers; note how even the DOM specification contains an example using the `name` property: https://dom.spec.whatwg.org/#aborting-ongoing-activities

This patch prevents the following errors from being logged in the console, when the unit-tests are running:
 - Firefox: `Uncaught (in promise) DOMException: The operation was aborted.`
 - Chrome: `Uncaught (in promise) DOMException: The user aborted a request.`
2020-07-28 17:18:49 +02:00
Jonas Jenwald
fbe90b63ec [src/core/worker.js] Remove a useless Promise handler from the pdfManagerReady function
Looking carefully at this code, you'll notice that the `loadDocument` function has no less than *three* Promise handling functions. This obviously makes no sense, since a Promise can only have one resolve and one reject handler.

Hence the final `onFailure`-case is unreachable, which only serves to add confusion when reading the code. Note that this code has been re-factored more than once over the years, but it seems as if this may even have been incorrect already in PR 3310 (and no-one have noticed for seven years :-).
2020-07-28 14:51:50 +02:00
Jonas Jenwald
835b5ffddd Only check isType3Font the first time that TranslatedFont.loadType3Data is called
If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
f3ff526019 Send/receive Type3 images the same way as other globally-cached images
There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.
2020-07-27 13:20:15 +02:00
Jonas Jenwald
7c9d0d5939 Improve how Type3-fonts with dependencies are handled
While the `CharProcs` streams of Type3-fonts *usually* don't rely on dependencies, such as e.g. images, it does happen in some cases.

Currently any dependencies are simply appended to the parent operatorList, which in practice means *only* the operatorList of the *first* page where the Type3-font is being used.
However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of *all* pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously.

Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added *once* to any operatorList.)

Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.
2020-07-27 13:20:13 +02:00
Calixte Denizet
584902dbf8 Add an annotation storage in order to save annotation data in acroforms 2020-07-24 10:50:11 +02:00
Jonas Jenwald
684a7b89ac Remove unnecessary duplication in the addChildren helper function (used by the ObjectLoader)
Besides being fewer lines of code overall, this also avoids *one* `node instanceof Dict` check for both of the `Dict`/`Stream`-cases.
2020-07-17 16:32:24 +02:00
Jonas Jenwald
ea8e432c45 Add a getRawValues method, to Dict instances, to provide an easier way of getting all *raw* values
When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop).
While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the *raw* `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the *raw* `Dict` values directly.
2020-07-17 16:32:00 +02:00
Jonas Jenwald
6381b5b08f Add a size getter, to Dict instances, to provide an easier way of checking the number of entries
This removes the need to manually call `Dict.getKeys()` and check its length.
2020-07-17 16:06:11 +02:00
Tim van der Meij
e63d1ebff5
Merge pull request #12087 from Snuffleupagus/LocalGStateCache
Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)
2020-07-17 16:02:45 +02:00
Tim van der Meij
b19a1796ac
Convert RefSetCache to a proper class and to use a Map internally
Using a `Map` instead of an `Object` provides some advantages such as
cheaper ways to get the size of the cache, to find out if an entry is
contained in the cache and to iterate over the cache. Moreover, we can
clear and re-use the same `Map` object now instead of creating a new
one.
2020-07-17 13:35:29 +02:00
Tim van der Meij
a604973cc7
Merge pull request #12085 from tamuratak/fix_isnodejs
Make the detection of Node.js environments on Electron strict.
2020-07-17 13:29:59 +02:00
Jonas Jenwald
b3480842b3 Use a RefSet, rather than a plain Object, for tracking already processed nodes in PartialEvaluator.hasBlendModes 2020-07-17 09:52:36 +02:00
Jonas Jenwald
03547b5633 Change PartialEvaluator.setGState to an async method
Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object.
To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.
2020-07-15 14:27:18 +02:00
Jonas Jenwald
f20aeb9343 Slightly simplify the code in PartialEvaluator.hasBlendModes, e.g. by using for...of loops
- Replace the existing loops with `for...of` variants instead.

 - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
15fa3f8518 Remove a redundant /XObject stream dictionary objId check in PartialEvaluator.hasBlendModes (PR 6971 follow-up)
This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971).
Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.
2020-07-15 12:47:11 +02:00
Jonas Jenwald
84476da26e Handle lookup errors "silently" in PartialEvaluator.hasBlendModes (PR 11680 follow-up)
Given that this method is used during what's essentially a *pre*-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there's lookup errors.

*Please note:* Any any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the *actual* OperatorList parsing.
2020-07-15 12:47:07 +02:00
Jonas Jenwald
981ff41b5f Add local caching of non-font Graphics State (ExtGState) data in PartialEvaluator.getTextContent
It turns out that `getTextContent` suffers from *similar* problems with repeated GStates as `getOperatorList`; please see the previous patch.

While only `/ExtGState` resources containing Fonts will actually be *parsed* by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though *most* of them won't affect the textContent (since they mostly contain purely graphical state).

With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
90eb579713 Add local caching of "simple" Graphics State (ExtGState) data in PartialEvaluator.getOperatorList (issue 2813)
This patch will help pathological cases the most, with issue 2813 being a particularily problematic example. While there's only *four* `/ExtGState` resources, there's a total `29062` of `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resource, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resource, we thus improve performance by:
 - Not having to fetch/validate/parse the same `/ExtGState` data over and over.
 - Handling of repeated `setGState` operators becomes *synchronous* during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

---

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is *a lot* slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a *reduction* of `~10 %` in the rendering time of the PDF document in issue 2813.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
d4d7ac1b88 Stop special-casing the (very unlikely) "no /XObject found"-scenario, when parsing OPS.paintXObject operators, in PartialEvaluator.{getOperatorList, getTextContent}
Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fallback to the existing `ignoreErrors` functionality instead.
Also, please note that the "no `/XObject` found"-scenario should be *extremely* unlikely in practice and would only occur in corrupt/broken documents.

Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data *could* contain anything and e.g. be unclonable (which would cause breaking errors).
Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definately seems like the correct approach.
2020-07-12 21:59:59 +02:00
Takashi Tamura
473ea1f1a4 Make the detection of Node.js environments on Electron strict.
The main process and its child processes should be detected as Node.js environments.
2020-07-12 10:56:17 +09:00
Tim van der Meij
7dabc5ecc8
Merge pull request #12063 from Snuffleupagus/issue-10989
Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)
2020-07-11 00:05:11 +02:00
Jonas Jenwald
d18cf47419 Remove the special handling, used when creating Indexed ColorSpaces, for the case where the lookup-data is a Stream
This special-case was added in PR 1992, however it became unnecessary with the changes in PR 4824 since all of the ColorSpace parsing is now done on the worker-thread (with only RGB-data being sent to the main-thread).
2020-07-10 17:22:55 +02:00
Jonas Jenwald
ea6a0e4435 Remove the IR (internal representation) part of the ColorSpace parsing
Originally ColorSpaces were only *partially* parsed on the worker-thread, to obtain an IR-format which was sent to the main-thread. This had the somewhat unfortunate side-effect of causing the majority of the (potentially heavy) ColorSpace parsing to happen on the main-thread.
Hence PR 4824 which, among other things, changed ColorSpaces to be *fully* parsed on the worker-thread with only RGB-data being sent to the main-thread.

While it thus originally was necessary to have `ColorSpace.{parseToIR, fromIR}` methods, to handle the worker/main-thread split, that's no longer the case and we can thus reduce all of the ColorSpace parsing to one method instead.

Currently, when parsing a ColorSpace, we call `ColorSpace.parseToIR` which parses the ColorSpace-data from the document and then creates the IR-format. We then, immediately, call `ColorSpace.fromIR` which parses the IR-format and then finally creates the actual ColorSpace.[1]
All-in-all, this leads to a fair amount of unnecessary indirection which also (in my opinion) makes the code less clear.

Obviously these changes are not really expected to have a significant effect on performance, especially with the recently added caching of ColorSpaces, however there'll now be strictly fewer function calls, less memory allocated, and overall less parsing required during ColorSpace-handling.

---
[1] For ICCBased ColorSpaces, given the validation necessary, this currently even leads to parsing an /Alternate ColorSpace *twice*.
2020-07-10 17:22:44 +02:00
Jonas Jenwald
4cc6797f17 Re-factor the idFactory functionality, used in the core/-code, and move the fontID generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.

For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.

In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 16:33:31 +02:00
Jonas Jenwald
1d66fce781 Tweak the heuristic, in src/core/jpg.js, that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10989) 2020-07-06 13:06:49 +02:00
Jonas Jenwald
c95fbb6e21 Convert the code in src/core/evaluator.js to use standard classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 16:01:04 +02:00
Jonas Jenwald
32a0b6fa73 Move some constants and helper functions out of the PartialEvaluator closure
This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope.

*Please note:* For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants where re-factored slightly.
2020-07-05 15:56:23 +02:00
Tim van der Meij
c4255fdbfd
Merge pull request #12059 from Snuffleupagus/image-class
Convert the code in `src/core/image.js` to use ES6 classes
2020-07-05 14:08:55 +02:00
Jonas Jenwald
59da1d5829 Convert the code in src/core/image.js to use ES6 classes
This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated.

*Please note:* I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.
2020-07-05 09:34:14 +02:00
Jonas Jenwald
85ced3fbfd Allow BaseLocalCache to, optionally, only allocate storage for caching of references (PR 12034 follow-up)
*Yet another instalment in the never-ending series of things that you think of __after__ a patch has landed.*

Since `Function`s are only cached by reference, we thus don't need to allocate storage for names in `LocalFunctionCache` instances. Obviously the effect of these changes are *really tiny*, but it seems reasonable in principle to avoid allocating data structures that are guaranteed to be unused.
2020-07-04 15:01:32 +02:00
Jonas Jenwald
ca719ecaa4 Add local caching of Functions, by reference, in the PDFFunctionFactory (issue 2541)
Note that compared other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an *entire* page.
Furthermore, similar to ColorSpaces, the parsing of individual `Function`s are generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.)

Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.)
After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its *own* `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instances could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site.

Obviously, with this patch, there's now (potentially) more `PDFFunctionFactory` instances than before when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are *not* kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible *lazily initialized*, specifically:

 - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually requires it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces).

 - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing.

To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization.
(If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would be instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.)
All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic.

With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources.

Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer.
However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction ` parsing shows that it takes *approximately* `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is *one order of magnitude faster* which does add up when the same `Function` is invoked close to 2000 times.
2020-07-04 00:55:18 +02:00
Jonas Jenwald
4a7e29865d [api-minor] Use the NodeCanvasFactory/NodeCMapReaderFactory classes as defaults in Node.js environments (issue 11900)
This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code.

As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.
2020-07-02 04:44:23 +02:00
Brendan Dahl
fe3df495cc
Merge pull request #12040 from wojtekmaj/replace-non-inclusive
Replace non-inclusive "whitelist" term with "allowlist"
2020-07-01 15:41:41 -07:00
Jonas Jenwald
fef24658e7 Adjust the heuristics used when dealing with rectangles, i.e. re operators, with zero width/height (issue 12010) 2020-07-02 00:02:49 +02:00
Wojciech Maj
78970bbbe1
Replace non-inclusive "whitelist" term with "allowlist" 2020-06-29 17:15:14 +02:00
Jonas Jenwald
28d2ada59c Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)
This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
2020-06-26 13:15:06 +02:00
Jonas Jenwald
b8e1352934 Stop passing in unnecessary parameters when parsing the Alternate entry of ICCBased ColorSpaces (PR 9659 follow-up)
With the changes made in PR 9659, `ColorSpace.fromIR` no longer takes a second `pdfFunctionFactory` parameter and there's thus one call-site that can be simplified.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
19d7976483 Improve (local) caching of parsed ColorSpaces (PR 12001 follow-up)
This patch contains the following *notable* improvements:
 - Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point).
 - Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths.
 - Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing.
 - Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).
2020-06-24 23:53:10 +02:00
Jonas Jenwald
51e87b9248 Add a proper LocalColorSpaceCache class, rather than piggybacking on the image one (PR 12001 follow-up)
This will allow caching of ColorSpaces by either `Name` *or* `Ref`, which doesn't really make sense for images, thus allowing (better) caching for ColorSpaces used with e.g. Images and Patterns.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
e22bc483a5 Re-factor ColorSpace.parse to take a parameter object, rather than a bunch of (randomly) ordered parameters
Given the number of existing parameters, this will avoid needlessly unwieldy call-sites especially with upcoming changes in later patches.
2020-06-24 23:53:10 +02:00