pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	30e8d5dea1	Add local caching of TilingPatterns in `PartialEvaluator.getOperatorList` (issue 2765 and 8473) In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required. By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also less asynchronous parsing required for repeated TilingPatterns. Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results. When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reductions in rendering times were observed when comparing `master` against this patch: - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`. - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms` - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms` As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance. Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.	2020-10-08 18:43:21 +02:00
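The caching idea in commit 30e8d5dea1 can be illustrated with a minimal sketch. This is not the pdf.js implementation: `createPatternCache`, `parsePattern` and the string-based pattern data are hypothetical names and stand-ins chosen for illustration. It only shows that a repeated pattern is parsed once and then served from a local cache.

```javascript
// Hypothetical sketch; none of these names come from pdf.js.
// `parsePattern` stands in for the expensive parsing step.
function createPatternCache(parsePattern) {
  const cache = new Map();
  return function getPattern(key, data) {
    if (cache.has(key)) {
      // Re-used pattern: skip the parsing step entirely.
      return cache.get(key);
    }
    const parsed = parsePattern(data);
    cache.set(key, parsed);
    return parsed;
  };
}

// Count how often the (pretend) expensive parser actually runs.
let parseCount = 0;
const getPattern = createPatternCache(data => {
  parseCount++;
  return { operators: data.split(" ") };
});

const first = getPattern("P1", "m l S");
const second = getPattern("P1", "m l S");
```

Both lookups return the same object, and the parser runs only for the first one.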
Jani Pehkonen	935568c2f1	Fix invalid `XUID` entries in CFF fonts In CFF fonts, entry `XUID` should be an array that has no more than 16 elements. In the issue, the length is 20, which causes the fonts to fail. See Appendix B, "Implementation Limits" in PostScript Language Reference Manual https://web.archive.org/web/20170218093716/https://www.adobe.com/products/postscript/pdfs/PLRM.pdf Actually entries `XUID` and `UniqueID` are obsolete altogether. https://blogs.adobe.com/CCJKType/2016/06/no-more-xuid-arrays.html	2020-10-05 17:38:01 +03:00
Jonas Jenwald	9416b14e8b	Re-factor how the ESLint `no-var` rule is enabled in the `src/` folder This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1] Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use. --- [1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.	2020-10-03 20:15:29 +02:00
Tim van der Meij	6ff1fe4ea9	Merge pull request #12333 from calixteman/tooltip Add tooltip if any in annotations layer	2020-10-03 19:50:39 +02:00
calixteman	20b12d2bda	Add tooltip if any in annotations layer	2020-10-02 10:11:18 +02:00
Jonas Jenwald	bd3b15b897	Use the `cidToGidMap`, if it exists, when building the glyph mapping for non-embedded composite fonts (issue 12418)	2020-09-28 14:40:43 +02:00
Calixte Denizet	5af352e65a	Need to reset the streams when printing	2020-09-24 19:13:09 +02:00
Jonas Jenwald	2497e8eab9	Prevent errors if the `InkList` property, in InkAnnotations, is missing and/or not an Array (issue 12392) To prevent a future bug, the `Vertices` property in PolylineAnnotations are handled the same way.	2020-09-19 15:34:32 +02:00
Calixte Denizet	d51e7e86ff	Use the same kind of strings for radio values	2020-09-16 18:47:25 +02:00
Tim van der Meij	374aad77c4	Merge pull request #12375 from Snuffleupagus/emptyDict-set Ensure that the empty dictionary won't be accidentally modified, and slightly improve the "SaveDocument" handler in `src/core/worker.js`	2020-09-15 23:04:57 +02:00
Calixte Denizet	16dd5403c7	Set parent of radio annotation even if there is no 'V' field	2020-09-15 14:41:57 +02:00
Jonas Jenwald	ed4e7cd8a4	A couple of small improvements in the "SaveDocument" handler in `src/core/worker.js` - Check that the "Info"-entry, in the XRef-trailer, is actually a dictionary before accessing it. This is similar to the `PDFDocument.documentInfo` method and follows the general principle of validating data carefully before accessing it, given how often PDF-software may create corrupt PDF files. - Slightly simplify the "XFA"-lookup, since there's no point in trying to fetch something from the empty dictionary.	2020-09-15 09:57:40 +02:00
Jonas Jenwald	a531c98cd2	Ensure that the empty dictionary won't be accidentally modified Currently there's nothing that prevents modification of the `Dict.empty` primitive, which obviously needs to be truly empty to prevent any future (hard to find) bugs.	2020-09-15 09:29:00 +02:00
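One way to protect a shared empty dictionary, as commit a531c98cd2 describes, is to make its `set` method throw. The sketch below assumes a tiny stand-in class, `MiniDict`, which is hypothetical and far simpler than the real `Dict` primitive; the error message is likewise made up.

```javascript
// Hypothetical sketch; `MiniDict` is a stand-in, not the pdf.js `Dict`.
let cachedEmpty = null;

class MiniDict {
  constructor() {
    this._map = Object.create(null);
  }

  get(key) {
    return this._map[key];
  }

  set(key, value) {
    this._map[key] = value;
  }

  // Shared empty instance whose `set` always throws, so it stays empty.
  static get empty() {
    if (!cachedEmpty) {
      cachedEmpty = new MiniDict();
      cachedEmpty.set = () => {
        throw new Error("Should not call `set` on the empty dictionary.");
      };
    }
    return cachedEmpty;
  }
}

// Ordinary instances remain writable.
const regular = new MiniDict();
regular.set("Type", "Catalog");
```

Overriding `set` on the one shared instance leaves every other dictionary unaffected, so the guard costs nothing on the normal path.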
Tim van der Meij	b0c7a74a0c	Merge pull request #12361 from Snuffleupagus/_getSaveFieldResources Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294)	2020-09-15 00:09:31 +02:00
Tim van der Meij	3ecd984758	Implement resetting of created streams for annotations	2020-09-14 23:08:50 +02:00
Jonas Jenwald	c992b8e460	Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294) This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code. As mentioned elsewhere, considering that we're (at least for now) trying to fix one specific case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2]. This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary. Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local and AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found. For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achieve this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included. --- [1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion. [2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.	2020-09-14 15:22:40 +02:00
Tim van der Meij	dfebe7b907	Merge pull request #12365 from Snuffleupagus/forbid-DecodeStream.length Ensure that the `length` property won't be accidentally accessed on a `DecodeStream`-instance	2020-09-11 22:18:30 +02:00
Jonas Jenwald	a11b7341a1	Ensure that the `length` property won't be accidentally accessed on a `DecodeStream`-instance For these streams, compared to `Stream` and `ChunkedStream`, there's no well defined concept of length and consequently no `length` getter.[1] However, attempting to access the non-existent `length` won't currently error, but just return `undefined`, which could thus easily lead to bugs elsewhere in the code-base. --- [1] However, note that all stream implementations have an `isEmpty` getter which can be used instead.	2020-09-11 13:25:40 +02:00
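The guard described in commit a11b7341a1 can be sketched as a getter that throws rather than silently yielding `undefined`. `DecodeStreamSketch` and its fields are hypothetical stand-ins for illustration, not the actual pdf.js class.

```javascript
// Hypothetical sketch; this is not the pdf.js `DecodeStream` class.
class DecodeStreamSketch {
  constructor(bytes) {
    this._bytes = bytes;
    this.pos = 0;
  }

  // No well-defined length for this kind of stream: fail loudly on access.
  get length() {
    throw new Error("Should not access DecodeStream.length");
  }

  // The safe alternative mentioned in the commit message.
  get isEmpty() {
    return this.pos >= this._bytes.length;
  }
}

const stream = new DecodeStreamSketch(new Uint8Array([1, 2, 3]));
const emptyStream = new DecodeStreamSketch(new Uint8Array(0));
```

With the getter in place, an accidental `stream.length` fails at the point of the mistake instead of propagating `undefined` into later arithmetic.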
Calixte Denizet	fc154590e8	Dict keys need to be escaped too when saving	2020-09-11 12:25:05 +02:00
Calixte Denizet	dc4eb71ff1	PDF names need to be escaped when saving	2020-09-10 16:08:13 +02:00
Calixte Denizet	64a6efd95e	Follow-up of pr #12344	2020-09-09 11:46:02 +02:00
Brendan Dahl	e51e9d1f33	Merge pull request #12345 from calixteman/save_btn Don't try to save something for a button which is neither a checkbox nor a radio	2020-09-08 15:44:04 -07:00
calixteman	68b99c59ee	Save form data in XFA datasets when pdf is a mix of acroforms and xfa (#12344) * Move display/xml_parser.js in shared to use it in worker * Save form data in XFA datasets when pdf is a mix of acroforms and xfa Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>	2020-09-08 15:13:52 -07:00
Calixte Denizet	7e5026dfc5	Don't try to save something for a button which is neither a checkbox nor a radio	2020-09-08 20:47:46 +02:00
Tim van der Meij	20c891542b	Merge pull request #12269 from calixteman/highlight Add support for missing appearances for highlights, strikeout, squiggly and underline annotations.	2020-09-06 22:25:36 +02:00
Calixte Denizet	65ecd981fe	Add support for missing appearances for highlights, strikeout, squiggly and underline annotations.	2020-09-06 15:40:15 +02:00
Jonas Jenwald	6c8f1f7d6f	Run `gulp lint --fix`, to account for changes in Prettier version `2.1.x`	2020-09-06 12:23:59 +02:00
Jonas Jenwald	784a420027	Add support, in `Dict.merge`, for merging of "sub"-dictionaries This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.	2020-08-30 23:18:32 +02:00
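Merging one level deeper, as commit 784a420027 describes for /Resources and their /Font dictionaries, might look roughly like the sketch below. It uses plain `Map` objects in place of `Dict` instances; `mergeDicts` and its `mergeSubDicts` flag are hypothetical names, and the actual `Dict.merge` signature may differ.

```javascript
// Hypothetical sketch using Maps as stand-ins for `Dict` instances.
// The first occurrence of a key wins; with `mergeSubDicts` enabled,
// Map values sharing a key are combined one level deep.
function mergeDicts(dicts, mergeSubDicts = false) {
  const merged = new Map();
  for (const dict of dicts) {
    for (const [key, value] of dict) {
      if (!merged.has(key)) {
        // Copy sub-dictionaries so the inputs are never modified.
        const copy =
          mergeSubDicts && value instanceof Map ? new Map(value) : value;
        merged.set(key, copy);
        continue;
      }
      const existing = merged.get(key);
      if (mergeSubDicts && existing instanceof Map && value instanceof Map) {
        for (const [subKey, subValue] of value) {
          if (!existing.has(subKey)) {
            existing.set(subKey, subValue);
          }
        }
      }
    }
  }
  return merged;
}

const localResources = new Map([["Font", new Map([["F1", "Helvetica"]])]]);
const acroFormResources = new Map([["Font", new Map([["F2", "Courier"]])]]);

const shallow = mergeDicts([localResources, acroFormResources]);
const deep = mergeDicts([localResources, acroFormResources], true);
```

In the shallow result only `F1` is present under `Font`, while the deep result contains both `F1` and `F2`; neither input map is changed.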
Jonas Jenwald	2393443e73	Include the `/Order` array, if available, when parsing the Optional Content configuration The `/Order` array is used to improve the display of Optional Content groups in PDF viewers, and it allows a PDF document to e.g. specify that Optional Content groups should be displayed as a (collapsable) tree-structure rather than as just a list. Note that not all available Optional Content groups must be present in the `/Order` array, and PDF viewers will often (by default) hide those toggles in the UI. To allow us to improve the UX around toggling of Optional Content groups, in the default viewer, these hidden-by-default groups are thus appended to the parsed `/Order` array under a custom nesting level (with `name == null`). Finally, the patch also slightly tweaks an `OptionalContentConfig` related JSDoc-comment in the API.	2020-08-30 16:28:40 +02:00
Tim van der Meij	06b53d770a	Merge pull request #12259 from brendandahl/cmap-fix Fix handling of symbolic fonts and unicode cmaps.	2020-08-30 16:01:24 +02:00
Brendan Dahl	45e8a31cc0	Fix handling of symbolic fonts and unicode cmaps. In issue 12120, the font has a 1,0 cmap and is marked symbolic which according to the spec means we should directly use the cmap instead of the extra steps that are defined in 9.6.6.4. However, just fixing that caused bug 1057544 to break. The font in bug 1057544 has a 0,1 cmap (Unicode 1.1) which we were not using, but is easy to support. We're also easily able to support some of the other unicode cmaps, so I added those as well. There was also a second issue with bug 1057544, the cmap doesn't have a mapping for the "quoteright" glyph, but it is defined in the post table. To handle this, I've moved post table as a fallback for any font that has an encoding.	2020-08-27 14:33:11 -07:00
Calixte Denizet	ba94f04ba3	Bug 1661226 - Push buttons are not rendered with renderInteractiveForms enabled	2020-08-27 10:45:14 +02:00
Tim van der Meij	0f229d537f	Inline the `setup` method in the `parse` method in `src/core/document.js` Now that the `parse` method is simplified we can inline the `setup` method in the `parse` method since it's only two lines of code. This avoids some indirection.	2020-08-25 23:28:55 +02:00
Tim van der Meij	280207c740	Redo the form type detection logic and include unit tests Good form type detection is important to get reliable telemetry and to only show the fallback bar if a form cannot be filled out by the user. PDF.js only supports AcroForm data, so XFA data is explicitly unsupported (tracked in issue #2373). However, the previous form type detection couldn't separate AcroForm and XFA well enough, causing form type telemetry to be incorrect sometimes and the fallback bar to be shown for forms that could in fact be filled out by the user. The solution in this commit is found by studying the specification and the form documents that are available to us. In a nutshell the rules are: - There is XFA data if the `XFA` entry is a non-empty array or stream. - There is AcroForm data if the `Fields` entry is a non-empty array and it doesn't consist of only document signatures. The document signatures part was not handled in the old code, causing a document with only XFA data to also be marked as having AcroForm data. Moreover, the old code didn't check all the data types. Now that AcroForm and XFA can be distinguished, the viewer is configured to only show the fallback bar for documents that only have XFA data. If a document also has AcroForm data, the viewer can use that to render the form. We have not found documents where the XFA data was necessary in that case. Finally, we include unit tests to ensure that all cases are covered and move the form type detection out of the `parse` function so that it's only executed if the document information is actually requested (potentially making initial parsing a tiny bit faster).	2020-08-25 23:28:55 +02:00
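The two detection rules stated in commit 280207c740 can be written down as a small sketch. Everything about the data model here is an assumption: the AcroForm dictionary is a plain object, a stream is modelled as a `Uint8Array`, and `isDocumentSignature` is a hypothetical predicate supplied by the caller, because the commit message does not spell out how a document signature is recognised.

```javascript
// Hypothetical sketch of the two rules; not the pdf.js implementation.
function detectFormTypes(acroForm, isDocumentSignature) {
  const xfa = acroForm.XFA;
  // Rule 1: XFA data exists if `XFA` is a non-empty array or stream.
  const hasXfa =
    (Array.isArray(xfa) || xfa instanceof Uint8Array) && xfa.length > 0;

  const fields = acroForm.Fields;
  // Rule 2: AcroForm data exists if `Fields` is a non-empty array that
  // does not consist solely of document signatures.
  const hasAcroForm =
    Array.isArray(fields) &&
    fields.length > 0 &&
    !fields.every(isDocumentSignature);

  return { hasXfa, hasAcroForm };
}

// Assumed marker for signature fields, purely for this example.
const isSig = field => field.documentSignature === true;

const textForm = detectFormTypes(
  { XFA: [], Fields: [{ name: "firstName" }] },
  isSig
);
const xfaOnly = detectFormTypes(
  { XFA: ["template", "data"], Fields: [{ documentSignature: true }] },
  isSig
);
```

Under these assumptions `textForm` reports AcroForm data only, and `xfaOnly` reports XFA data only, which is the case where the viewer would show the fallback bar.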
Tim van der Meij	f0bf62ff54	Mark the `catDict` member as private in the `Catalog` class Not only is `catDict` never accessed anymore outside of this file, it should also never happen since it's internal to the catalog. If data from it is needed elsewhere, the catalog should provide a getter for it that can do basic data integrity checks and abstract away any unnecessary details.	2020-08-25 23:28:55 +02:00
Tim van der Meij	f20f0bcc78	Move the AcroForm logic from the document to the catalog The `AcroForm` entry is part of the catalog, not of the document, so its logic should be placed there instead. The document should look in the catalog to fetch it, and not have knowledge of `catDict`, which is a member internal to the catalog. Moreover, make the AcroForm member private on the document instance. It's only used internally and was also never intended to be public. For users it's exposed by the `getMetadata` API endpoint as `IsAcroFormPresent`. Only a boolean is exposed, so we now also only store the boolean on the document instance. Finally, the annotation code needs access to the full AcroForm dictionary, so it's updated to fetch the data from the catalog instead of the document that now only holds the boolean.	2020-08-25 23:28:55 +02:00
Tim van der Meij	b41a2f4d5a	Move the collection logic from the document to the catalog The `Collection` entry is part of the catalog, not of the document, so its logic should be placed there instead. The document should look in the catalog to fetch it, and not have knowledge of `catDict`, which is a member internal to the catalog. Moreover, remove the collection member from the document instance. It's only used internally and was also never intended to be public. For users it's exposed by the `getMetadata` API endpoint as `IsCollectionPresent`. Moving this out of the `parse` function makes sure that the getter is only executed if the document information is actually requested (potentially making initial parsing a tiny bit faster).	2020-08-25 23:28:55 +02:00
Tim van der Meij	935d95b462	Move the version logic from the document to the catalog The `Version` entry is part of the catalog, not of the document, so its logic should be placed there instead. The document should look in the catalog to fetch it, and not have knowledge of `catDict`, which is a member internal to the catalog. Moreover, make the version member private on the document instance. It's only used internally and was also never intended to be public. For users it's exposed by the `getMetadata` API endpoint as `PDFFormatVersion`. Finally, clarify how the version from the header and the version from the catalog are treated using a comment.	2020-08-25 23:28:55 +02:00
Jonas Jenwald	bd16c363ce	Access the `Catalog` data correctly in the "GetPageIndex" handler in `src/core/worker.js` Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just assume that the `Catalog`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.	2020-08-25 12:14:14 +02:00
Jonas Jenwald	2e6e2c3b41	Access the `XRef` data correctly in the "GetStats" handler in `src/core/worker.js` Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just assume that the `XRef`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.	2020-08-25 12:14:11 +02:00
Jani Pehkonen	e7febbf0f7	Accent positioning in Type1 `seac` glyphs In `display/canvas.js` the accent offsets must be multiplied by `fontSize` to make the offsets large enough. Another problem is in `core/type1_parser.js` when the Type1 command `seac` is handled. There is an error in the Adobe Type1 spec. See chapter 6 in Type1 Font Format Supplement, which provides an errata: The arguments of `seac` specify the offset of the left side bearing (LSB) points, not the offset of origins. This can be fixed in `core/type1_parser.js` by adding the difference of the LSB values.	2020-08-23 21:01:25 +03:00
Tim van der Meij	a8efc0296b	Obtain the export values for choice widgets from the normal appearance The down appearance (`D`) is optional and not available in the document from #12233, so the checkboxes are never saved/printed as checked because the checked appearance is based on the export value that is missing because the `D` entry is not available. Instead, we should use the normal appearance (`N`) since that one is required and therefore always available. Finally, the /Off appearance is optional according to section 12.7.4.2.3 of the specification, so that needs to be taken into account to match the specification and to fix reference test failures for the `annotation-button-widget-print` test. That is a file that doesn't specify an /Off appearance in the normal appearance dictionary.	2020-08-23 13:00:02 +02:00
Tim van der Meij	1b82ad8fff	Decode widget form values consistently The helper method `_decodeFormValue` is used to ensure that it happens in one place. Note that form values are field values, display values and export values.	2020-08-23 13:00:01 +02:00
Tim van der Meij	12c20772ac	Improve the field value parsing for choice widgets to handle `null` values The specification states that the field value is `null` if no item is selected and we didn't handle this case properly. Even though this did not break the rendering because we always convert the value to an array and the `includes` check in the display layer would simply not match, the field value would be `[null]` which is not expected and strange from an API perspective. This commit fixes that by ensuring that we return an empty array in case the field value is `null`. The API therefore still always gives an array for the field value, but now the code is more specific so that the value is either an empty array or an array of strings.	2020-08-19 23:27:50 +02:00
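The normalisation described in commit 12c20772ac amounts to mapping a `null` field value to an empty array instead of `[null]`. A minimal sketch, with `normalizeChoiceValue` as a hypothetical helper name:

```javascript
// Hypothetical helper; the name is not taken from pdf.js.
// Always returns an array: empty when nothing is selected.
function normalizeChoiceValue(fieldValue) {
  if (fieldValue === null) {
    return [];
  }
  return Array.isArray(fieldValue) ? fieldValue : [fieldValue];
}

const nothingSelected = normalizeChoiceValue(null);
const singleSelection = normalizeChoiceValue("Option A");
const multiSelection = normalizeChoiceValue(["Option A", "Option B"]);
```

Callers can then rely on an array of strings in every case and use `includes` without special handling for `null`.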
Jonas Jenwald	1058f16605	Add (basic) support for transfer functions to Images (issue 6931, bug 1149713) This is similar to the existing transfer function support for SMasks, but extended to simple image data. Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to at most 4 `Uint8Array`s each with a length of 256 elements. Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.	2020-08-17 10:34:12 +02:00
Jonas Jenwald	9d3e046a4f	Don't cache /ExtGState entries that contain fonts (PR 12087 follow-up) I completely overlooked the fact that `PartialEvaluator.handleSetFont` also updates the current `state`, which means that currently we're not actually handling font data correctly for cached /ExtGState data. (Thankfully, using /ExtGState to set a font is somewhat rare in practice.)	2020-08-17 08:17:25 +02:00
Calixte Denizet	1a6816ba98	Add support for saving forms	2020-08-12 10:32:59 +02:00
Brendan Dahl	7fb01f9f2a	Merge pull request #12186 from brendandahl/loca-2 Fix bad truetype loca tables.	2020-08-10 20:34:19 -07:00
Brendan Dahl	f6dff81223	Fix bad truetype loca tables. Some fonts have loca tables that aren't sorted or use 0 as an offset to signal a missing glyph. This fixes the bad loca tables by sorting them and then rewriting the loca table and potentially re-ordering the glyf table to match. Fixes #11131 and bug 1650302.	2020-08-10 14:15:49 -07:00
Calixte Denizet	88b112ab0c	Support comb textfields for printing	2020-08-09 14:41:26 +02:00

1791 Commits