pdf.js

Author	SHA1	Message	Date
Tim van der Meij	5254676ef3	Merge pull request #14055 from Snuffleupagus/PDF_TO_CSS_UNITS Add `PDF_TO_CSS_UNITS` to the `PixelsPerInch`-structure	2021-09-22 22:24:51 +02:00
Jonas Jenwald	81a1c1cef7	Correctly validate URLs in XFA documents (bug 1731240) With this patch we'll ensure that only valid absolute URLs can be used in XFA documents, similar to the existing validation done for "regular" PDF documents. Furthermore, we'll also attempt to add a default protocol (i.e. `http`) to URLs beginning with "www." in XFA documents as well; this on its own is enough to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1731240	2021-09-21 21:21:01 +02:00
Jonas Jenwald	3e550f392a	Add `PDF_TO_CSS_UNITS` to the `PixelsPerInch`-structure Rather than re-computing this value in a number of different places throughout the code-base[1], we can expose this in the API via the existing `PixelsPerInch`-structure instead. There's also been feature requests asking for the old `CSS_UNITS` viewer constant to be made accessible, such that it could be used in third-party implementations. I suppose that it could be argued that it's somewhat confusing to place a unitless property in `PixelsPerInch`, however given that the `PDF_TO_CSS_UNITS`-property is defined strictly in terms of the existing properties this is hopefully deemed reasonable. --- [1] These include: - The viewer, with the `CSS_UNITS` name. - The reference-tests. - The display-layer, when rendering images; see PR 13991.	2021-09-20 13:20:09 +02:00
Jonas Jenwald	8ea27ce157	Tweak how fonts with an /Encoding are handled in `adjustToUnicode` (issue 14048, PR 13277 follow-up) Currently we only exclude /Encoding entries that also contain a /Differences array, which is the cause of the text-selection problem in the referenced issue. In order to address this we'll now also exclude /Encoding entries that contain one of the predefined named encodings, and no longer require that it also contains a /Differences array. Please note: This patch causes a small "regression" in the `bug1130815-text` test-case, however this is actually an improvement when compared with Adobe Reader and PDFium (in Google Chrome).	2021-09-18 22:44:25 +02:00
Tim van der Meij	83d3bb43f4	Merge pull request #14041 from Snuffleupagus/issue-9367 Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367)	2021-09-18 16:47:06 +02:00
Jonas Jenwald	20eb6ca2ec	Merge pull request #14044 from calixteman/bug1719148 Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148)	2021-09-18 16:31:45 +02:00
Tim van der Meij	c870fb489e	Merge pull request #14013 from Snuffleupagus/api-unittest-instanceof Improve the API unit-tests, and try to expose more API-functionality in the TypeScript definitions	2021-09-18 16:08:19 +02:00
Calixte Denizet	eb762ad624	Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1719148; - JS can set a property for a non-rendered annotation using the annotationStorage but the other unset default properties must be used when the annotation is finally rendered; - so this patch just adds the properties already set in the annotationStorage to the default value.	2021-09-18 15:08:22 +02:00
Jonas Jenwald	0e92f995c9	Re-factor the `EventBus` and `isInAutomation` handling (PR 11655 follow-up) Rather than forcing the "regular" `EventBus` to check and handle `isInAutomation` for every `dispatch` call, we can take advantage of subclassing instead. Hence this PR introduces a new `AutomationEventBus` class, which extends `EventBus`, and is used by the default viewer when `isInAutomation === true`.	2021-09-18 09:59:53 +02:00
Jonas Jenwald	ed73cf6d50	Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367) In this particular case the `CMap`-data that we create contains only numbers, but no strings, which causes `PartialEvaluator.readToUnicode` to create a ToUnicode-map with only empty strings. Please note: This is yet another case where I don't know if it's necessarily the best and most correct solution, but it does fix the referenced issue.	2021-09-18 00:26:15 +02:00
Calixte Denizet	5bef8120e7	Annotation - For checkboxes, get field value from AS (if any) instead of V (bug 1722036) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1722036. - AS and V should share the same value for checkbox: it's at least what the specs say; - the pdf in the above bug opens correctly in Acrobat so it likely means that AS is chosen over V.	2021-09-17 13:04:16 +02:00
Jonas Jenwald	a11343e9af	Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915) Please note: All of this feels very handwavy, but at least it passes all tests locally. Hopefully we have enough tests for this part of the font code. For non-embedded composite standard fonts with an "incomplete" /CIDToGIDMap, we'll now fallback to an explicitly defined /ToUnicode map even when that one happens to be an /Identity-H or /Identity-V map. The `Font.fallbackToSystemFont` method is unfortunately getting more and more special-cases, however that might be unavoidable given all the weird non-embedded fonts found in the wild :-(	2021-09-15 11:30:40 +02:00
Jonas Jenwald	d854352cd5	Improve the API unit-tests by checking that `PDFPageProxy.render` returns a `RenderTask`-instance This is similar to existing unit-tests, which check for `PDFDocumentProxy`- and `PDFPageProxy`-instances.	2021-09-13 13:34:37 +02:00
Jonas Jenwald	fa7a607d33	Improve the API unit-tests by checking that `getDocument` returns a `PDFDocumentLoadingTask`-instance This is similar to existing unit-tests, which check for `PDFDocumentProxy`- and `PDFPageProxy`-instances.	2021-09-13 13:34:28 +02:00
Jonas Jenwald	7025b9f859	[src/core/writer.js] Support `null` values in the `writeValue` function This fixes something that I noticed, having recently looked at both the `Lexer.getObj` and `writeValue` code. Please note that I unfortunately don't have an example of a form where saving fails without this patch. However, given its overall simplicity and that unit-tests are added, it's hopefully deemed useful to fix this potential issue pro-actively rather than waiting for a bug report. At this point one might, and rightly so, wonder if there's actually any real-world PDF documents where a `null` value is being used? Unfortunately the answer is yes, and we have a couple of examples in the test-suite (although none of those are related to forms); please see: `issue1015`, `issue2642`, `issue10402`, `issue12823`, `issue13823`, and `pr12564`.	2021-09-12 18:24:37 +02:00
Jonas Jenwald	761519ef3f	Merge pull request #13998 from calixteman/bug1729971 Write boolean value when saving a form (bug 1729971)	2021-09-12 15:38:10 +02:00
Jonas Jenwald	a47844d1fc	Let `Lexer.getObj` return a dummy-`Cmd` for commands that start with a non-visible ASCII character (issue 13999) This way we avoid breaking badly generated PDF documents where a non-visible ASCII character is "glued" to a valid command.	2021-09-11 19:54:13 +02:00
Jonas Jenwald	0e54f568fb	Re-factor the `CSS_PIXELS_PER_INCH`/`PDF_PIXELS_PER_INCH` exports (PR 13991 follow-up) For improved maintainability, since these constants are being exposed in the official API, this patch moves them into an Object instead.	2021-09-11 11:15:25 +02:00
Jonas Jenwald	9ce63a6dc6	Merge pull request #13991 from brendandahl/interpolate Enable/disable image smoothing based on image interpolate value. (bug 1722191)	2021-09-11 10:02:53 +02:00
Brendan Dahl	f38fb42b42	Enable/disable image smoothing based on image interpolate value. (bug 1722191) While some of the output looks worse to my eye, this behavior more closely matches what I see when I open the PDFs in Adobe acrobat. Fixes: #4706, #9713, #8245, #1344	2021-09-10 14:23:35 -07:00
Calixte Denizet	474ab7c86d	Write boolean value when saving a form (bug 1729971) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1729971#c4.	2021-09-10 14:10:25 +02:00
Jonas Jenwald	5678c75562	Merge pull request #13996 from Snuffleupagus/downloadutils-link-check Make `verifyManifestFiles` fail for non-linked test-cases with a `"link": true`-entry	2021-09-10 14:05:01 +02:00
calixteman	57b80074a2	Merge pull request #13995 from calixteman/xfa_record XFA - Handle $record shortcut in SOM expression (issue #13994)	2021-09-10 13:57:50 +02:00
Jonas Jenwald	d60cc7200b	Make `verifyManifestFiles` fail for non-linked test-cases with a `"link": true`-entry Currently it's possible to accidentally, e.g. by simply copy-and-pasting from an existing test-case, add an unnecessary `"link": true`-entry for locally available PDF files. This leads to inconsistencies in the manifest file, and doesn't feel like a great developer experience. However we can easily fix it by having `verifyManifestFiles` fail in this situation, and doing so actually turned up a couple of existing cases.	2021-09-10 09:51:34 +02:00
Calixte Denizet	c5841b3794	XFA - Handle shortcut in SOM expression (issue #13994)	2021-09-09 19:54:45 +02:00
Calixte Denizet	623860bf8f	XFA - Remove the checked attribute from the checkbox when unchecked (bug 1729877) - it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1729877.	2021-09-09 19:14:16 +02:00
Tim van der Meij	8a79f13e5a	Merge pull request #13985 from Snuffleupagus/issue-11088 Improve glyph mapping for non-embedded composite standard fonts (issue 11088)	2021-09-08 22:15:27 +02:00
Calixte Denizet	2b938c42f5	Avoid an error in integration test because of a locale different from en-US	2021-09-08 18:00:03 +02:00
Jonas Jenwald	69034ab8dc	Improve glyph mapping for non-embedded composite standard fonts (issue 11088) For non-embedded CIDFontType2 fonts with a non-/Identity encoding, use the /ToUnicode data to improve the glyph mapping.	2021-09-08 15:15:33 +02:00
Tim van der Meij	1b20f61b56	Merge pull request #13972 from Snuffleupagus/issue-13971 Treat all content as visible when no optional content groups are defined (issue 13971)	2021-09-04 15:53:44 +02:00
Tim van der Meij	680f33c31c	Merge pull request #13961 from Snuffleupagus/simpler-regexp Simplify some regular expressions	2021-09-04 15:39:30 +02:00
Jonas Jenwald	6318ccf6d2	Treat all content as visible when no optional content groups are defined (issue 13971) In the referenced PDF document the /Contents stream contains MarkedContent-operators, however no optional content dictionary exists; according to [the specification](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.3883825): > Null values or references to deleted objects shall be ignored. If this entry is not present, is an empty array, or contains references only to null or deleted objects, the membership dictionary shall have no effect on the visibility of any content.	2021-09-04 08:13:37 +02:00
Jonas Jenwald	3ccf277f58	Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316) In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail. My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong). To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other usable data exists (e.g. no post-table), since it hopefully shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens). --- [1] As can be seen below, some of the entries are completely normal while others are non-standard: ``` Differences (array) 0 = 65 1 = /g5167 2 = /space 3 = /g11927 4 = /g17737 5 = /g11540 6 = /g2180 7 = /K 8 = /P 9 = /two 10 = /zero 11 = /one 12 = /five 13 = /four 14 = /g6932 15 = /g7246 16 = /g1691 17 = /g2343 18 = /g14792 19 = /g3325 20 = /g4280 21 = /g20383 22 = /g18166 23 = /g16988 24 = /g17943 25 = /g19223 26 = /g10830 27 = 97 28 = /g982 29 = /g1226 30 = /g5059 31 = /g2677 32 = /g1042 33 = /g11568 34 = /L 35 = /three 36 = /seven 37 = /g2364 38 = /g12063 39 = /g5356 40 = /g2173 41 = /g17877 42 = /g7273 43 = /g7647 44 = /g7224 45 = /g19327 46 = /g5054 47 = /g2342 48 = /g10136 49 = /g6856 50 = /g13381 51 = /g7257 52 = /g12093 53 = /g2359 ```	2021-09-04 07:38:22 +02:00
Brendan Dahl	da15dbf962	Merge pull request #13698 from linfangrong/master [FIX] fix jpx tag tree decode (issue 11957)	2021-09-03 10:00:19 -07:00
Brendan Dahl	a8ce15a2d7	Merge pull request #13966 from calixteman/no_ns XFA - Created data node mustn't belong to datasets namespace	2021-09-03 09:59:40 -07:00
Calixte Denizet	77b9657e57	XFA - Overwrite AcroForm dictionary when saving if no datasets in XFA (bug 1720179) - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179 - in some pdfs the XFA array in AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows to overwrite the AcroForm dictionary with an updated XFA array when doing an incremental update.	2021-09-03 17:04:03 +02:00
Calixte Denizet	57ae3a5a76	XFA - Created data node mustn't belong to datasets namespace - when some named nodes in the template don't have their counterpart in datasets we create some nodes: the main node mustn't belong to the datasets namespace because it doesn't make sense and Acrobat Reader isn't able to read pdf with such nodes. - so created nodes under a datasets node have a namespaceId set to -1 and consequently when serialized no namespace prefix will appear.	2021-09-03 15:43:25 +02:00
Brendan Dahl	804abb3786	Merge pull request #13959 from calixteman/encrypt Correctly pad strings when saving an encrypted pdf (bug 1726789)	2021-09-02 11:41:02 -07:00
Jonas Jenwald	c42887221a	Simplify some regular expressions There's a fair number of regular expressions throughout the code-base which are slightly more verbose than strictly necessary, in particular: - We have a lot of regular expressions that use `[0-9]` explicitly, and those can be simplified to use `\d` instead. - We have one instance of a regular expression containing a `A-Za-z0-9_` sequence, which can be simplified to use `\w` instead.	2021-09-02 11:50:42 +02:00
Calixte Denizet	9619bf92be	Correctly pad strings when saving an encrypted pdf (bug 1726789)	2021-09-02 10:37:21 +02:00
Tim van der Meij	0a366dda6a	Merge pull request #13955 from Snuffleupagus/issue-13433 Always prefer the post-table for TrueType fonts with (0, x) cmap-tables (issue 13433)	2021-09-01 21:46:34 +02:00
Jonas Jenwald	b7b6076294	Always prefer the post-table for TrueType fonts with (0, x) cmap-tables (issue 13433) While I don't know if this is necessarily the "correct" solution, it does fix issue 13433 without breaking any of the existing reference-tests.	2021-09-01 12:35:49 +02:00
Jonas Jenwald	ba9f004097	Extend `getNonStdFontMap` for non-embedded versions of the ItcSymbol font (issue 11532) Despite its name, the fonts in the ItcSymbol-family are "regular" fonts and not Symbol ones. However, given that the font name contains the word "Symbol" we ended up picking the wrong code-path in the `Font.fallbackToSystemFont`-method. Please note: While this patch ensures that the text becomes readable, by falling back to a standard font, the rendering will obviously not be perfect. However, that's the PDF generator's "fault" since non-embedded fonts cannot be guaranteed to render correctly in all environments.	2021-08-31 23:21:16 +02:00
linfangrong	369f1899c6	[FIX] fix jpx tag tree decode (issue 11957)	2021-08-31 11:44:26 +08:00
Brendan Dahl	a7f807b059	Only use base encoding if it's populated. (bug 1727053) The font dict in this file has an encoding entry, but only specifies a differences map. The base encoding is empty in this case and shouldn't be used.	2021-08-30 12:51:59 -07:00
Brendan Dahl	306119b12a	Merge pull request #13932 from Snuffleupagus/oc-images Support Optional Content in Image-/XObjects (issue 13931)	2021-08-30 10:10:14 -07:00
Jonas Jenwald	e69afc6f3d	Re-factor the `setPDFNetworkStreamFactory` usage for the unit-tests (PR 13549 follow-up) This should have been part of PR 13549, since we no longer support browsers without native Fetch API and ReadableStream implementations.	2021-08-29 18:27:53 +02:00
Jonas Jenwald	1a1de9bb3e	Add support for specifying non-default Optional Content in the ref-tests	2021-08-26 16:54:16 +02:00
Jonas Jenwald	853b1172a1	Support Optional Content in Image-/XObjects (issue 13931) Currently, in the `PartialEvaluator`, we only support Optional Content in Form-/XObjects. Hence this patch adds support for Image-/XObjects as well, which looks like a simple oversight in PR 12095 since the canvas-implementation already contains the necessary code to support this.	2021-08-26 16:54:15 +02:00
Michael Wu	c08b4ea30d	Fix Viewer API definitions and include in CI The Viewer API definitions do not compile because of missing imports and anonymous objects are typed as `Object`. These issues were not caught during CI because the test project was not compiling anything from the Viewer API. As an example of the first problem: ``` /** * @implements MyInterface */ export class MyClass { ... } ``` will generate a broken definition that doesn’t import MyInterface: ``` /** * @implements MyInterface */ export class MyClass implements MyInterface { ... } ``` This can be fixed by adding a typedef jsdoc to specify the import: ``` /** @typedef {import("./otherFile").MyInterface} MyInterface */ ``` See https://github.com/jsdoc/jsdoc/issues/1537 and https://github.com/microsoft/TypeScript/issues/22160 for more details. As an example of the second problem: ``` /** * Gets the size of the specified page, converted from PDF units to inches. * @param {Object} An Object containing the properties: {Array} `view`, * {number} `userUnit`, and {number} `rotate`. */ function getPageSizeInches({ view, userUnit, rotate }) { ... } ``` generates the broken definition: ``` function getPageSizeInches({ view, userUnit, rotate }: Object) { ... } ``` The jsdoc should specify the type of each nested property: ``` /** * Gets the size of the specified page, converted from PDF units to inches. * @param {Object} options An object containing the properties: {Array} `view`, * {number} `userUnit`, and {number} `rotate`. * @param {number[]} options.view * @param {number} options.userUnit * @param {number} options.rotate */ ```	2021-08-25 18:45:46 -04:00
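
Commit 3e550f392a above describes `PDF_TO_CSS_UNITS` as a unitless value "defined strictly in terms of the existing properties" of `PixelsPerInch`. The following is a minimal sketch of that idea, not the actual pdf.js source. It assumes the conventional values of 96 CSS pixels and 72 PDF units per inch, and `pdfToCssPixels` is a hypothetical helper introduced only for illustration.

```javascript
// Illustrative sketch only: the property names follow the commit message,
// but the real pdf.js implementation may differ in detail.
const PixelsPerInch = Object.freeze({
  CSS: 96.0, // assumed: CSS reference pixels per inch
  PDF: 72.0, // assumed: PDF user-space units (points) per inch
  // Unitless ratio, derived only from the two properties above.
  get PDF_TO_CSS_UNITS() {
    return this.CSS / this.PDF;
  },
});

// Hypothetical helper (not part of pdf.js): scale a length from
// PDF units to CSS pixels.
function pdfToCssPixels(pdfUnits) {
  return pdfUnits * PixelsPerInch.PDF_TO_CSS_UNITS;
}

// A US Letter page is 612 PDF units wide, which is roughly 816 CSS pixels.
console.log(pdfToCssPixels(612));
```

Deriving the ratio in a getter, rather than storing a third literal, keeps it consistent with `CSS` and `PDF` should either value ever change.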


2799 Commits