pdf.js

Author	SHA1	Message	Date
Tim van der Meij	56e3ef68d4	Merge pull request #14106 from calixteman/names Empty name is allowed in ISO 32000	2021-10-09 14:29:10 +02:00
Jonas Jenwald	69a97bcba7	Take the /CIDToGIDMap data into account when computing the hash, in `PartialEvaluator.preEvaluateFont`, for composite fonts (bug 1734802) This is unfortunately yet another bug in the `preEvaluateFont`-implementation, and I've lost count of the number of times I've had to tweak this code over the years :-( I really cannot help thinking that PR 4423 was way too simplistic, since it missed a bunch of cases that leads to broken font rendering in many PDF documents. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1734802	2021-10-08 13:15:21 +02:00
Calixte Denizet	f384ad2356	Empty name is allowed in ISO 32000 - the exact sentence from the spec: "The token SOLIDUS (a slash followed by no regular characters) introduces a unique valid name defined by the empty sequence of characters." - so just remove the warning.	2021-10-06 20:50:39 +02:00
Jonas Jenwald	bb9c905c5d	Ensure that various URL-related options are applied in the `xfaLayer` too Note how both the annotationLayer and the document outline will apply various URL-related options when creating the link-elements. For consistency the `xfaLayer`-rendering should obviously use the same options, to ensure that the existing options are indeed applied to all URLs regardless of where they originate.	2021-10-02 09:32:23 +02:00
Jonas Jenwald	284d259054	Merge pull request #14057 from Snuffleupagus/bug-920426 Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426)	2021-10-01 23:22:25 +02:00
Calixte Denizet	aecbd7cd89	AcroForm: Add support for ResetForm action - it aims to fix #12721. - Thanks to PR #14023, we've now the fieldObjects in the annotation layer so we can easily map fields names on their id if needed. - Reset values in the storage, in the JS sandbox and in the visible html elements.	2021-09-30 22:02:33 +02:00
Jonas Jenwald	d3ca28bc34	Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426) In the referenced bug, the embedded fonts contain custom CMap-data that only include strings. Note how for embedded composite TrueType fonts we're using the CMap-data when building the glyph mapping, and currently we end up with a completely empty map because the code expects only CID numbers. Furthermore, just fixing the glyph mapping alone isn't sufficient to fully address the bug, since we also need to consider this "special" kind of CMap-data when looking up glyph widths.	2021-09-30 18:10:47 +02:00
Tim van der Meij	9a74f3e6e0	Merge pull request #14049 from calixteman/bg_from_mk Annotation - Use border and background colors from MK dictionary	2021-09-29 21:13:20 +02:00
Calixte Denizet	0776cd9b90	Annotation - Use border and background colors from MK dictionary - it aims to fix #13003; - set the bg and fg colors as they're in the pdf; - put a transparent overlay to help to see the fields.	2021-09-26 20:49:26 +02:00
Jonas Jenwald	e6e04694f4	[api-minor] Move the `addDefaultProtocolToUrl`/`tryConvertUrlEncoding` functionality into the `createValidAbsoluteUrl` function Having recently worked with, and reviewed patches touching, this code it seemed that it's probably not a bad idea to move that functionality into `createValidAbsoluteUrl` as new options instead. For the `addDefaultProtocolToUrl` functionality in particular, the existing helper function was not only moved but slightly improved as well. Looking at the code, I realized that there's a small risk that it would incorrectly match a relative URL-string too. With these changes, the `createValidAbsoluteUrl` call-sites in the `src/core/`-code can be simplified a little bit. Please note: This patch may, indirectly, change the format of the `unsafeUrl`-property returned with relevant Annotations and OutlineItems; hence the `api-minor` tag. However, I'd argue that it's actually more correct this way since the whole purpose of `unsafeUrl` is/was to return the URL data as-is without any parsing done.	2021-09-26 14:29:54 +02:00
Calixte Denizet	558e58f354	XFA - Add <a> element in button when an url is detected (bug 1716758) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716758; - some buttons have a JS action with the pattern `app.launchURL(...)` (or similar) so extract when it's possible the url and generate a <a> element with the href equals to the found url; - pdf.js already had some code to handle that so this patch slightly refactor that.	2021-09-25 21:59:39 +02:00
Calixte Denizet	c0e9108d00	Annotation - Some checkboxes have an empty N dictionary - it aims to fix #14021; - the N dict is empty here so just create a default one; - it implies that the checked checkbox has no appearance so create a default one too in order to print it; - in the pdf in the issue, a checked box is not printed because it has no default appearance so we need to guess its appearance from its state.	2021-09-25 16:00:47 +02:00
Tim van der Meij	cc110b8542	Merge pull request #14064 from Snuffleupagus/issue-13845 Fallback to font name matching, when checking for serif fonts (issue 13845)	2021-09-25 12:41:57 +02:00
Jonas Jenwald	b23b8d8a5d	Merge pull request #14074 from Snuffleupagus/issue-14046 [api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046)	2021-09-25 12:37:44 +02:00
Tim van der Meij	36dc93fe5d	Merge pull request #14065 from Snuffleupagus/fewer-EXPORT_DATA_PROPERTIES [api-minor] Stop exporting, by default, a few additional Font properties (PR 11777 follow-up)	2021-09-25 12:25:56 +02:00
Jonas Jenwald	1dcd2f0cd3	[api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046) In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases. To avoid having to add two new properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`). Please note: In order avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.	2021-09-25 09:18:58 +02:00
calixteman	104e049338	Merge pull request #14073 from calixteman/bindItems XFA - Bind items when there's a bindItems entry	2021-09-24 09:01:52 -07:00
Calixte Denizet	97c1e076a1	XFA - Bind items when there's a bindItems entry - In the pdf in issue #14071, some select fields don't contain any values; - the corresponding node has a bindItems and a bind elements and _bindItems function was just not called.	2021-09-24 16:08:58 +02:00
Calixte Denizet	cd73e282eb	XFA - Create a new page in case of overflow - it aims to fix #14071; - a subform is overflowing and the the target in case of overflow is itself. In this case we must create a new page.	2021-09-24 14:57:55 +02:00
Calixte Denizet	4b0538d07a	Don't save anything in XFA entry if no XFA! (bug 1732344) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1732344 - rename some variables to have a more clear code; - and last but no least, add a unit test to test saving.	2021-09-23 19:51:23 +02:00
Jonas Jenwald	9acfe486d4	Fallback to font name matching, when checking for serif fonts (issue 13845) In order to handle fonts that specify completely bogus /Flags-entries, fallback to font name matching to determine if the font is a serif one.	2021-09-23 01:11:57 +02:00
Jonas Jenwald	e027748627	[api-minor] Stop exporting, by default, a few additional Font properties (PR 11777 follow-up) This is similar to the "isSymbolicFont"-property, which is no longer exported by default after PR 11777. Both "isMonospace" and "isSerifFont" are internal properties, used during font parsing and building of the glyph mapping on the worker-thread. However both of these properties are completely unused on the main-thread and/or in the API, and accessing them they will now require setting the `fontExtraProperties`-option when calling `getDocument`.	2021-09-23 00:44:43 +02:00
Jonas Jenwald	81a1c1cef7	Correctly validate URLs in XFA documents (bug 1731240) With this patch we'll ensure that only valid absolute URLs can be used in XFA documents, similar to the existing validation done for "regular" PDF documents. Furthermore, we'll also attempt to add a default protocol (i.e. `http`) to URLs beginning with "www." in XFA documents as well; this on its own is enough to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1731240	2021-09-21 21:21:01 +02:00
Jonas Jenwald	8ea27ce157	Tweak how fonts with an /Encoding are handled in `adjustToUnicode` (issue 14048, PR 13277 follow-up) Currently we only exclude /Encoding entries that also contains a /Differences array, which is the cause of the text-selection problem in the referenced issue. In order to address this we'll now also exclude /Encoding entries that contain one of the predefined named encodings, and no longer require that it also contains a /Differences array. Please note: This patch cases a small "regression" in the `bug1130815-text` test-case, however this is actually an improvement when compared with Adobe Reader and PDFium (in Google Chrome).	2021-09-18 22:44:25 +02:00
Tim van der Meij	83d3bb43f4	Merge pull request #14041 from Snuffleupagus/issue-9367 Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367)	2021-09-18 16:47:06 +02:00
Calixte Denizet	2fc10727c5	XFA - Only warn about the wrong xfa type when there is an xfa thing	2021-09-18 15:44:05 +02:00
Jonas Jenwald	e3223b68fc	Extract some of the glyphMap handling, for non-embedded composite standard fonts, into a helper function This reduces some unnecessary duplication, since we currently have essentially the same code in a handful of places in the `Font.fallbackToSystemFont`-method.	2021-09-18 12:39:48 +02:00
Jonas Jenwald	ed73cf6d50	Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367) In this particular case the `CMap`-data that we create contains only numbers, but no strings, which causes `PartialEvaluator.readToUnicode` to create a ToUnicode-map with only empty strings. Please note: This is yet another case where I don't know if it's necessarily the best and most correct solution, but it does fix the referenced issue.	2021-09-18 00:26:15 +02:00
Calixte Denizet	5bef8120e7	Annotation - For checkboxes, get field value from AS (if any) instead of V (bug 1722036) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1722036. - AS and V should share the same value for checkbox: it's at least what the specs say; - the pdf in the above bug opens correctly in Acrobat so it likely means that AS is chosen over V.	2021-09-17 13:04:16 +02:00
Jonas Jenwald	a11343e9af	Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915) Please note: All of this feels very handwavy, but at least it passes all tests locally. Hopefully we have enough tests for this part of the font code. For non-embedded composite standard fonts with an "incomplete" /CIDToGIDMap, we'll now fallback to an explicitly defined /ToUnicode map even when that one happens to be an /Identity-H or /Identity-V map. The `Font.fallbackToSystemFont` method is unfortunately getting more and more special-cases, however that might be unavoidable given all the weird non-embedded fonts found in the wild :-(	2021-09-15 11:30:40 +02:00
Calixte Denizet	9812e35916	XFA - Don't create images for unsupported mime types	2021-09-14 10:55:25 +02:00
Jonas Jenwald	7025b9f859	[src/core/writer.js] Support `null` values in the `writeValue` function This fixes something that I noticed, having recently looked at both the `Lexer.getObj` and `writeValue` code. Please note that I unfortunately don't have an example of a form where saving fails without this patch. However, given its overall simplicity and that unit-tests are added, it's hopefully deemed useful to fix this potential issue pro-actively rather than waiting for a bug report. At this point one might, and rightly so, wonder if there's actually any real-world PDF documents where a `null` value is being used? Unfortunately the answer is yes, and we have a couple of examples in the test-suite (although none of those are related to forms); please see: `issue1015`, `issue2642`, `issue10402`, `issue12823`, `issue13823`, and `pr12564`.	2021-09-12 18:24:37 +02:00
Jonas Jenwald	5d578ea36a	[src/core/writer.js] Remove unnecessary string-wrapping for boolean values in `writeValue` (PR 13998 follow-up)	2021-09-12 15:45:45 +02:00
Jonas Jenwald	761519ef3f	Merge pull request #13998 from calixteman/bug1729971 Write boolean value when saving a form (bug 1729971)	2021-09-12 15:38:10 +02:00
Jonas Jenwald	a47844d1fc	Let `Lexer.getObj` return a dummy-`Cmd` for commands that start with a non-visible ASCII character (issue 13999) This way we avoid breaking badly generated PDF documents where a non-visible ASCII character is "glued" to a valid command.	2021-09-11 19:54:13 +02:00
Tim van der Meij	e97f01b17c	Merge pull request #13977 from Snuffleupagus/enqueueChunk-batch [api-minor] Reduce `postMessage` overhead, in `PartialEvaluator.getTextContent`, by sending text chunks in batches (issue 13962)	2021-09-11 13:34:07 +02:00
Jonas Jenwald	9ce63a6dc6	Merge pull request #13991 from brendandahl/interpolate Enable/disable image smoothing based on image interpolate value. (bug 1722191)	2021-09-11 10:02:53 +02:00
Brendan Dahl	f38fb42b42	Enable/disable image smoothing based on image interpolate value. (bug 1722191) While some of the output looks worse to my eye, this behavior more closely matches what I see when I open the PDFs in Adobe acrobat. Fixes: #4706, #9713, #8245, #1344	2021-09-10 14:23:35 -07:00
Calixte Denizet	474ab7c86d	Write boolean value when saving a form (bug 1729971) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1729971#c4.	2021-09-10 14:10:25 +02:00
Calixte Denizet	c5841b3794	XFA - Handle shorcut in SOM expression (issue #13994 )	2021-09-09 19:54:45 +02:00
Jonas Jenwald	45ddb12f61	Remove no-op `onPull`/`onCancel` streamSink callbacks from the "GetTextContent"-handler The `MessageHandler`-implementation already handles either of these callbacks being undefined, hence there's no particular reason (as far as I can tell) to add no-op functions here. Also, in a couple of `MessageHandler`-methods, utilize an already existing local variable more.	2021-09-09 00:01:10 +02:00
Jonas Jenwald	f90f9466e3	[api-minor] Reduce `postMessage` overhead, in `PartialEvaluator.getTextContent`, by sending text chunks in batches (issue 13962) Following the STR in the issue, this patch reduces the number of `PartialEvaluator.getTextContent`-related `postMessage`-calls by approximately 78 percent.[1] Note that by enforcing a relatively low value when batching text chunks, we should thus improve worst-case scenarios while not negatively affect all `textLayer` building. While working on these changes I noticed, thanks to our unit-tests, that the implementation of the `appendEOL` function unfortunately means that the number and content of the textItems could actually be affected by the particular chunking used. That seems extremely unfortunate, since in practice this means that the particular chunking used is thus observable through the API. Obviously that should be a completely internal implementation detail, which is why this patch also modifies `appendEOL` to mitigate that.[2] Given that this patch adds a minimum batch size in `enqueueChunk`, there's obviously nothing preventing it from becoming a lot larger then the limit (depending e.g. on the PDF structure and the CPU load/speed). While sending more text chunks at once isn't an issue in itself, it could become problematic at the main-thread during `textLayer` building. Note how both the `PartialEvaluator` and `CanvasGraphics` implementations utilize `Date.now()`-checks, to prevent long-running parsing/rendering from "hanging" the respective thread. In the `textLayer` building we don't utilize such a construction[3], and streaming of textContent is thus essentially acting as a simple stand-in for that functionality. Hence why we want to avoid choosing a too large minimum batch size, since that could thus indirectly affect main-thread performance negatively. --- [1] While it'd be possible to go even lower, that'd likely require more invasive re-factoring/changes to the `PartialEvaluator.getTextContent`-code to ensure that the batches don't become too large. [2] This should also, as far as I can tell, explain some of the regressions observed in the "enhance" text-selection tests back in PR 13257. Looking closer at the `appendEOL` function it should potentially be changed even more, however that should probably not be done here. [3] I'd really like to avoid implementing something like that for the `textLayer` building as well, given that it'd require adding a fair bit of complexity.	2021-09-09 00:01:07 +02:00
Jonas Jenwald	69034ab8dc	Improve glyph mapping for non-embedded composite standard fonts (issue 11088) For non-embedded CIDFontType2 fonts with a non-/Identity encoding, use the /ToUnicode data to improve the glyph mapping.	2021-09-08 15:15:33 +02:00
Tim van der Meij	680f33c31c	Merge pull request #13961 from Snuffleupagus/simpler-regexp Simplify some regular expressions	2021-09-04 15:39:30 +02:00
Jonas Jenwald	3ccf277f58	Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316) In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail. My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong). To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it hopefully shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens). --- [1] As can be seem below, some of the entries are completely normal while others are non-standard: ``` Differences (array) 0 = 65 1 = /g5167 2 = /space 3 = /g11927 4 = /g17737 5 = /g11540 6 = /g2180 7 = /K 8 = /P 9 = /two 10 = /zero 11 = /one 12 = /five 13 = /four 14 = /g6932 15 = /g7246 16 = /g1691 17 = /g2343 18 = /g14792 19 = /g3325 20 = /g4280 21 = /g20383 22 = /g18166 23 = /g16988 24 = /g17943 25 = /g19223 26 = /g10830 27 = 97 28 = /g982 29 = /g1226 30 = /g5059 31 = /g2677 32 = /g1042 33 = /g11568 34 = /L 35 = /three 36 = /seven 37 = /g2364 38 = /g12063 39 = /g5356 40 = /g2173 41 = /g17877 42 = /g7273 43 = /g7647 44 = /g7224 45 = /g19327 46 = /g5054 47 = /g2342 48 = /g10136 49 = /g6856 50 = /g13381 51 = /g7257 52 = /g12093 53 = /g2359 ```	2021-09-04 07:38:22 +02:00
Brendan Dahl	da15dbf962	Merge pull request #13698 from linfangrong/master [FIX] fix jpx tag tree decode (issue 11957)	2021-09-03 10:00:19 -07:00
Brendan Dahl	a8ce15a2d7	Merge pull request #13966 from calixteman/no_ns XFA - Created data node mustn't belong to datasets namespace	2021-09-03 09:59:40 -07:00
Calixte Denizet	77b9657e57	XFA - Overwrite AcroForm dictionary when saving if no datasets in XFA (bug 1720179) - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179 - in some pdfs the XFA array in AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows to overwrite the AcroForm dictionary with an updated XFA array when doing an incremental update.	2021-09-03 17:04:03 +02:00
Calixte Denizet	57ae3a5a76	XFA - Created data node mustn't belong to datasets namespace - when some named nodes in the template don't have their counterpart in datasets we create some nodes: the main node mustn't belong to the datasets namespace because it doesn't make sense and Acrobat Reader isn't able to read pdf with such nodes. - so created nodes under a datasets node have a namespaceId set to -1 and consequently when serialized no namespace prefix will appear.	2021-09-03 15:43:25 +02:00
Brendan Dahl	804abb3786	Merge pull request #13959 from calixteman/encrypt Correctly pad strings when saving an encrypted pdf (bug 1726789)	2021-09-02 11:41:02 -07:00

1 2 3 4 5 ...

2400 Commits