Commit Graph

4962 Commits

Author SHA1 Message Date
Tim van der Meij
5254676ef3
Merge pull request #14055 from Snuffleupagus/PDF_TO_CSS_UNITS
Add `PDF_TO_CSS_UNITS` to the `PixelsPerInch`-structure
2021-09-22 22:24:51 +02:00
Jonas Jenwald
81a1c1cef7 Correctly validate URLs in XFA documents (bug 1731240)
With this patch we'll ensure that only valid absolute URLs can be used in XFA documents, similar to the existing validation done for "regular" PDF documents.
Furthermore, we'll also attempt to add a default protocol (i.e. `http`) to URLs beginning with "www." in XFA documents as well; this on its own is enough to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1731240
2021-09-21 21:21:01 +02:00
Jonas Jenwald
3e550f392a Add PDF_TO_CSS_UNITS to the PixelsPerInch-structure
Rather than re-computing this value in a number of different places throughout the code-base[1], we can expose this in the API via the existing `PixelsPerInch`-structure instead.
There's also been feature requests asking for the old `CSS_UNITS` viewer constant to be made accessible, such that it could be used in third-party implementations.

I suppose that it could be argued that it's somewhat confusing to place a unitless property in `PixelsPerInch`, however given that the `PDF_TO_CSS_UNITS`-property is defined strictly in terms of the existing properties this is hopefully deemed reasonable.

---
[1] These include:
 - The viewer, with the `CSS_UNITS` name.
 - The reference-tests.
 - The display-layer, when rendering images; see PR 13991.
2021-09-20 13:20:09 +02:00
Jonas Jenwald
8ea27ce157 Tweak how fonts with an /Encoding are handled in adjustToUnicode (issue 14048, PR 13277 follow-up)
Currently we only exclude /Encoding entries that also contains a /Differences array, which is the cause of the text-selection problem in the referenced issue.
In order to address this we'll now also exclude /Encoding entries that contain one of the predefined *named* encodings, and no longer require that it also contains a /Differences array.

*Please note:* This patch cases a small "regression" in the `bug1130815-text` test-case, however this is actually an improvement when compared with Adobe Reader and PDFium (in Google Chrome).
2021-09-18 22:44:25 +02:00
Tim van der Meij
83d3bb43f4
Merge pull request #14041 from Snuffleupagus/issue-9367
Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367)
2021-09-18 16:47:06 +02:00
Jonas Jenwald
20eb6ca2ec
Merge pull request #14044 from calixteman/bug1719148
Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148)
2021-09-18 16:31:45 +02:00
Jonas Jenwald
6634afd646
Merge pull request #14045 from calixteman/noise
XFA - Only warn about the wrong xfa type when there is an xfa thing
2021-09-18 16:13:20 +02:00
Tim van der Meij
c870fb489e
Merge pull request #14013 from Snuffleupagus/api-unittest-instanceof
Improve the API unit-tests, and try to expose more API-functionality in the TypeScript definitions
2021-09-18 16:08:19 +02:00
Calixte Denizet
2fc10727c5 XFA - Only warn about the wrong xfa type when there is an xfa thing 2021-09-18 15:44:05 +02:00
calixteman
ffa2572bdf
Merge pull request #14038 from calixteman/saveas
JS - Implement few possibilities with app.execMenuItem (bug 1724399)
2021-09-18 15:33:03 +02:00
Calixte Denizet
eb762ad624 Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1719148;
  - JS can set a property for a non-rendered annotation using the annotationStorage but the other unset default properties must be used when the annotation is finally rendered;
  - so this patch just adds the properties already set in the annotationStorage to the default value.
2021-09-18 15:08:22 +02:00
Calixte Denizet
bfd570038d JS - Implement few possibilities with app.execMenuItem (bug 1724399)
- it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1724399.
2021-09-18 13:52:32 +02:00
Jonas Jenwald
e3223b68fc Extract some of the glyphMap handling, for non-embedded composite standard fonts, into a helper function
This reduces some unnecessary duplication, since we currently have essentially the same code in a handful of places in the `Font.fallbackToSystemFont`-method.
2021-09-18 12:39:48 +02:00
Jonas Jenwald
ed73cf6d50 Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367)
In this particular case the `CMap`-data that we create contains only numbers, but no strings, which causes `PartialEvaluator.readToUnicode` to create a ToUnicode-map with only empty strings.

*Please note:* This is yet another case where I don't know if it's necessarily the best and most correct solution, but it does fix the referenced issue.
2021-09-18 00:26:15 +02:00
Calixte Denizet
e87c12bf34 JS - Avoid the Stay/Leave popup when clicking on a button with a JS action
- it aims to fix #14039.
2021-09-17 21:04:07 +02:00
Calixte Denizet
5bef8120e7 Annotation - For checkboxes, get field value from AS (if any) instead of V (bug 1722036)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1722036.
  - AS and V should share the same value for checkbox: it's at least what the specs say;
  - the pdf in the above bug opens correctly in Acrobat so it likely means that AS is chosen over V.
2021-09-17 13:04:16 +02:00
Brendan Dahl
d6a27860e3
Merge pull request #14025 from Snuffleupagus/issue-11915
Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915)
2021-09-16 08:06:35 -07:00
Calixte Denizet
a3aa6dd6ab Annotation - Checkboxes with the same name and export values must be in unison
- it aims to fix #14024.
  - this patch adds an attribute `acroformExportValue` to the HTML input in order to set the checked attribute in taking into account the exportValue for the checkboxes with the same name.
2021-09-15 15:30:24 +02:00
Jonas Jenwald
a11343e9af Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915)
*Please note:* All of this feels very handwavy, but at least it passes all tests locally. Hopefully we have enough tests for this part of the font code.

For non-embedded composite standard fonts with an "incomplete" /CIDToGIDMap, we'll now fallback to an *explicitly defined* /ToUnicode map even when that one happens to be an /Identity-H or /Identity-V map.

The `Font.fallbackToSystemFont` method is unfortunately getting more and more special-cases, however that might be unavoidable given all the weird non-embedded fonts found in the wild :-(
2021-09-15 11:30:40 +02:00
Calixte Denizet
9812e35916 XFA - Don't create images for unsupported mime types 2021-09-14 10:55:25 +02:00
Jonas Jenwald
95057a4e56 Try to expose more API-functionality in the TypeScript definitions
While these types apparently makes sense in TypeScript environments, we really don't want to extend the *public* API by simply exporting the relevant classes directly in `src/pdf.js` (since they should never be called/initialized manually).

Please see e.g. issue 12384 where this was first requested, and note that a possible work-around was also provided there. This patch simply implements that work-around[1], which will hopefully be helpful to TypeScript users.

---
[1] Based on the discussion in PR 13957, the two previous patches appear to be necessary for this to actually work.
2021-09-13 13:57:56 +02:00
Jonas Jenwald
d854352cd5 Improve the API unit-tests by checking that PDFPageProxy.render returns a RenderTask-instance
This is similar to existing unit-tests, which checks for `PDFDocumentProxy`- and `PDFPageProxy`-instances.
2021-09-13 13:34:37 +02:00
Jonas Jenwald
fa7a607d33 Improve the API unit-tests by checking that getDocument returns a PDFDocumentLoadingTask-instance
This is similar to existing unit-tests, which checks for `PDFDocumentProxy`- and `PDFPageProxy`-instances.
2021-09-13 13:34:28 +02:00
Jonas Jenwald
7025b9f859 [src/core/writer.js] Support null values in the writeValue function
*This fixes something that I noticed, having recently looked at both the `Lexer.getObj` and `writeValue` code.*

Please note that I unfortunately don't have an example of a form where saving fails without this patch. However, given its overall simplicity and that unit-tests are added, it's hopefully deemed useful to fix this potential issue pro-actively rather than waiting for a bug report.

At this point one might, and rightly so, wonder if there's actually any real-world PDF documents where a `null` value is being used?
Unfortunately the answer is *yes*, and we have a couple of examples in the test-suite (although none of those are related to forms); please see: `issue1015`, `issue2642`, `issue10402`, `issue12823`, `issue13823`, and `pr12564`.
2021-09-12 18:24:37 +02:00
Jonas Jenwald
5d578ea36a [src/core/writer.js] Remove unnecessary string-wrapping for boolean values in writeValue (PR 13998 follow-up) 2021-09-12 15:45:45 +02:00
Jonas Jenwald
761519ef3f
Merge pull request #13998 from calixteman/bug1729971
Write boolean value when saving a form (bug 1729971)
2021-09-12 15:38:10 +02:00
Jonas Jenwald
a47844d1fc Let Lexer.getObj return a dummy-Cmd for commands that start with a non-visible ASCII character (issue 13999)
This way we avoid breaking badly generated PDF documents where a non-visible ASCII character is "glued" to a valid command.
2021-09-11 19:54:13 +02:00
Tim van der Meij
e97f01b17c
Merge pull request #13977 from Snuffleupagus/enqueueChunk-batch
[api-minor] Reduce `postMessage` overhead, in `PartialEvaluator.getTextContent`, by sending text chunks in batches (issue 13962)
2021-09-11 13:34:07 +02:00
Jonas Jenwald
0e54f568fb Re-factor the CSS_PIXELS_PER_INCH/PDF_PIXELS_PER_INCH exports (PR 13991 follow-up)
For improved maintainability, since these constants are being exposed in the official API, this patch moves them into an Object instead.
2021-09-11 11:15:25 +02:00
Jonas Jenwald
bd51bbfd16 Remove mozImageSmoothingEnabled fallback in CanvasGraphics.endGroup
This was added all the way back in PR 2936, however it's been unnecessary ever since Firefox 51 (released on 2017-01-24); please see the MDN compatibility data:
https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/imageSmoothingEnabled#browser_compatibility
2021-09-11 10:30:39 +02:00
Jonas Jenwald
9ce63a6dc6
Merge pull request #13991 from brendandahl/interpolate
Enable/disable image smoothing based on image interpolate value. (bug 1722191)
2021-09-11 10:02:53 +02:00
Brendan Dahl
f38fb42b42 Enable/disable image smoothing based on image interpolate value. (bug 1722191)
While some of the output looks worse to my eye, this behavior more
closely matches what I see when I open the PDFs in Adobe acrobat.

Fixes: #4706, #9713, #8245, #1344
2021-09-10 14:23:35 -07:00
Calixte Denizet
474ab7c86d Write boolean value when saving a form (bug 1729971)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1729971#c4.
2021-09-10 14:10:25 +02:00
calixteman
57b80074a2
Merge pull request #13995 from calixteman/xfa_record
XFA - Handle $record shorcut in SOM expression (issue #13994)
2021-09-10 13:57:50 +02:00
Calixte Denizet
c5841b3794 XFA - Handle shorcut in SOM expression (issue #13994) 2021-09-09 19:54:45 +02:00
Calixte Denizet
623860bf8f XFA - Remove the checked attribute from the checkbox when unchecked (bug 1729877)
- it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1729877.
2021-09-09 19:14:16 +02:00
Jonas Jenwald
45ddb12f61 Remove no-op onPull/onCancel streamSink callbacks from the "GetTextContent"-handler
The `MessageHandler`-implementation already handles either of these callbacks being undefined, hence there's no particular reason (as far as I can tell) to add no-op functions here.

Also, in a couple of `MessageHandler`-methods, utilize an already existing local variable more.
2021-09-09 00:01:10 +02:00
Jonas Jenwald
f90f9466e3 [api-minor] Reduce postMessage overhead, in PartialEvaluator.getTextContent, by sending text chunks in batches (issue 13962)
Following the STR in the issue, this patch reduces the number of `PartialEvaluator.getTextContent`-related `postMessage`-calls by approximately 78 percent.[1]
Note that by enforcing a relatively low value when batching text chunks, we should thus improve worst-case scenarios while not negatively affect all `textLayer` building.

While working on these changes I noticed, thanks to our unit-tests, that the implementation of the `appendEOL` function unfortunately means that the number and content of the textItems could actually be affected by the particular chunking used.
That seems *extremely* unfortunate, since in practice this means that the particular chunking used is thus observable through the API. Obviously that should be a completely internal implementation detail, which is why this patch also modifies `appendEOL` to mitigate that.[2]

Given that this patch adds a *minimum* batch size in `enqueueChunk`, there's obviously nothing preventing it from becoming a lot larger then the limit (depending e.g. on the PDF structure and the CPU load/speed).
While sending more text chunks at once isn't an issue in itself, it could become problematic at the main-thread during `textLayer` building. Note how both the `PartialEvaluator` and `CanvasGraphics` implementations utilize `Date.now()`-checks, to prevent long-running parsing/rendering from "hanging" the respective thread. In the `textLayer` building we don't utilize such a construction[3], and streaming of textContent is thus essentially acting as a *simple* stand-in for that functionality.
Hence why we want to avoid choosing a too large minimum batch size, since that could thus indirectly affect main-thread performance negatively.

---
[1] While it'd be possible to go even lower, that'd likely require more invasive re-factoring/changes to the `PartialEvaluator.getTextContent`-code to ensure that the batches don't become too large.

[2] This should also, as far as I can tell, explain some of the regressions observed in the "enhance" text-selection tests back in PR 13257.
    Looking closer at the `appendEOL` function it should potentially be changed even more, however that should probably not be done here.

[3] I'd really like to avoid implementing something like that for the `textLayer` building as well, given that it'd require adding a fair bit of complexity.
2021-09-09 00:01:07 +02:00
Jonas Jenwald
69034ab8dc Improve glyph mapping for non-embedded composite standard fonts (issue 11088)
For non-embedded CIDFontType2 fonts with a non-/Identity encoding, use the /ToUnicode data to improve the glyph mapping.
2021-09-08 15:15:33 +02:00
Jonas Jenwald
4c1b586dd2 Reduce the size of TextLayerRenderTask._textDivProperties in "regular" text-selection mode
While these changes will obviously not have a significant effect on overall memory usage, it cannot hurt as far as I'm concerned. This patch makes the following changes:
 - Clear out `_textDivProperties` once rendering is done, since those properties are only necessary to keep alive when *enhanced* text-selection is being used.

 - Reduce the size of the `_textDivProperties`-entries by default, since a majority of the properties are only relevant when *enhanced* text-selection is being used.
2021-09-05 12:12:34 +02:00
Tim van der Meij
1b20f61b56
Merge pull request #13972 from Snuffleupagus/issue-13971
Treat all content as visible when no optional content groups are defined (issue 13971)
2021-09-04 15:53:44 +02:00
Tim van der Meij
680f33c31c
Merge pull request #13961 from Snuffleupagus/simpler-regexp
Simplify some regular expressions
2021-09-04 15:39:30 +02:00
Jonas Jenwald
6318ccf6d2 Treat all content as visible when no optional content groups are defined (issue 13971)
In the referenced PDF document the /Contents stream contains MarkedContent-operators, however no optional content dictionary exists; according to [the specification](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.3883825):

> Null values or references to deleted objects shall be ignored. If this entry is
  not present, is an empty array, or contains references only to null or deleted
  objects,  the  membership  dictionary  shall  have  no  effect  on  the  visibility  of
  any content.
2021-09-04 08:13:37 +02:00
Jonas Jenwald
3ccf277f58 Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316)
In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.
My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong).

To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it *hopefully* shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).

---
[1] As can be seem below, some of the entries are completely normal while others are non-standard:
```
Differences (array)
    0 = 65
    1 = /g5167
    2 = /space
    3 = /g11927
    4 = /g17737
    5 = /g11540
    6 = /g2180
    7 = /K
    8 = /P
    9 = /two
    10 = /zero
    11 = /one
    12 = /five
    13 = /four
    14 = /g6932
    15 = /g7246
    16 = /g1691
    17 = /g2343
    18 = /g14792
    19 = /g3325
    20 = /g4280
    21 = /g20383
    22 = /g18166
    23 = /g16988
    24 = /g17943
    25 = /g19223
    26 = /g10830
    27 = 97
    28 = /g982
    29 = /g1226
    30 = /g5059
    31 = /g2677
    32 = /g1042
    33 = /g11568
    34 = /L
    35 = /three
    36 = /seven
    37 = /g2364
    38 = /g12063
    39 = /g5356
    40 = /g2173
    41 = /g17877
    42 = /g7273
    43 = /g7647
    44 = /g7224
    45 = /g19327
    46 = /g5054
    47 = /g2342
    48 = /g10136
    49 = /g6856
    50 = /g13381
    51 = /g7257
    52 = /g12093
    53 = /g2359
```
2021-09-04 07:38:22 +02:00
Brendan Dahl
da15dbf962
Merge pull request #13698 from linfangrong/master
[FIX] fix jpx tag tree decode (issue 11957)
2021-09-03 10:00:19 -07:00
Brendan Dahl
a8ce15a2d7
Merge pull request #13966 from calixteman/no_ns
XFA - Created data node mustn't belong to datasets namespace
2021-09-03 09:59:40 -07:00
Calixte Denizet
77b9657e57 XFA - Overwrite AcroForm dictionary when saving if no datasets in XFA (bug 1720179)
- aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179
  - in some pdfs the XFA array in AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows to overwrite the AcroForm dictionary with an updated XFA array when doing an incremental update.
2021-09-03 17:04:03 +02:00
Calixte Denizet
57ae3a5a76 XFA - Created data node mustn't belong to datasets namespace
- when some named nodes in the template don't have their counterpart in datasets we create some nodes: the main node mustn't belong to the datasets namespace because it doesn't make sense and Acrobat Reader isn't able to read pdf with such nodes.
  - so created nodes under a datasets node have a namespaceId set to -1 and consequently when serialized no namespace prefix will appear.
2021-09-03 15:43:25 +02:00
Brendan Dahl
804abb3786
Merge pull request #13959 from calixteman/encrypt
Correctly pad strings when saving an encrypted pdf (bug 1726789)
2021-09-02 11:41:02 -07:00
Jonas Jenwald
c42887221a Simplify some regular expressions
There's a fair number of regular expressions througout the code-base which are slightly more verbose than strictly necessary, in particular:
 - We have a lot of regular expressions that use `[0-9]` explicitly, and those can be simplified to use `\d` instead.
 - We have one instance of a regular expression containing a `A-Za-z0-9_` sequence, which can be simplified to use `\w` instead.
2021-09-02 11:50:42 +02:00