Commit Graph

4923 Commits

Author SHA1 Message Date
Jonas Jenwald
4c1b586dd2 Reduce the size of TextLayerRenderTask._textDivProperties in "regular" text-selection mode
While these changes will obviously not have a significant effect on overall memory usage, it cannot hurt as far as I'm concerned. This patch makes the following changes:
 - Clear out `_textDivProperties` once rendering is done, since those properties are only necessary to keep alive when *enhanced* text-selection is being used.

 - Reduce the size of the `_textDivProperties`-entries by default, since a majority of the properties are only relevant when *enhanced* text-selection is being used.
2021-09-05 12:12:34 +02:00
Tim van der Meij
1b20f61b56
Merge pull request #13972 from Snuffleupagus/issue-13971
Treat all content as visible when no optional content groups are defined (issue 13971)
2021-09-04 15:53:44 +02:00
Tim van der Meij
680f33c31c
Merge pull request #13961 from Snuffleupagus/simpler-regexp
Simplify some regular expressions
2021-09-04 15:39:30 +02:00
Jonas Jenwald
6318ccf6d2 Treat all content as visible when no optional content groups are defined (issue 13971)
In the referenced PDF document the /Contents stream contains MarkedContent-operators, however no optional content dictionary exists; according to [the specification](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.3883825):

> Null values or references to deleted objects shall be ignored. If this entry is
  not present, is an empty array, or contains references only to null or deleted
  objects,  the  membership  dictionary  shall  have  no  effect  on  the  visibility  of
  any content.
2021-09-04 08:13:37 +02:00
Jonas Jenwald
3ccf277f58 Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316)
In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.
My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong).

To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it *hopefully* shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).

---
[1] As can be seem below, some of the entries are completely normal while others are non-standard:
```
Differences (array)
    0 = 65
    1 = /g5167
    2 = /space
    3 = /g11927
    4 = /g17737
    5 = /g11540
    6 = /g2180
    7 = /K
    8 = /P
    9 = /two
    10 = /zero
    11 = /one
    12 = /five
    13 = /four
    14 = /g6932
    15 = /g7246
    16 = /g1691
    17 = /g2343
    18 = /g14792
    19 = /g3325
    20 = /g4280
    21 = /g20383
    22 = /g18166
    23 = /g16988
    24 = /g17943
    25 = /g19223
    26 = /g10830
    27 = 97
    28 = /g982
    29 = /g1226
    30 = /g5059
    31 = /g2677
    32 = /g1042
    33 = /g11568
    34 = /L
    35 = /three
    36 = /seven
    37 = /g2364
    38 = /g12063
    39 = /g5356
    40 = /g2173
    41 = /g17877
    42 = /g7273
    43 = /g7647
    44 = /g7224
    45 = /g19327
    46 = /g5054
    47 = /g2342
    48 = /g10136
    49 = /g6856
    50 = /g13381
    51 = /g7257
    52 = /g12093
    53 = /g2359
```
2021-09-04 07:38:22 +02:00
Brendan Dahl
da15dbf962
Merge pull request #13698 from linfangrong/master
[FIX] fix jpx tag tree decode (issue 11957)
2021-09-03 10:00:19 -07:00
Brendan Dahl
a8ce15a2d7
Merge pull request #13966 from calixteman/no_ns
XFA - Created data node mustn't belong to datasets namespace
2021-09-03 09:59:40 -07:00
Calixte Denizet
77b9657e57 XFA - Overwrite AcroForm dictionary when saving if no datasets in XFA (bug 1720179)
- aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179
  - in some pdfs the XFA array in AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows to overwrite the AcroForm dictionary with an updated XFA array when doing an incremental update.
2021-09-03 17:04:03 +02:00
Calixte Denizet
57ae3a5a76 XFA - Created data node mustn't belong to datasets namespace
- when some named nodes in the template don't have their counterpart in datasets we create some nodes: the main node mustn't belong to the datasets namespace because it doesn't make sense and Acrobat Reader isn't able to read pdf with such nodes.
  - so created nodes under a datasets node have a namespaceId set to -1 and consequently when serialized no namespace prefix will appear.
2021-09-03 15:43:25 +02:00
Brendan Dahl
804abb3786
Merge pull request #13959 from calixteman/encrypt
Correctly pad strings when saving an encrypted pdf (bug 1726789)
2021-09-02 11:41:02 -07:00
Jonas Jenwald
c42887221a Simplify some regular expressions
There's a fair number of regular expressions througout the code-base which are slightly more verbose than strictly necessary, in particular:
 - We have a lot of regular expressions that use `[0-9]` explicitly, and those can be simplified to use `\d` instead.
 - We have one instance of a regular expression containing a `A-Za-z0-9_` sequence, which can be simplified to use `\w` instead.
2021-09-02 11:50:42 +02:00
Calixte Denizet
9619bf92be Correctly pad strings when saving an encrypted pdf (bug 1726789) 2021-09-02 10:37:21 +02:00
Tim van der Meij
0a366dda6a
Merge pull request #13955 from Snuffleupagus/issue-13433
Always prefer the post-table for TrueType fonts with (0, x) cmap-tables (issue 13433)
2021-09-01 21:46:34 +02:00
Tim van der Meij
19ce2de6f7
Merge pull request #13952 from Snuffleupagus/ItcSymbol
Extend `getNonStdFontMap` for non-embedded versions of the ItcSymbol font (issue 11532)
2021-09-01 21:38:59 +02:00
Jonas Jenwald
b7b6076294 Always prefer the post-table for TrueType fonts with (0, x) cmap-tables (issue 13433)
While I don't know if this is necessarily the "correct" solution, it does fix issue 13433 without breaking any of the existing reference-tests.
2021-09-01 12:35:49 +02:00
Jonas Jenwald
ba9f004097 Extend getNonStdFontMap for non-embedded versions of the ItcSymbol font (issue 11532)
Despite its name, the fonts in ItcSymbol-family are "regular" fonts and not Symbol ones. However, given that the font name contains the word "Symbol" we ended up picking the wrong code-path in the `Font.fallbackToSystemFont`-method.

*Please note:* While this patch ensures that the text becomes readable, by falling back a standard font, the rendering will obviously not be perfect. However, that's the PDF generators "fault" since non-embedded fonts cannot be guaranteed to render correctly in all environments.
2021-08-31 23:21:16 +02:00
Jonas Jenwald
1f56451d56 Implement PDFNetworkStreamRangeRequestReader._onError, to handle range request errors with XMLHttpRequest (issue 9883)
Given that the Fetch API is normally being used now, these changes are probably less important now than they used to be. However, given that it's simple enough to implement this I figured why not just fix issue 9883 (better late than never I suppose).
2021-08-31 10:23:57 +02:00
Jonas Jenwald
bd9a92a161 Use optional chaining more in the src/display/network.js file
Also changes the different `_onDone`/`_onProgress` methods to use consistent parameter names, and some other small improvements.
2021-08-31 10:23:54 +02:00
linfangrong
369f1899c6 [FIX] fix jpx tag tree decode (issue 11957) 2021-08-31 11:44:26 +08:00
Brendan Dahl
a7f807b059 Only use base encoding if it's populated. (bug 1727053)
The font dict in this file has an encoding entry, but only specifies a
differences map. The base encoding is empty in this case and shouldn't
be used.
2021-08-30 12:51:59 -07:00
Brendan Dahl
306119b12a
Merge pull request #13932 from Snuffleupagus/oc-images
Support Optional Content in Image-/XObjects (issue 13931)
2021-08-30 10:10:14 -07:00
Jonas Jenwald
cf0ccc4bab
Merge pull request #13937 from overleaf/jpa-fix-error-handling
Fix handling of fetch errors
2021-08-30 15:50:03 +02:00
Jakob Ackermann
291ffd3059
Fix handling of fetch errors
Testing:
- delete the pdf file while the initial request is inflight
- delete the pdf file after the initial request has finished

Repeat for a small file and large file, exercising both one-off and
 chunked transports.
2021-08-30 12:43:28 +01:00
Tim van der Meij
954e1a1694
Merge pull request #13943 from Snuffleupagus/api-more-async
Use `async` a bit more in the API
2021-08-29 14:34:14 +02:00
Jonas Jenwald
ce3f5ea2bf Use async a bit more in the API
This patch changes the `PDFDocumentLoadingTask.destroy`-method and the `_fetchDocument`-function to be `async`, which slightly simplifies the relevant code.

Furthermore, remove the catch-handler from the `WorkerTransport.getPageIndex`-method since it's no longer needed. Given that the `MessageHandler` is nowadays wrapping every possible Exception, it's no longer necessary to try and re-wrap the reason here.
2021-08-29 12:31:28 +02:00
Jonas Jenwald
9ea3fa0747 Ensure that PasswordException is handled correctly in the wrapReason function
While running the unit-tests with some logging statements added to this code, I noticed that `PasswordException` was missing from the list of potential Errors that could be passed to the `wrapReason` function.
2021-08-28 12:24:12 +02:00
Tim van der Meij
153d058b3a
Merge pull request #13933 from brendandahl/xfa-checkbox2
Fix saving of XFA checkboxes. (bug 1726381)
2021-08-27 22:45:44 +02:00
Jonas Jenwald
b34d2cdc42 Ensure that beginMarkedContentProps/endMarkedContent-operators, for /XObjects, are balanced in corrupt documents (PR 13854 follow-up)
Something that I *just* realized is that while PR 13854 fixed an issue as reported, it could still cause bugs in other similarily broken documents since we'll not insert a matching endMarkedContent-operator in the operatorList.
2021-08-26 17:05:30 +02:00
Jonas Jenwald
853b1172a1 Support Optional Content in Image-/XObjects (issue 13931)
Currently, in the `PartialEvaluator`, we only support Optional Content in Form-/XObjects. Hence this patch adds support for Image-/XObjects as well, which looks like a simple oversight in PR 12095 since the canvas-implementation already contains the necessary code to support this.
2021-08-26 16:54:15 +02:00
Brendan Dahl
6d2193a812 Fix saving of XFA checkboxes. (bug 1726381)
Previously were were always setting the storage value to the on value.
2021-08-24 15:53:55 -07:00
Jonas Jenwald
2a0ad8e696 Add deprecation warnings for the renderInteractiveForms and includeAnnotationStorage options, in PDFPageProxy.render
*This is done separately from the previous patch, to make it easier to revert these changes once they've been included in a couple of releases.*

Please note that because these two options are mutually exclusive, which is a large part of the reason for the previous patch, it's not guaranteed that the fallback-values will always be correct in every situation (but it's the best that we can do).
2021-08-24 01:40:12 +02:00
Jonas Jenwald
41efa3c071 [api-minor] Introduce a new annotationMode-option, in PDFPageProxy.{render, getOperatorList}
*This is a follow-up to PRs 13867 and 13899.*

This patch is tagged `api-minor` for the following reasons:
 - It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour.

 - For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method.

 - It's now also possible to disable *all* annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282.

---
[1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of *all* annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.
2021-08-24 01:13:02 +02:00
Brendan Dahl
56e7bb626c
Merge pull request #13660 from calixteman/no_xfaf
XFA - Disable xfa rendering for XFAF pdfs
2021-08-23 12:30:29 -07:00
Calixte Denizet
04573d2dc8 XFA - Disable xfa rendering for XFAF pdfs
- we'll implement XFAF support later.
2021-08-23 12:18:20 -07:00
Brendan Dahl
bf5a45ce6d
Merge pull request #13908 from brendandahl/xfa-find
[api-minor] XFA - Support text search in XFA documents.
2021-08-23 08:53:02 -07:00
Brendan Dahl
bb47128864 XFA - Support text search in XFA documents.
Moves the logic out of TextLayerBuilder to handle
highlighting matches into a new separate class `TextHighlighter`
that can be used with regular PDFs and XFA PDFs.

To mimic the current find functionality in XFA, two arrays
from the XFA rendering are created to get the text content
and map those to DOM nodes.

Fixes #13878
2021-08-23 08:44:20 -07:00
Tim van der Meij
83e1064360
Merge pull request #13920 from Snuffleupagus/issue-13916
Extend the glyph maps for standard respectively Calibri fonts (issue 13916)
2021-08-21 15:05:08 +02:00
Tim van der Meij
db11ba024d
Merge pull request #13899 from Snuffleupagus/includeAnnotationStorage-fix-caching
[Regression] Re-factor the *internal* `includeAnnotationStorage` handling, since it's currently subtly wrong
2021-08-21 15:04:28 +02:00
Jonas Jenwald
ac27f96987 Extend the glyph maps for standard respectively Calibri fonts (issue 13916) 2021-08-21 00:48:38 +02:00
Jonas Jenwald
5f25fea0fe Re-factor the LocalTilingPatternCache to cache by Ref rather than Name (PR 12458 follow-up, issue 13780)
This way there cannot be any *incorrect* cache hits, since Refs are guaranteed to be unique.
Please note that the reason for caching by Ref rather than doing something along the lines of the `localShadingPatternCache` (which uses a `Map` directly), is that TilingPatterns are streams and those cannot be cached on the `XRef`-instance (this way we avoid unnecessary parsing).
2021-08-18 12:49:01 +02:00
Jonas Jenwald
8ee5acd85d Tweak handling of the onlyRefs-option in the BaseLocalCache class 2021-08-18 12:24:51 +02:00
Jonas Jenwald
a7f0301f21 [Regression] Re-factor the *internal* includeAnnotationStorage handling, since it's currently subtly wrong
*This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867.*
As far as I can tell, this *subtle* bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago).

The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus *wrongly* return a cached operatorList.

In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API.
However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in *subtle* (and difficult to understand) rendering bugs.

In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead.
Given that operatorList caching in the API doesn't have to be perfect[1], but only have to avoid *false* cache-hits, we can simplify things significantly be only keeping track of the last time that the `AnnotationStorage`-data was modified.

*Please note:* While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2]
Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3]

---
[1] Note how we're already not caching operatorLists for pages with *huge* images, in order to save memory, hence there's no guarantee that operatorLists will always be cached.

[2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation.

[3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.
2021-08-18 10:09:03 +02:00
Jonas Jenwald
1465b1670f [src/display/api.js] Move the getRenderingIntent helper function into WorkerTransport
By doing this re-factoring *separately*, since it's mostly a mechanical change, the size/scope of the next patch will be reduced somewhat.
2021-08-18 09:58:26 +02:00
Tim van der Meij
e9146b19e6
Merge pull request #13892 from Snuffleupagus/Dict-merge-refactor-2
Move some validation, in `Dict.merge`, used during merging of sub-dictionaries (PR 13775 follow-up)
2021-08-14 12:26:19 +02:00
Jonas Jenwald
e2aa067603 Simplify the ReadableStream polyfill
At this point in time, all of the supported browsers (in the PDF.js project) have native `ReadableStream` implementations; see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility

Hence the polyfill is *only* necessary in Node.js environments now, and we shouldn't need to do any detailed feature detection either (since that was only done for the non-Chromium versions of the MS Edge browser).
Finally, we can slightly reduce the size of the Chromium-extension since the polyfill shouldn't be needed there either.
2021-08-13 12:28:55 +02:00
Jonas Jenwald
3369f9a783 Move some validation, in Dict.merge, used during merging of sub-dictionaries (PR 13775 follow-up)
By not adding any additional non-`Dict` entries to the list of candidates for merging of sub-dictionaries, we can very slightly reduce the amount of parsing required by not having to *again* iterate through unmergeable data.
2021-08-12 11:32:11 +02:00
Jonas Jenwald
6167566f1b Re-factor the BaseException.name handling, and clean-up some code
Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead.

At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been *multiple* issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended.

While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.
2021-08-10 11:27:47 +02:00
Jonas Jenwald
7f2d524df5 Improve caching of Annotations-data, by using a Map, in the API
Rather than caching only the *last* `PDFPageProxy.getAnnotations` call, and having to handle the intent separately, we can instead implement the caching in exactly the same way as done in the `PDFPageProxy.{render, getOperatorList}` methods.
2021-08-08 08:14:51 +02:00
Tim van der Meij
036b81496e
Merge pull request #13882 from Snuffleupagus/PDFWorker-rm-closure
[api-minor] Remove the closure from the `PDFWorker` class, in the `src/display/api.js` file
2021-08-07 19:52:39 +02:00
Tim van der Meij
952f6366bf
Merge pull request #13867 from Snuffleupagus/RenderingIntentFlag
[api-minor] Re-factor the *internal* renderingIntent, and change the default `intent` value in the `PDFPageProxy.getAnnotations` method
2021-08-07 19:25:51 +02:00