Commit Graph

1291 Commits

Author SHA1 Message Date
Calixte Denizet
3a21423386 [Acroform] Use the full path to find the node in the XFA datasets where to store the value
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence the path element starting with a hash are just skipped.
2023-02-23 12:09:39 +01:00
Calixte Denizet
dca54c8f8a [JS] Send a Validate action on change on Choice widget 2023-02-19 16:33:05 +01:00
Calixte Denizet
fc7d74385f Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic 2023-02-16 11:31:58 +01:00
Calixte Denizet
58e4d92884 [Annotation] For choice widget, use the I entry instead of the V one (bug 1770750)
It isn't really conform to the specifications but Acrobat is working like that...
2023-02-09 17:26:13 +01:00
Calixte Denizet
a25895bf72 [Annotation] Take into account the stroke alpha for a FreeText without appearance 2023-02-07 22:15:27 +01:00
Calixte Denizet
ea7b4b4d6c [Annotation] Avoid to encrypt the appearance stream two times (bug 1815476) 2023-02-07 19:26:46 +01:00
Jonas Jenwald
808ca828f1 Extend getGlyphMapForStandardFonts with additional entries (issue 15977) 2023-01-30 12:13:21 +01:00
Calixte Denizet
6f4d037a8e [JS] Correctly format field with numbers (bug 1811694, bug 1811510)
In PR #15757, a value is automatically converted into a number when it's possible
but the case of numbers like "000123" has been overlooked and their format must
be preserved.
When a script is doing something like "foo.value + bar.value" and the values are
numbers then "foo.value" must return a number but the displayed value must be what
the user entered or what a script set, so this patch is just adding a a field
_orginalValue in order to track the value has it has defined.
Some people are used to use a comma as decimal separator, hence it must be considered
when a value is parsed into a number.
This patch is fixing a regression introduced by #15757.
2023-01-26 14:57:02 +01:00
Jonas Jenwald
40a46e4397 Tweak adjustType1ToUnicode for fonts with a predefined *named* encoding (bug 1811668, PR 14050 follow-up)
*Please note:* I cannot reproduce the problem reported in bug 1811668, regarding the context menu, and in any case it's not clear that that part is even a PDF Viewer bug.

Looking at bug 1811668 I couldn't help but noticing that the textLayer isn't correct, and it's unfortunately once again a problem with the `adjustType1ToUnicode` function. That's intended to help improve text-selection for fonts without a /ToUnicode-entry, and in many cases it does help (the original PR fixed lots of issues) however it's also caused some problems.

In order to improve text-selection in bug 1811668, we'll now properly ignore fonts that have a predefined *named* encoding specified since that's really the intention with PR 14050.
2023-01-21 12:21:21 +01:00
Jonas Jenwald
f2fce93826 [JBIG2] Ensure that the decodeInteger function returns valid integers (issue 15942)
The JBIG2 images in this PDF document are corrupt enough that even Adobe Reader warns about it when opening the file.
*Please note:* I don't really know the JBIG2 image format at all, however from a very brief look at the specification it seems that integers should be 32-bit.
2023-01-19 17:14:17 +01:00
Jonas Jenwald
d6be5141e9 Fallback to using the name table to infer the encoding for TrueType fonts missing such data (issue 15910)
The relevant TrueType font is missing both /ToUnicode *and* /Encoding entires, either of which would have prevented the (current) broken textLayer rendering.
My first idea was that we could use the `post` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6post.html, to get the actual glyphNames and amend the fallback ToUnicode-map that way. Unfortunately that didn't work, since the `post` table only contained ".notdef" and "" (i.e. empty string) entries.

Instead we try to use the `name` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html, to determine if the platform is Windows and thus fallback to generate a ToUnicode-map from the `WinAnsiEncoding`.
2023-01-17 16:04:51 +01:00
Jonas Jenwald
cefaecc2e8 Ensure that Annotation appearance-entries are actually Streams
Note how all over the `src/core/annotation.js`-code we're assuming that if an `appearance`-entry exists it's also a Stream. However, we're not actually checking that thoroughly enough which causes issues in some badly generated PDF documents.
2023-01-16 13:02:53 +01:00
Jonas Jenwald
7d94fdeb48 Support parsing encrypted documents in XRef.indexObjects (issue 15893)
*Please note:* The reduced test-case is *not* a perfect reproduction of the original PDF document, since this one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here.

For corrupt *and* encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we'd could easily miss that, since we generally pick the first not obviously corrupt trailer dictionary, and the solution implemented here is to simply pre-parse all trailer dictionaries to see if there's any /Encrypt-entries.
2023-01-06 13:09:37 +01:00
Calixte Denizet
dea2471e96 [JS] UserActivation must be enabled before running document actions
else auto-print is broken (it's a regression from patch #15822).
2023-01-04 21:26:36 +01:00
Jonas Jenwald
2fcf8bb5be Re-factor searching for incomplete objects in XRef.indexObjects (issue 15803)
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.

Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility

Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
2022-12-19 23:01:09 +01:00
Calixte Denizet
f80880ccaa Strip out a reserved operator (9) from CFF char strings (fixes issue #15784) 2022-12-16 15:17:46 +01:00
Jonas Jenwald
26135b0313 Always parse the entire startXRefQueue in XRef.readXRef (issue 15833)
Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually be able to find usable data.

Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.
2022-12-15 13:46:28 +01:00
Calixte Denizet
2ebf8745a2 [JS] Run the named actions before running the format when the file is open (issue #15818)
It's a follow-up of #14950: some format actions are ran when the document is open
but we must be sure we've everything ready for that, hence we have to run some
named actions before runnig the global format.
In playing with the form, I discovered that the blur event wasn't triggered when
JS called `setFocus` (because in such a case the mouse was never down). So I removed
the mouseState thing to just use the correct commitKey when blur is triggered by a
TAB key.
2022-12-13 21:12:32 +01:00
Calixte Denizet
0c1ec946aa [JS] Handle correctly choice widgets where the display and the export values are different (issue #15815) 2022-12-13 19:08:26 +01:00
Calixte Denizet
1a397681fe The annotation layer dimensions must be set before adding some elements (follow-up of #15770)
In order to move the annotations in the DOM to have something which corresponds
to the visual order, we need to have their dimensions/positions which means that
the parent must have some dimensions.
2022-12-13 14:54:45 +01:00
Calixte Denizet
4f0bfabe7a Take all the viewBox into account when computing the coordinates of an annotation in the page (fixes #15789) 2022-12-08 15:02:20 +01:00
Calixte Denizet
20fd9099f8 [Annotation] Send correctly the updated values to the JS sandbox 2022-11-29 17:34:06 +01:00
Calixte Denizet
ea1995991b Don't add an extra space after a Katakana or a Hiragana at the eol when searching 2022-11-29 10:46:48 +01:00
Calixte Denizet
ae7da6ae48 [JS] By default, a text field value must be treated as a number (bug 1802888) 2022-11-28 16:24:01 +01:00
Jonas Jenwald
aa5b678f94 Add default icons for FileAttachment annotations (bug 1230933)
*Please note:* This "borrows" the icons from Thunderbird.

According to the PDF specification, see https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096626, we should be providing default icons for FileAttachment annotations without appearances.
2022-11-26 11:24:59 +01:00
Jonas Jenwald
8fda3f04fe
Merge pull request #15732 from Snuffleupagus/issue-15719
Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719)
2022-11-24 19:09:12 +01:00
Jonas Jenwald
d1c01b3164 Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719) 2022-11-23 15:51:18 +01:00
Jonas Jenwald
47682985d3 Add support for Optional Content in TilingPatterns (issue 15716)
This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.
2022-11-23 12:58:00 +01:00
Calixte Denizet
2be64d63e1 Normalize fullwidth, halfwidth and circled chars when searching 2022-11-14 19:27:51 +01:00
Jonas Jenwald
a1d48e3651 Add a *linked* test-case for issue 2618
Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a *linked* test-case was choosen here.
2022-11-12 16:31:01 +01:00
Jonas Jenwald
595711bd7c
Merge pull request #15679 from Snuffleupagus/bug-1799927-2
Use the *full* inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)
2022-11-10 22:54:48 +01:00
Calixte Denizet
3ca03603c2 [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid to have to layout the glyphs ourselves, we just render all of them
   in one call in the showText method in using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the char
   but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances
   to true and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.
2022-11-10 19:05:39 +01:00
Jonas Jenwald
b46e0d61cf Use the *full* inline image as the cacheKey in Parser.makeInlineImage (bug 1799927)
*Please note:* This only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
2022-11-10 18:27:26 +01:00
Jonas Jenwald
2516ffa78e Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590)
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.

This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
2022-11-03 12:46:30 +01:00
calixteman
e42e1cde61
Merge pull request #15615 from calixteman/bug1796741
[Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)
2022-10-31 09:58:27 +01:00
Jonas Jenwald
980acddbfa Prevent textLayer errors in documents with unbalanced beginMarkedContent/endMarkedContent operators (issue 15629) 2022-10-26 18:35:48 +02:00
Calixte Denizet
9f95a14e91 [Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)
When a form isn't changed, we used the appearances we had in the file, but when
/NeedAppearances is true, all the appearances have to be regenerated whatever they're.
2022-10-26 12:10:51 +02:00
Jonas Jenwald
71bd8b4de9 Let Lexer.getNumber treat more invalid "numbers" as zero (issue 15604)
In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid Adobe Reader seems to handle it just fine.
Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax our the validation here.
2022-10-20 22:36:15 +02:00
Calixte Denizet
6db9cefaaf [Annotation] Replace use of id by data-element-id to have the correct id 2022-10-19 23:36:28 +02:00
Jonas Jenwald
3c046c0a21 Extend getSupplementalGlyphMapForCalibri with some umlauts (issue 15594) 2022-10-19 17:49:40 +02:00
Jonas Jenwald
bc13a277ce Relax the /Pages dictionary /Count check for corrupt documents (issue 9105)
After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document.
Hence it seems strange to keep this requirement for *corrupt* PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.
2022-10-19 12:28:25 +02:00
Jonas Jenwald
de99f99a01 Fallback and try a *previous* generation if all else fails in XRef.indexObjects (issue 15577)
When we fail to find a usable PDF document `trailer` *and* there were errors during parsing, try and fallback to a *previous* generation as a last resort during fetching of uncompressed references.
*Please note:* This will not affect "normal" PDF documents, with valid /XRef data, and even most *corrupt* documents should be completely unaffected by these changes.
2022-10-18 20:24:01 +02:00
Calixte Denizet
556513a6e7 Use all the current transform as key when caching some image for masks used with pattern fill (bug 1795263, #15573) 2022-10-14 14:37:58 +02:00
Jonas Jenwald
858d941ff8 Take the /CIDToGIDMap into account when getting the glyph mapping for CFF fonts (issue 15559)
*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).
2022-10-13 10:02:25 +02:00
Jonas Jenwald
081e897588 Ensure that Page.getOperatorList handles Annotation parsing errors correctly (issue 15557)
*Fixes a regression from PR 15246, sorry about that!*

The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.
2022-10-10 09:48:01 +02:00
Jonas Jenwald
ce66fefbff [api-minor] Add partial support for the "GoToE" action (issue 8844)
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document.

See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909

---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
2022-10-06 10:33:07 +02:00
Jonas Jenwald
c87f90102c Add more non-standard ligatures in the glyphlist.js file (issue 15516)
Note that this PR only adds the "underscore"-variant of *actually existing* ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries).
Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the *second* time this sort of bug has been reported.
2022-09-27 16:31:51 +02:00
Jonas Jenwald
f1b0dc6f04 Tweak the heuristic that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 15492) 2022-09-22 14:09:04 +02:00
Calixte Denizet
198e9a3db1 Initialize values in the path bounding box before flushing the operator list (bug 1791583)
OperatorList.addOp can trigger a flush if it's required, hence the values passed to it must
be correctly initialized in order to avoid some wrong values in the renderer.
Because of that a clip path was considered as empty, nothing was clipped, hence the wrong
rendering in bug 1791583.
2022-09-20 20:01:54 +02:00
Jonas Jenwald
7a19def34c Extend getSupplementalGlyphMapForCalibri with more entries (issue 15443) 2022-09-15 22:19:16 +02:00