Commit Graph

1212 Commits

Author SHA1 Message Date
Jonas Jenwald
2fcf8bb5be Re-factor searching for incomplete objects in XRef.indexObjects (issue 15803)
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.

Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility

Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
2022-12-19 23:01:09 +01:00
Calixte Denizet
f80880ccaa Strip out a reserved operator (9) from CFF char strings (fixes issue #15784) 2022-12-16 15:17:46 +01:00
Jonas Jenwald
26135b0313 Always parse the entire startXRefQueue in XRef.readXRef (issue 15833)
Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually be able to find usable data.

Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.
2022-12-15 13:46:28 +01:00
Calixte Denizet
0c1ec946aa [JS] Handle correctly choice widgets where the display and the export values are different (issue #15815) 2022-12-13 19:08:26 +01:00
Jonas Jenwald
aa5b678f94 Add default icons for FileAttachment annotations (bug 1230933)
*Please note:* This "borrows" the icons from Thunderbird.

According to the PDF specification, see https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096626, we should be providing default icons for FileAttachment annotations without appearances.
2022-11-26 11:24:59 +01:00
Jonas Jenwald
8fda3f04fe
Merge pull request #15732 from Snuffleupagus/issue-15719
Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719)
2022-11-24 19:09:12 +01:00
Jonas Jenwald
d1c01b3164 Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719) 2022-11-23 15:51:18 +01:00
Jonas Jenwald
47682985d3 Add support for Optional Content in TilingPatterns (issue 15716)
This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.
2022-11-23 12:58:00 +01:00
Jonas Jenwald
a1d48e3651 Add a *linked* test-case for issue 2618
Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a *linked* test-case was choosen here.
2022-11-12 16:31:01 +01:00
Jonas Jenwald
595711bd7c
Merge pull request #15679 from Snuffleupagus/bug-1799927-2
Use the *full* inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)
2022-11-10 22:54:48 +01:00
Calixte Denizet
3ca03603c2 [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid to have to layout the glyphs ourselves, we just render all of them
   in one call in the showText method in using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the char
   but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances
   to true and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.
2022-11-10 19:05:39 +01:00
Jonas Jenwald
b46e0d61cf Use the *full* inline image as the cacheKey in Parser.makeInlineImage (bug 1799927)
*Please note:* This only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
2022-11-10 18:27:26 +01:00
calixteman
e42e1cde61
Merge pull request #15615 from calixteman/bug1796741
[Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)
2022-10-31 09:58:27 +01:00
Jonas Jenwald
980acddbfa Prevent textLayer errors in documents with unbalanced beginMarkedContent/endMarkedContent operators (issue 15629) 2022-10-26 18:35:48 +02:00
Calixte Denizet
9f95a14e91 [Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)
When a form isn't changed, we used the appearances we had in the file, but when
/NeedAppearances is true, all the appearances have to be regenerated whatever they're.
2022-10-26 12:10:51 +02:00
Jonas Jenwald
71bd8b4de9 Let Lexer.getNumber treat more invalid "numbers" as zero (issue 15604)
In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid Adobe Reader seems to handle it just fine.
Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax our the validation here.
2022-10-20 22:36:15 +02:00
Jonas Jenwald
3c046c0a21 Extend getSupplementalGlyphMapForCalibri with some umlauts (issue 15594) 2022-10-19 17:49:40 +02:00
Jonas Jenwald
bc13a277ce Relax the /Pages dictionary /Count check for corrupt documents (issue 9105)
After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document.
Hence it seems strange to keep this requirement for *corrupt* PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.
2022-10-19 12:28:25 +02:00
Jonas Jenwald
de99f99a01 Fallback and try a *previous* generation if all else fails in XRef.indexObjects (issue 15577)
When we fail to find a usable PDF document `trailer` *and* there were errors during parsing, try and fallback to a *previous* generation as a last resort during fetching of uncompressed references.
*Please note:* This will not affect "normal" PDF documents, with valid /XRef data, and even most *corrupt* documents should be completely unaffected by these changes.
2022-10-18 20:24:01 +02:00
Calixte Denizet
556513a6e7 Use all the current transform as key when caching some image for masks used with pattern fill (bug 1795263, #15573) 2022-10-14 14:37:58 +02:00
Jonas Jenwald
858d941ff8 Take the /CIDToGIDMap into account when getting the glyph mapping for CFF fonts (issue 15559)
*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).
2022-10-13 10:02:25 +02:00
Jonas Jenwald
081e897588 Ensure that Page.getOperatorList handles Annotation parsing errors correctly (issue 15557)
*Fixes a regression from PR 15246, sorry about that!*

The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.
2022-10-10 09:48:01 +02:00
Jonas Jenwald
f1b0dc6f04 Tweak the heuristic that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 15492) 2022-09-22 14:09:04 +02:00
Calixte Denizet
198e9a3db1 Initialize values in the path bounding box before flushing the operator list (bug 1791583)
OperatorList.addOp can trigger a flush if it's required, hence the values passed to it must
be correctly initialized in order to avoid some wrong values in the renderer.
Because of that a clip path was considered as empty, nothing was clipped, hence the wrong
rendering in bug 1791583.
2022-09-20 20:01:54 +02:00
Jonas Jenwald
7a19def34c Extend getSupplementalGlyphMapForCalibri with more entries (issue 15443) 2022-09-15 22:19:16 +02:00
Jonas Jenwald
2f2ecad8fd Extend getGlyphMapForStandardFonts with some quote-entries (issue 15441) 2022-09-15 11:37:20 +02:00
Jonas Jenwald
947d390421 Fallback to a standard font when a Type1 font program is empty (issue 15292)
*Please note:* This is only a, hopefully generally helpful, work-around rather than a proper solution to issue 15292.

There's something that's "special" about the Type1 fonts in the referenced PDF document, since we don't manage to find any actual font programs and thus cannot render anything.
Given that it shouldn't make sense for a Type1 font program to ever be empty, since that means that there's no glyph-data to render, we simply fallback to a standard font to at least try and render *something* in these rare cases.
2022-09-05 12:07:19 +02:00
Jonas Jenwald
12d60e0acf Don't allow adjustToUnicode to extend a built-in /ToUnicode map (issue 15352)
Given that the change in PR 13393 was slightly speculative, given the lack of test-cases, let's just revert part of that to fix the referenced issue.
Based on a quick look at old issues and existing test-cases, it seems that most (if not all) PDF documents that benefit from using the font-data in this way lack any /ToUnicode maps which should mean that they're unaffected by these changes.
2022-09-03 23:11:42 +02:00
Jonas Jenwald
571ce13dd6 [api-major] Remove the enhanceTextSelection functionality (PR 15145 follow-up)
For the `gulp mozcentral` command, this reduces the size of the *built* `pdf.js` file by `> 10` kB.
2022-08-28 15:04:47 +02:00
Calixte Denizet
04f78c935c Fix OTS issue with empty index (#15289) 2022-08-08 22:56:26 +02:00
Jonas Jenwald
899fc29eef Always set a border-radius for RadioButton annotations (issue 15262) 2022-08-02 13:58:20 +02:00
Calixte Denizet
d092a85b6c Fix wrong order of arguments when calling the CipherTransform ctor (bug 1782186) 2022-07-29 12:46:45 +02:00
Jonas Jenwald
2cad5cf45b Set opacity in the reference tests (PR 15219 follow-up)
Without these changes in the manifest, the affected test-cases fail to render correctly.
2022-07-28 14:35:09 +02:00
Jonas Jenwald
fc018ea9ea Support images with /Filter-entries that contain Arrays (issue 15220)
This patch "borrows" the code found in the `Parser.makeInlineImage`-method, to ensure that JBIG2 and JPX images can be rendered correctly.
2022-07-25 08:41:37 +02:00
Jonas Jenwald
60bd9580e2 Ignore invalid /CIDToGIDMap-entries when parsing fonts (issue 15139)
In the referenced PDF document the fonts have /CIDToGIDMap-entries that cannot be loaded. Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /CIDToGIDMap-entries and fallback to simply assume that no such data is available.

Given that this is *clearly* a case of a corrupt PDF document, there's no guarantee that this will "fix" things in the general case since a /CIDToGIDMap may be *required* in order for some composite fonts to render correctly. However, attempting to render *something* is surely better than skipping a font altogether.
2022-07-20 11:58:44 +02:00
Calixte Denizet
8f26ba5487 [Annotation] A push button can have no action (bug 1778692) 2022-07-08 15:39:56 +02:00
Jonas Jenwald
79cfc548fc Improve text-selection for Type3 fonts with bogus /FontBBox-entries (issue 14999)
This extends PR 13461, by also building a fallback bounding box for Type3 fonts that contain a much too small /FontBBox-entry.

*Please note:* While this patch improves things overall, copy-and-pasting still doesn't work perfectly for this document. In particular the lowercase letter "c" cannot be selected/copied, however this can be reproduced in both Adobe Reader and PDFium (in Google Chrome) too, which is caused by a lack of proper /ToUnicode-data in the PDF document.
2022-07-05 14:27:14 +02:00
calixteman
23fcdabb37
Merge pull request #15088 from calixteman/editor_rotation
Support rotating editor layer
2022-06-25 16:18:07 +02:00
Calixte Denizet
0c420f5135 Support rotating editor layer
- As in the annotation layer, use percent instead of pixels as unit;
- handle the rotation of the editor layer in allowing editing when rotation
  angle is not zero;
- the different editors are rotated counterclockwise in order to be usable
  when the main page is itself rotated;
- add support for saving/printing rotated editors.
2022-06-24 20:02:32 +02:00
Jonas Jenwald
c48dc251e0 Add (basic) support for Optional Content in Annotations
Given that Annotations can also have an `OC`-entry, we need to take that into account when generating their operatorLists.

Note that in order to simplify the patch the `getOperatorList`-methods, for the Annotation-classes, were converted to be `async`.
2022-06-24 15:19:56 +02:00
Calixte Denizet
e49d039853 Correctly order added annotations when saving or printing
- the annotations must be rendered in the same order as the chronological one.
- fix a bug in document.js which avoids to read a saved pdf correctly in Acrobat:
  there is no need to reset the xref state: it's done in worker.js once everything
  has been saved.
2022-06-23 17:39:12 +02:00
Calixte Denizet
f27c8c4471 [Editor] Add support for printing newly added Ink annotations 2022-06-21 18:21:49 +02:00
Calixte Denizet
cdc58b7a52 Rotate annotations based on the MK::R value (bug 1675139)
- it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1675139;
- An annotation can be rotated (counterclockwise);
- the rotation can be set in using JS.
2022-06-21 17:57:26 +02:00
Jonas Jenwald
64cce1269e Add basic support for non-embedded ArialUnicodeMS fonts (issue 15044)
This appears to be a Microsoft-specific version of the regular Arial font, hence we simply map this to Helvetica in the same way that we treat many other Arial-named fonts.
2022-06-15 10:37:20 +02:00
Jonas Jenwald
2dca14028d Extend getGlyphMapForStandardFonts with some Hebrew entries (issue 15033)
This only adds the minimum entries required in order to render the referenced document correctly, rather than trying to support "all" Hebrew glyphs, to ensure that all lines in `getGlyphMapForStandardFonts` are covered by tests.
2022-06-13 10:08:39 +02:00
Jonas Jenwald
3d244cb6a8 Render PopupAnnotations even if they have missing or empty /Rect-entries (issue 15012, PR 14439 follow-up)
This only applies to *corrupt* PDF documents, where Annotations are missing the required /Rect-entry. Rendering PopupAnnotations unconditionally shouldn't be a problem, since we're not using a `BaseSVGFactory`-instance in that case.
2022-06-09 15:10:54 +02:00
Calixte Denizet
2dd0c861bf Outline fields which are required (bug 1724918)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1724918;

- it applies for both Acroform and XFA.
2022-06-07 17:02:11 +02:00
Calixte Denizet
96d0d22d66 Reset all the canvas states after rendering each annotations (#14105)
- each annotation must be rendered independently of the others. So
  after having rendered each annotation, the canvas states are reset
  in order to have something clean to render the next one.
2022-06-07 14:59:02 +02:00
Jonas Jenwald
59dd4ea2b0 Lookup image-data correctly in paintImageMaskXObjectGroup (issue 14990)
*This fixes a regression from PR 14754.*

We didn't lookup the image-data correctly, with the result that we tried to render some ImageMasks using a string rather than the intended TypedArray. To make matters worse, this code-path was apparently not *properly* covered by existing test-cases.
2022-06-05 12:39:23 +02:00
Calixte Denizet
66b513fc00 [Annotations] Show buttons even if they've no actions
- it's a regression from PR #14247:
 - before the PR, the button was rendered on the canvas whatever its status was;
 - after the PR, the button image has been moved in an other canvas so when the button is not renderable
   (because it has no actions) then the image is not added the HTML element.
- the buttons in the pdf in bug 1737260 or in the pdf in #14308 were not visible
- make the button always renderable but don't add the link element if it's useless.
2022-05-28 23:50:50 +02:00