Commit Graph

910 Commits

Author SHA1 Message Date
Tim van der Meij
ea4d88a330
Merge pull request #12395 from calixteman/checks
Render not displayed annotations in using normal appearance when printing
2020-10-28 00:11:10 +01:00
Calixte Denizet
6be2f84b4e Render not displayed annotations in using normal appearance when printing 2020-10-27 19:00:31 +01:00
Jonas Jenwald
f2fa053c51 Ensure that MurmurHash3_64.update handles ArrayBuffer input correctly, to avoid hash-collisions (issue 12533)
Different fonts incorrectly end up with *identical* hashes, despite having different /ToUnicode data.
The issue, and it's very interesting that we've apparently not seen it before, appears to be caused by the fact that different /ToUnicode entries share the *same* underlying `ArrayBuffer`, which thus becomes problematic at the `const dataUint32 = new Uint32Array(data.buffer, 0, blockCounts);` line. The simplest solution thus seem to be to just *copy* the input, when it's an `ArrayBuffer`, rather than using it as-is. (Note that if we'd stringified the input, when calling `MurmurHash3_64.update`, the issue would also have been fixed. In this case, we're already creating an unique TypedArray.)
2020-10-26 16:27:33 +01:00
Calixte Denizet
37c86b2daa Fallback font for buttons must be ZapfDingbats.
Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.
2020-10-24 12:00:03 +02:00
Jani Pehkonen
935568c2f1 Fix invalid XUID entries in CFF fonts
In CFF fonts, entry `XUID` should be an array that has no more than
16 elements. In the issue, the length is 20, which causes the fonts to fail.
See Appendix B, "Implementation Limits" in PostScript Language Reference Manual
https://web.archive.org/web/20170218093716/https://www.adobe.com/products/postscript/pdfs/PLRM.pdf
Actually entries `XUID` and `UniqueID` are obsolete altogether.
https://blogs.adobe.com/CCJKType/2016/06/no-more-xuid-arrays.html
2020-10-05 17:38:01 +03:00
Jonas Jenwald
bd3b15b897 Use the cidToGidMap, if it exists, when building the glyph mapping for non-embedded composite fonts (issue 12418) 2020-09-28 14:40:43 +02:00
Jonas Jenwald
fca53a8eb0 Compute the transformOrigin correctly, for negative values, when rendering AnnotationElements (bug 1627030)
This changes the `transformOrigin` calculations in `AnnotationElement._createContainer` and `PopupAnnotationElement.render`, to ensure that e.g. the clickable area of annotations and/or popups are both positioned correctly.

The problem occurs for *negative* values, since they're not negated correctly because of how the `transformOrigin` strings were build; see issue 12406 for a more in-depth explanation. Previously, for negative values, the `transformOrigin` strings would thus be ignored since they're not valid.
2020-09-24 10:28:29 +02:00
Jonas Jenwald
2497e8eab9 Prevent errors if the InkList property, in InkAnnotations, is missing and/or not an Array (issue 12392)
To prevent a future bug, the `Vertices` property in PolylineAnnotations are handled the same way.
2020-09-19 15:34:32 +02:00
Tim van der Meij
b0c7a74a0c
Merge pull request #12361 from Snuffleupagus/_getSaveFieldResources
Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294)
2020-09-15 00:09:31 +02:00
Tim van der Meij
3ecd984758
Implement resetting of created streams for annotations 2020-09-14 23:08:50 +02:00
Jonas Jenwald
c992b8e460 Ensure that all necessary /Font resources are included when saving a WidgetAnnotation-instance (issue 12294)
This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix *one specific* case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local *and* AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included.

---
[1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.
2020-09-14 15:22:40 +02:00
Tim van der Meij
20c891542b
Merge pull request #12269 from calixteman/highlight
Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations.
2020-09-06 22:25:36 +02:00
Calixte Denizet
65ecd981fe Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations. 2020-09-06 15:40:15 +02:00
Tim van der Meij
06b53d770a
Merge pull request #12259 from brendandahl/cmap-fix
Fix handling of symbolic fonts and unicode cmaps.
2020-08-30 16:01:24 +02:00
Brendan Dahl
45e8a31cc0 Fix handling of symbolic fonts and unicode cmaps.
In issue 12120, the font has a 1,0 cmap and is marked symbolic which
according to the spec means we should directly use the cmap instead of
the extra steps that are defined in 9.6.6.4.

However, just fixing that caused bug 1057544 to break. The font in bug
1057544 has a 0,1 cmap (Unicode 1.1) which we were not using, but is
easy to support. We're also easily able to support some of the other
unicode cmaps, so I added those as well.

There was also a second issue with bug 1057544, the cmap doesn't have
a mapping for the "quoteright" glyph, but it is defined in the post
table. To handle this, I've moved post table as a  fallback for any
font that has an encoding.
2020-08-27 14:33:11 -07:00
Calixte Denizet
ba94f04ba3 Bug 1661226 - Push button are not rendered with renderInteractiveForms enabled 2020-08-27 10:45:14 +02:00
Tim van der Meij
73bc8e1d9f
Include forms/print reference tests for the document from #12233
In addition to the unit tests these reference tests make sure that this
document, that triggered some edge cases in our code, can be rendered
and printed successfully now.
2020-08-23 13:00:02 +02:00
Tim van der Meij
483efedc66
Include reference tests for text/choice/button widget printing 2020-08-18 12:36:33 +02:00
Jonas Jenwald
1058f16605 Add (basic) support for transfer functions to Images (issue 6931, bug 1149713)
This is *similar* to the existing transfer function support for SMasks, but extended to simple image data.
Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to *at most* 4 `Uint8Array`s each with a length of 256 elements.

Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.
2020-08-17 10:34:12 +02:00
Brendan Dahl
7fb01f9f2a
Merge pull request #12186 from brendandahl/loca-2
Fix bad truetype loca tables.
2020-08-10 20:34:19 -07:00
Brendan Dahl
f6dff81223 Fix bad truetype loca tables.
Some fonts have loca tables that aren't sorted or use 0 as an offset to
signal a missing glyph. This fixes the bad loca tables by sorting them
and then rewriting the loca table and potentially re-ordering the glyf
table to match.

Fixes #11131 and bug 1650302.
2020-08-10 14:15:49 -07:00
Jonas Jenwald
128e41d6be Add a test-case for issue 4398
Issue 4398 was fixed by PR 4437, however a test-case wasn't included as far as I can tell. Given that PR 12186 is now in the process of re-factoring that code, adding a test-case cannot hurt as far as I'm concerned.
2020-08-08 11:50:19 +02:00
Brendan Dahl
ac494a2278 Add support for optional marked content.
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.

All of the test files added exhibit different uses of optional content.

Fixes #269.

Fix test to work with optional content.

- Change the stopAtErrors test to ensure the operator list has something,
  instead of asserting the exact number of operators.
2020-08-04 09:26:55 -07:00
Jonas Jenwald
1d66fce781 Tweak the heuristic, in src/core/jpg.js, that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10989) 2020-07-06 13:06:49 +02:00
Jonas Jenwald
fef24658e7 Adjust the heuristics used when dealing with rectangles, i.e. re operators, with zero width/height (issue 12010) 2020-07-02 00:02:49 +02:00
Tim van der Meij
75fed02630
Merge pull request #12043 from Snuffleupagus/issue-4260-test
Add a reduced test-case for issue 4260 (PR 4521 follow-up)
2020-07-01 23:51:21 +02:00
Jonas Jenwald
e451cabe37 Add a reduced test-case for issue 4260 (PR 4521 follow-up) 2020-06-30 09:26:41 +02:00
Jonas Jenwald
4a5b68e077 Add at least *some* test-coverage for the RenderTask.onContinue functionality
The default viewer, and thus Firefox, depends on the `RenderTask.onContinue` functionality to pause/continue rendering (such that the most visible page always renders first).
Despite this functionality thus being very important, it has however never actually been tested *at all* as far as I can tell. Hence this patch which adds a new boolean `renderTaskOnContinue` parameter (`false` by default), that can be used to force a reference-test to use the `RenderTask.onContinue` code-path in the `InternalRenderTask` class.

Note that I purposely made this new reference-test behaviour *optional*, since I didn't want to negatively affect the general runtime of the tests (given that there's a slight delay added to the rendering). Also, for e.g. benchmarking you'd most likely want to stay away from the `RenderTask.onContinue` functionality for similar reasons.
2020-06-29 00:38:34 +02:00
Jonas Jenwald
28d2ada59c Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)
This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
2020-06-26 13:15:06 +02:00
Jonas Jenwald
e18fa3fc45 Tweak the QueueOptimizer to recognize OPS.paintImageMaskXObject operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
*First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited.*
Hence I'm not sure if there's actually a very good reason for *only* considering ImageMasks where the "skew" transformation matrix elements are zero as *repeated*, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are *all identical* for the ImageMasks.
Furthermore, looking at the *group* case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements.

While this patch obviously isn't enough to *completely* fix the issue, since there should be a visible Pattern rendered as well[1], it seem (at least to me) like enough of an improvement that submitting this is justified.
With these changes the referenced PDF document will no longer hang the *entire* browser, and rendering also finishes in a *reasonable* time (< 10 seconds for me) which seem fine given the *huge* number of identical inline images present.[2]

---
[1] Temporarily changing the Pattern to a solid color *does* render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality.

[2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.
2020-06-20 12:18:48 +02:00
Jonas Jenwald
4b51bcc733 Ensure that PDFImage.buildImage won't accidentally swallow errors, e.g. from ColorSpace parsing (issue 6707, PR 11601 follow-up)
Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occuring during image parsing/decoding as intended.
The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.
2020-06-13 15:02:37 +02:00
Carlos Rodríguez
802aa14a99 Jpeg encoded with RGB -instead of YCbCr- write the components index as "RGB" in ASCII to say it so
On ISO/IEC 10918-6:2013 (E), section 6.1: (http://www.itu.int/rec/T-REC-T.872-201206-I/en)

"Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment"

But common jpeg libraries consider RGB too if components index are ASCII R (0x52), G (0x47) and B (0x42): https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048

Issue #11931
2020-06-04 15:08:47 +02:00
Tim van der Meij
3b615e4ca3
Merge pull request #11601 from Snuffleupagus/rm-nativeImageDecoderSupport
[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`
2020-05-23 15:33:46 +02:00
Jonas Jenwald
56ebf01ae0 Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922)
This patch was inspired by ad2b64f124/xpdf/CharCodeToUnicode.cc (L480-L484)
2020-05-22 15:23:17 +02:00
Jonas Jenwald
0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seem unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification support JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser require a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
   In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, result in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-reponsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, lead to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` were unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:

 - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
   This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now looks *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
   Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
   Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
4aabd063fc Gracefully handle annotation parsing errors in Page.getOperatorList (issue 11871)
This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing.
Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.
2020-05-04 17:09:48 +02:00
Tim van der Meij
96923eb2a6
Merge pull request #11805 from Snuffleupagus/issue-11794
Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)
2020-04-16 00:08:58 +02:00
Jonas Jenwald
44b4a74f48 A couple of small String.fromCodePoint improvements (PR 11698 and 11769 follow-up)
- Add a reduced test-case for issue 11768, to prevent future regressions.
   (Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.)

 - Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.
2020-04-15 13:45:08 +02:00
Jonas Jenwald
06f6f8719f Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794) 2020-04-14 23:27:08 +02:00
Jonas Jenwald
91efde5246 Add a heuristic to scale even single-char text, when the horizontal/vertical scaling differs significantly (issue 11713)
At this point in time, compared to when the "ignore single-char" code was added, we *should* generally be doing a much better job of combining text into as few chunks as possible.
However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm *not* proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise.

Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.
2020-04-07 00:36:23 +02:00
Jonas Jenwald
938d519192 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.
2020-04-06 11:21:02 +02:00
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
2020-03-24 15:56:50 +02:00
Jonas Jenwald
15e8692eff Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in PartialEvaluator._buildSimpleFontToUnicode (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.

To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
2020-03-13 23:35:47 +01:00
Jonas Jenwald
65e6ea2cb2 Prevent lookup errors in PartialEvaluator.hasBlendModes from breaking all parsing/rendering of a page (issue 11678)
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
2020-03-09 12:00:12 +01:00
Tim van der Meij
1a97c142b3
Merge pull request #11523 from Snuffleupagus/issue-10880
Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)
2020-03-06 23:03:09 +01:00
Tim van der Meij
c95b9b1e17
Merge pull request #11653 from Snuffleupagus/ensureStateFont
Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
2020-03-03 23:33:13 +01:00
Jani Pehkonen
71e7686950 Fix Type1 font parsing when .notdef is not at index zero
Fixes #11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.

Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
2020-03-03 21:55:51 +02:00
Jonas Jenwald
65e514e063 Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
2020-03-03 10:05:18 +01:00
Takashi Tamura
d8c9f119b0 Fix the vertical writing mode with horizontal scaling. #11555.
It is not valid to multiply textHScale when the writing mode is vertical.

See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-29 07:48:29 +09:00
Jonas Jenwald
c3c3b8cd81 Add a heuristic, in src/core/jpg.js, to handle JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10880)
*This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else.*

To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data *and* the "actual" `scanLines` value is at least one order of magnitude smaller than expected.
2020-02-22 14:16:07 +01:00