Commit Graph

372 Commits

Author SHA1 Message Date
Jonas Jenwald
61e19bee43 Build a fallback ToUnicode map for simple fonts (issue 8229)
In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things.

Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case.
Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data.

According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172):

> A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value.
> ...

Reading that paragraph literally, it doesn't seem too unreasonable to use *different* methods for different charcodes.

Fixes 8229.
2017-11-26 14:45:15 +01:00
Jonas Jenwald
83e8398ff2 For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084)
In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then *overwrites* it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appear instead.

Fixes 9084.
2017-10-31 13:26:04 +01:00
Jonas Jenwald
92fcfce685
Merge pull request #9082 from brendandahl/issue7562
Overwrite glyphs contour count if it's less than -1.
2017-10-30 20:44:01 +01:00
Brendan Dahl
17037b5e51 Overwrite glyphs contour count if it's less than -1.
The test pdf has a contour count of -70, but OTS doesn't
like values less than -1.

Fixes issue #7562.
2017-10-30 09:16:51 -07:00
Jonas Jenwald
d71a576b30 Merge pull request #9045 from brendandahl/sani-name
Sanitize name index in compile phase of CFF.
2017-10-24 11:48:03 +02:00
Brendan Dahl
6b12612a52 Sanitize name index in compile phase of CFF.
Fixes #8960
2017-10-23 17:13:49 -07:00
Brendan Dahl
fcc9943d04 Use charstring as plain text when lengthIV is -1.
Fixes #7769
2017-10-18 14:19:59 -07:00
Jonas Jenwald
b1472cddbb Allow getOperatorList/getTextContent to skip errors when parsing broken XObjects (issue 8702, issue 8704)
This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recovery as much as possible of a page even when it contains broken XObjects.

Fixes 8702.
Fixes 8704.
2017-09-29 17:14:21 +02:00
Brendan Dahl
18e2321845 Overwrite maxSizeOfInstructions in maxp with computed value.
In issue #7507 the value is less than the actuall max size
of the glyph instructions causing OTS to fail the font.
2017-09-25 17:53:26 -07:00
Jonas Jenwald
10727572a2 Merge pull request #8950 from timvandermeij/polygon-polyline-annotations
Implement support for polyline and polygon annotations
2017-09-24 15:16:14 +02:00
Tim van der Meij
c69a7a83da Merge pull request #8932 from janpe2/jbig2-sym-offset
JBIG2 symbol offsets
2017-09-23 17:11:45 +02:00
Tim van der Meij
ed8c0ebfa7
Implement reference tests for polyline and polygon annotations 2017-09-23 17:01:19 +02:00
Jonas Jenwald
abc864fca9 Merge pull request #8938 from brendandahl/bug1392647
Use font's default width even when 0. (bug 1392647)
2017-09-20 22:38:39 +02:00
Brendan Dahl
10ba292b46 Use font's default width even when 0.
Bug 1392647 has a PDF where the default width of the font
is 0. It draws some charcodes that don't have glyphs, but
we were wrongly using the 1000 default width for these
charcodes causing some text to be overlapping.
2017-09-20 11:38:30 -07:00
Jani Pehkonen
5d1074c110 Fix JBIG2 symbol offsets in text regions 2017-09-19 23:43:23 +03:00
Jani Pehkonen
3d99b8d706 CCITTFaxStream problem when EndOfBlock is false 2017-09-19 22:19:40 +03:00
Tim van der Meij
400e4aae0e
Implement support for stamp annotations 2017-09-16 16:37:50 +02:00
Tim van der Meij
320779e6ed Merge pull request #8691 from timvandermeij/square-circle-annotations
Implement support for square and circle annotations
2017-09-09 22:56:54 +02:00
Tim van der Meij
c04f9d6098
Implement reference tests for square and circle annotations 2017-09-09 21:36:28 +02:00
Jonas Jenwald
7115e136e4 Hide unsupported LinkAnnotations (issue 3897)
Rather than displaying links that does *nothing* when clicked, it probably makes more sense to simply not render them instead. Especially since it turns out that, at least at this point in time, this is *very* easy to both implement and test.

Fixes 3897.
2017-09-06 12:52:56 +02:00
Jonas Jenwald
49b8cd5a6a Attempt to improve the EI detection heuristics, for inline images, in streams containing NUL bytes (issue 8823)
Since this patch will now treat (some) `NUL` bytes as "ASCII", the number of `followingBytes` checked are thus increased to (hopefully) reduce the risk of introducing new false positives.

Fixes 8823.
2017-08-27 12:48:28 +02:00
Jonas Jenwald
4891b9c7e0 Replace the test-case for issue 8798 with a reduced one (PR 8800 follow-up)
*Re: issue 8798 and PR 8800.*

Big thanks to @THausherr for providing the test-case.
2017-08-24 17:43:05 +02:00
Brendan Dahl
0bef50d56d Fix two cmap related issues.
In issue #8707, there's a char code mapped to a non-
existing glyph which shouldn't be drawn. However, we
saw it was missing and tried to then use the post table and
end up mapping it incorrectly.

This illuminated a problem with issue #5704 and bug
893730 where glyphs disappeared after above fix.  This was
from the cmap returning the wrong glyph id. Which in turn was
caused because the font had multiple of the same type of cmap
table and we were choosing the last one. Now, we instead
default to the first one. I'm unsure if we should instead be
merging the multiple cmaps, but using only the first one works.
2017-08-03 22:19:36 -07:00
Jonas Jenwald
23ec6b16ca Add a fallback for non-embedded SegoeUISymbol font (issue 8697)
The PDF file uses a non-embedded SegoeUISymbol font, which is *not* a standard font (and is mainly used by Microsoft, see https://en.wikipedia.org/wiki/Segoe).

Fixes 8697.
2017-07-25 12:45:11 +02:00
Jonas Jenwald
794b099385 Add a reduced test-case for issue 7696
Issue 7696 was one of the issues fixed by PR 8580. The other ones were all cases of missing glyphs, however in this particular one glyphs did render but every single one was incorrect.
Hence it probably cannot hurt to have a small, reduced, reference test for that PDF file as well.
2017-07-24 09:55:16 +02:00
Rob Wu
01f03fe393 Optimize PNG compression in SVG backend on Node.js
Use the environment's zlib implementation if available to get
reasonably-sized SVG files when an XObject image is converted to PNG.
The generated PNG is not optimal because we do not use a PNG predictor.
Futher, when our SVG backend is run in a browser, the generated PNG
images will still be unnecessarily large (though the use of blob:-URLs
when available should reduce the impact on memory usage). If we want to
optimize PNG images in browsers too, we can either try to use a DEFLATE
library such as pako, or re-use our XObject image painting logic in
src/display/canvas.js. This potential improvement is not implemented by
this commit

Tested with:

- Node.js 8.1.3 (uses zlib)
- Node.js 0.11.12 (uses zlib)
- Node.js 0.10.48 (falls back to inferior existing implementation).
- Chrome 59.0.3071.86
- Firefox 54.0

Tests:

Unit test on Node.js:

```
$ gulp lib
$ JASMINE_CONFIG_PATH=test/unit/clitests.json node ./node_modules/.bin/jasmine --filter=SVG
```

Unit test in browser: Run `gulp server` and open
http://localhost:8888/test/unit/unit_test.html?spec=SVGGraphics

To verify that the patch works as desired,

```
$ node examples/node/pdf2svg.js test/pdfs/xobject-image.pdf
$ du -b svgdump/xobject-image-1.svg
 # ^ Calculates the file size. Confirm that the size is small
 #   (784 instead of 80664 bytes).
```
2017-07-10 18:56:57 +02:00
Jonas Jenwald
eff257b820 Merge pull request #8580 from brendandahl/missing-glyf
Fix how we detect and handle missing glyph data.
2017-07-04 12:16:07 +02:00
Brendan Dahl
efbbd8533f Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0xF0FF. 2017-07-03 13:13:46 -07:00
Brendan Dahl
6d4f748fb1 Fix how we detect and handle missing glyph data. 2017-07-03 13:06:06 -07:00
Brendan Dahl
a8a8909d2d Fix missing notdef in expert encoding. 2017-06-29 12:12:39 -07:00
Brendan Dahl
f1f9d98519 Merge pull request #8507 from Snuffleupagus/issue-8480
Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)
2017-06-23 13:36:58 -07:00
Rob Wu
fc6448d18c Move svg:clipPath generation from clip to endPath
In the PDF from issue 8527, the clip operator (W) shows up before a path
is defined. The current SVG backend however expects a path to exist
before generating a `<svg:clipPath>` element.
In the example, the path was defined after the clip, followed by a
endPath operator (n).
So this commit fixes the bug by moving the path generation logic from
clip to endPath.

Our canvas backend appears to use similar logic:
`CanvasGraphics_endPath` calls `consumePath`, which in turn draws the
clip and resets the `pendingClip` state. The canvas backend calls
`consumePath` from multiple other places, so we probably need to check
whether doing so is also necessary for the SVG backend.

I scanned our corpus of PDF files in test/pdfs, and found that in every
instance (except for one), the "W" PDF operator (clip) is immediately
followed by "n" (endPath). The new test from this commit (clippath.pdf)
starts with "W", followed by a path definition and then "n".

    # Commands used to find some of the clipping commands:
    grep -ra '^W$' -C7 | less -S
    grep -ra '^W ' -C7 | less -S
    grep -ra ' W$' -C7 | less -S

test/pdfs/issue6413.pdf is the only file where "W" (a tline 55) is not
followed by "n". In fact, the "W" is the last operation of a series of
XObject painting operations, and removing it does not have any effect
on the rendered PDF (confirmed by looking at the output of PDF.js's
canvas backend, and ImageMagick's convert command).
2017-06-22 01:08:17 +02:00
Jonas Jenwald
8b4a42e5b8 Only special-case OpenType fonts with CFF data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)
*As mentioned the last time that I touched this particular part of the font code, I'm sincerely hope that this doesn't cause any regressions!*

However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues.

Fixes 8480.
2017-06-09 21:15:39 +02:00
Jonas Jenwald
4ce5e520fb Add different code-paths to {CMap, ToUnicodeMap}.charCodeOf depending on length, since Array.prototype.indexOf can be extremely inefficient for very large arrays (issue 8372)
Fixes 8372.
2017-05-24 19:47:04 +02:00
Jonas Jenwald
31c24ed631 Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424)
*This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations.*

According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special".

Fixes 8424.
2017-05-23 16:12:45 +02:00
chris.greening
cfc2f36f5c Adds additional parameter so background color of canvas can be set 2017-05-17 17:06:44 +01:00
Yury Delendik
c4c44c1bbe Merge pull request #8240 from Snuffleupagus/api-stopAtErrors
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
2017-04-13 10:58:49 -05:00
Tim van der Meij
32e01cda96 Merge pull request #8228 from timvandermeij/line-annotations
Implement support for line annotations
2017-04-13 00:18:31 +02:00
Tim van der Meij
e15a2ec523
Annotations: implement support for line annotations
This patch implements support for line annotations. Other viewers only
show the popup annotation when hovering over the line, which may have
any orientation. To make this possible, we render an invisible line (SVG
element) over the line on the canvas that acts as the trigger for the
popup annotation. This invisible line has the same starting coordinates,
ending coordinates and width of the line on the canvas.
2017-04-12 23:05:25 +02:00
Jonas Jenwald
a39d636eb8 [api-minor] Always allow e.g. rendering to continue even if there are errors, and add a stopAtErrors parameter to getDocument to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.

NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`.

Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).
2017-04-11 08:59:22 +02:00
Brendan Dahl
4969b2ad97 Normalize blend mode names. 2017-04-10 16:18:08 -07:00
Jason O. Jensen
d230784ac3 Handle cff fonts with erroneous stackSize 2017-03-06 19:28:46 -05:00
Jonas Jenwald
4a0ff5dbf7 Ensure that we don't ignore 0 values in Page.getInheritedPageProp (issue 8125)
It appears that I accidentally broke this in PR 6065, sorry about that!

The issue in this particular PDF file is that there's `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`).

Fixes 8125.
2017-03-03 12:27:40 +01:00
Jonas Jenwald
1ce295541c Always check all Kids nodes, in Catalog.getPageDict, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088)
As discussed on IRC, we need to check all nodes at the *bottom* of the tree to ensure that we find the correct `Page` dict.
Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps.

Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`.

Fixes 8088.
2017-02-24 12:09:46 +01:00
Jonas Jenwald
ce072022c1 Always choose a (3, 1) cmap table for TrueType fonts that have an encoding specified, regardless of the Symbolic font flag (bug 1337429)
This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425.
The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag.

Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included.
However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions.

Given that the PDF file in bug 1337429 has been broken for almost *three* years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do *not* intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted).

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.
2017-02-15 17:38:08 +01:00
Jonas Jenwald
23c62cc321 Consume the current character when encountering illegal characters in Lexer.getObject, in order to prevent infinite loops during reading of streams (issue 8061)
*Please note:* The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop.

In this particular case, when errors were encountered inside of the `Lexer.getObject` method *itself*, we didn't advance the stream position. This thus caused an inifinite loop in `parseCMap`, since the exact same character was then parsed over and over again.

Fixes 8061.
2017-02-11 19:32:48 +01:00
pmysore1
af8292058f Font ascent descent calculation fix 2017-02-11 01:25:05 -05:00
Jonas Jenwald
e963971244 Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901)
Changing this particular code makes me somewhat nervous about regressions, since PR 5770 necessitated the follow-up PR 6270.
However, the patch passes all tests added in those PRs (and obviously all other tests). Furthermore, I've manually checked all the issues/bugs referenced in PRs 5770 and 6270 without finding any issues.

**Please note:** This patch fixes *only* the font bug, not the SVG conversion, present on pages two and three of the PDF file in issue 7901.
2016-12-20 17:03:51 +01:00
Yury Delendik
3b3a179486 Merge pull request #7879 from rossj/highlight-fix
Make use of textAdvanceScale consistent during combineTextItems. Fix for #7878.
2016-12-19 09:18:13 -06:00
Tim van der Meij
0c9a06c020 Button widget annotations: implement reference testing
Moreover, ensure that the read-only state is respected and improve CSS
names.
2016-12-17 20:33:35 +01:00