Commit Graph

281 Commits

Author SHA1 Message Date
Jani Pehkonen
05c527f035 Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream 2019-05-07 18:44:37 +03:00
Jani Pehkonen
49c6233fbc Use CMap in Type0 fonts when CFF is not a CID font 2019-03-26 19:38:44 +02:00
Brendan Dahl
7d6ab081eb Put the string name of the glyph in the charset array.
Also, only warn once per font when missing a glyph name.
2019-03-01 18:03:51 -08:00
Brendan Dahl
34022d2fd1
Merge pull request #10591 from brendandahl/fix-charset
Add unique glyph names for CFF fonts.
2019-02-28 17:22:29 -08:00
Brendan Dahl
8a596ef5d5 Add unique glyph names for CFF fonts.
Printing on MacOS was broken with the previous approach of just mapping
all the glyphs to notdef.
2019-02-27 15:00:29 -08:00
Jonas Jenwald
db5dc14158 Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file
The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated.
Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used *only* on the worker-thread.

Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus *cannot* possibly ever be used on the main-thread.
2019-02-24 00:35:39 +01:00
Jonas Jenwald
b6d090cc14 Fallback to the built-in font renderer when font loading fails
After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].

Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].

The solution implemented in this patch does *not* in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load.

*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:

[Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888)
Issue 10175
Issue 10232

---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.

[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.

[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
2019-02-11 10:27:08 +01:00
Jonas Jenwald
24a688d6c6 Convert some usage of indexOf to startsWith/includes where applicable
In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.
2019-01-18 17:57:41 +01:00
Brendan Dahl
32eace043b Fix reading number of HTMX metrics.
The length of the HHEA table can be incorrect, so it is better to
read the number of metrics offset from beginning of table instead.
2019-01-04 15:13:13 -08:00
Jonas Jenwald
ad3e937816 Replace the Font.loading property with, the already existing, Font.missingFile property
The `Font.loading` property is only ever used *once* in the code, whereas `Font.missingFile` is more widely used. Furthermore the name `loading` feels, at least to me, slight less clear than `missingFile`. Finally, note that these two properties are the inverse of each other.
2018-09-29 15:57:04 +02:00
Brendan Dahl
b76cf665ec Map all glyphs to the private use area and duplicate the first glyph.
There have been lots of problems with trying to map glyphs to their unicode
values. It's more reliable to just use the private use areas so the browser's
font renderer doesn't mess with the glyphs.

Using the private use area for all glyphs did highlight other issues that this
patch also had to fix:

  * small private use area - Previously, only the BMP private use area was used
    which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be
    used.

  * glyph zero not shown - Browsers will not use the glyph from a font if it is
    glyph id = 0. This issue was less prevalent when we mapped to unicode values
    since the fallback font would be used. However, when using the private use
    area, the glyph would not be drawn at all. This is illustrated in one of the
    current test cases (issue #8234) where there's an "ä" glyph at position
    zero. The PDF looked like it rendered correctly, but it was actually not
    using the glyph from the font. To properly show the first glyph it is always
    duplicated and appended to the glyphs and the maps are adjusted.

  * supplementary characters - The private use area PUP 16 is 4 bytes, so
    String.fromCodePoint must be used where we previously used
    String.fromCharCode. This is actually an issue that should have been fixed
    regardless of this patch.

  * charset - Freetype fails to load fonts when the charset size doesn't match
    number of glyphs in the font. We now write out a fake charset with the
    correct length. This also brought up the issue that glyphs with seac/endchar
    should only ever write a standard charset, but we now write a custom one.
    To get around this the seac analysis is permanently enabled so those glyphs
    are instead always drawn as two glyphs.
2018-09-05 14:04:54 -07:00
Jonas Jenwald
06d1ff5af4 Tweak the MMType1 font detection in getFontFileType to improve font telemetry (PR 9961 follow-up)
Please note that this patch does *not* affect rendering in any way, however it's relevant for font telemetry[1].

According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904956, Type1C is a valid subtype for *both* Type1 and MMType1 fonts.

---
[1] Refer to the font telemetry results in https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-06-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F62&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F59&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-05-07&table=0&trim=1&use_submission_date=0

See also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types for help with interpreting the data.
2018-08-08 12:18:37 +02:00
Tim van der Meij
4111871ac5
Merge pull request #9958 from brendandahl/always-fallback
Always fallback to system font on font failure.
2018-08-05 19:58:48 +02:00
Jonas Jenwald
3177f6aa55 Parse the font file to determine the correct type/subtype, rather than relying on the (often incorrect) data in the font dictionary
The current font type/subtype detection code is quite inconsistent/unwieldy. In some cases it will simply assume that the font dictionary is correct, in others it will somewhat "arbitrarily" check the actual font file (more of these cases have been added over the years to fix specific bugs).

As is evident from e.g. issue 9949, the font type/subtype detection code is continuing to cause issues. In an attempt to get rid of these hacks once and for all, this patch instead re-factors the type/subtype detection to *always* parse the font file.

Please note that, as far as I can tell, we still appear to need to rely on the composite font detection based on the font dictionary. However, even if the composite/non-composite detection would get it wrong, that shouldn't really matter too much given that there's basically only two different code-paths (for "TrueType-like" vs "Type1-like" fonts).
2018-08-05 11:13:16 +02:00
Jonas Jenwald
9bbca04579 Add a (basic) isCFFFile helper function to detect CFF font files
Compared to most other font formats, the CFF doesn't have a constant header which makes is slightly more difficult to detect such font files.

Please refer to the Compact Font Format specification: https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5176.CFF.pdf#G3.32094
2018-08-05 11:13:14 +02:00
Jonas Jenwald
f4db38aadf Update the TrueType font file detection to also recognize the Mac specific header 'true'
Please refer to the TrueType specification: https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#ScalerTypeNote
2018-08-05 10:33:56 +02:00
Brendan Dahl
5f67a6a237 Always fallback to system font on font failure.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
2018-08-03 16:49:22 -07:00
Jonas Jenwald
17eac2d48a Ensure that Type0, i.e. composite, OpenType fonts with CFF tables are *not* treated as CFF fonts if their glyph mapping is non-default (issue 9915)
This particular code-path has been the source of *numerous* regressions to date, so hopefully this patch won't cause any more of those.

Fixes 9915.
2018-07-29 23:06:15 +02:00
Jonas Jenwald
2b25deb84c Prevent errors in sanitizeTTProgram, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
*I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug.*

The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a *negative* length when parsing the CALL functions.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.
2018-07-10 09:45:55 +02:00
Wojciech Maj
ea2850e9a7 Fix typos 2018-04-01 23:20:41 +02:00
Jonas Jenwald
a18c65ae9f Use the correct stream position when reading maxSizeOfInstructions from the maxp table (issue 9458)
Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html.

Fixes 9458.
2018-02-07 21:57:43 +01:00
Jonas Jenwald
712090eff8 Upstream the changes from: Bug 1339461 - Convert foo.indexOf(...) == -1 to foo.includes() and implement an eslint rule to enforce this
Yet another case where PDF.js code was modified in `mozilla-central` without the changes happening in the GitHub repo first; *sigh*.
If we don't upstream at least the changes in `extensions/firefox/`, any future update of PDF.js in `mozilla-central` will be blocked.

Please see:
 - https://bugzilla.mozilla.org/show_bug.cgi?id=1339461
 - https://hg.mozilla.org/mozilla-central/rev/d5a5ad1dbbf2
2018-02-04 14:59:27 +01:00
Jonas Jenwald
d6c028b946 Add support for TrueType Collection fonts (issue 9262)
The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading.

Fixes 9262.
2018-01-08 22:31:08 +01:00
Tim van der Meij
6bbe91079b
Merge pull request #9272 from nveenjain/fix/8846
Replaced occurence of `throw new Error` with `unreachable`
2017-12-15 22:11:32 +01:00
Brendan Dahl
9b51cea724 Fix loca table when offsets aren't in ascending order. 2017-12-15 11:20:28 -06:00
Naveen Jain
1135674647 Replaced occurence of throw new Error with unreachable where applicable 2017-12-14 12:58:50 +05:30
Brendan Dahl
af1d80d45e
Merge pull request #9230 from Snuffleupagus/issue-9195
Add basic support for non-embedded Calibri fonts (issue 9195)
2017-12-08 10:15:43 -08:00
Jonas Jenwald
a5e3261b48
Merge pull request #9062 from mozilla/no_high
Move char codes from high surrogate pair range into private use.
2017-12-08 12:31:22 +01:00
Brendan Dahl
306999c325 Move char codes from high surrogate pair range into private use.
Fixes #2884
2017-12-07 10:35:50 -08:00
Jonas Jenwald
08de655177 Add basic support for non-embedded Calibri fonts (issue 9195)
There's a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but a `IdentityH` CMap which won't help provide a proper glyph mapping.

The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correct.
To work around this, I'm thus proposing that we include a (incomplete) glyph map for Calibri, and fallback to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data.

Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data).

Fixes 9195.

---

[1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.
2017-12-03 17:23:33 +01:00
Jonas Jenwald
61e19bee43 Build a fallback ToUnicode map for simple fonts (issue 8229)
In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things.

Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case.
Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data.

According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172):

> A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value.
> ...

Reading that paragraph literally, it doesn't seem too unreasonable to use *different* methods for different charcodes.

Fixes 8229.
2017-11-26 14:45:15 +01:00
Max Schaefer
3ae37d1b06 Remove a few useless assignments. 2017-11-03 11:36:48 +00:00
Brendan Dahl
17037b5e51 Overwrite glyphs contour count if it's less than -1.
The test pdf has a contour count of -70, but OTS doesn't
like values less than -1.

Fixes issue #7562.
2017-10-30 09:16:51 -07:00
Brendan Dahl
18e2321845 Overwrite maxSizeOfInstructions in maxp with computed value.
In issue #7507 the value is less than the actuall max size
of the glyph instructions causing OTS to fail the font.
2017-09-25 17:53:26 -07:00
Jonas Jenwald
cfb4955a92 Replace the isArray helper function with the native Array.isArray function
*Follow-up to PR 8813.*
2017-09-01 20:27:13 +02:00
Jonas Jenwald
11408da340 Replace the isInt helper function with the native Number.isInteger function
*Follow-up to PR 8643.*
2017-09-01 16:52:50 +02:00
Brendan Dahl
0bef50d56d Fix two cmap related issues.
In issue #8707, there's a char code mapped to a non-
existing glyph which shouldn't be drawn. However, we
saw it was missing and tried to then use the post table and
end up mapping it incorrectly.

This illuminated a problem with issue #5704 and bug
893730 where glyphs disappeared after above fix.  This was
from the cmap returning the wrong glyph id. Which in turn was
caused because the font had multiple of the same type of cmap
table and we were choosing the last one. Now, we instead
default to the first one. I'm unsure if we should instead be
merging the multiple cmaps, but using only the first one works.
2017-08-03 22:19:36 -07:00
Jonas Jenwald
e20d4a9c21 Merge pull request #8681 from brendandahl/glyph-ids
Fix several issues with glyph id mappings (issue 8668, bug 1383504)
2017-08-03 14:25:34 +02:00
Brendan Dahl
ac33358e1f Fix several issues with glyph id mappings.
The initial issue with #8255 was I added a missing glyphs
check to adjustMapping, but this caused us to skip re-mapping
a glyph if the fontCharCode was a missingGlyph which in turn
caused us to overwrite a valid glyph id with an invalid one. While
fixing this, I also added a warning if the private use area is full since
this also accidentally happened when I made a different mistake.

This brought to light a number of issues where we map
missing glyphs to notdef, but often the notdef is actually defined
and then ends up being drawn. Now the glyphs don't get
mapped in toFontChar and so they are not drawn by the canvas.

Fixing the above brought up another issue though in bug1050040.pdf.
In this PDF, the font fails to load by the browser and before we were still
drawing the glyphs because it looked like the font had them, but with the fixes
above the glyphs showed up as missing so we didn't attempt draw them. To
fix this, I now throw an error when the loca table is in really bad shape and
we fall back to trying to use a system font. We now also use this fall back if
there are any format errors during converting fonts.
2017-07-26 13:00:55 -07:00
Jonas Jenwald
814fa1dee3 Remove most assert() calls (issue 8506)
This replaces `assert` calls with `throw new FormatError()`/`throw new Error()`.
In a few places, throwing an `Error` (which is what `assert` meant) isn't correct since the enclosing function is supposed to return a `Promise`, hence some cases were changed to `Promise.reject(...)` and similarily for `createPromiseCapability` instances.
2017-07-21 18:51:02 +02:00
Yury Delendik
d028c26210 Removes error() 2017-07-07 09:40:24 -05:00
Jonas Jenwald
eff257b820 Merge pull request #8580 from brendandahl/missing-glyf
Fix how we detect and handle missing glyph data.
2017-07-04 12:16:07 +02:00
Brendan Dahl
efbbd8533f Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0xF0FF. 2017-07-03 13:13:46 -07:00
Brendan Dahl
6d4f748fb1 Fix how we detect and handle missing glyph data. 2017-07-03 13:06:06 -07:00
Jonas Jenwald
8b4a42e5b8 Only special-case OpenType fonts with CFF data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)
*As mentioned the last time that I touched this particular part of the font code, I'm sincerely hope that this doesn't cause any regressions!*

However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues.

Fixes 8480.
2017-06-09 21:15:39 +02:00
Jonas Jenwald
999e30723d Reduce the duplication slightly when detecting an OpenType font (in the Font constructor) 2017-06-09 18:26:57 +02:00
Jonas Jenwald
a8c87f8019 Fix inconsistent spacing and trailing commas in objects in src/core/ files, so we can enable the comma-dangle and object-curly-spacing ESLint rules later on
*Unfortunately this patch is fairly big, even though it only covers the `src/core` folder, but splitting it even further seemed difficult.*

http://eslint.org/docs/rules/comma-dangle
http://eslint.org/docs/rules/object-curly-spacing

Given that we currently have quite inconsistent object formatting, fixing this in *one* big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead.

Please note: This patch was created automatically, using the ESLint --fix command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.

```diff
diff --git a/src/core/evaluator.js b/src/core/evaluator.js
index abab9027..dcd3594b 100644
--- a/src/core/evaluator.js
+++ b/src/core/evaluator.js
@@ -2785,7 +2785,8 @@ var EvaluatorPreprocessor = (function EvaluatorPreprocessorClosure() {
     t['Tz'] = { id: OPS.setHScale, numArgs: 1, variableArgs: false, };
     t['TL'] = { id: OPS.setLeading, numArgs: 1, variableArgs: false, };
     t['Tf'] = { id: OPS.setFont, numArgs: 2, variableArgs: false, };
-    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false, };
+    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1,
+                variableArgs: false, };
     t['Ts'] = { id: OPS.setTextRise, numArgs: 1, variableArgs: false, };
     t['Td'] = { id: OPS.moveText, numArgs: 2, variableArgs: false, };
     t['TD'] = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false, };
diff --git a/src/core/jbig2.js b/src/core/jbig2.js
index 5a17d482..71671541 100644
--- a/src/core/jbig2.js
+++ b/src/core/jbig2.js
@@ -123,19 +123,22 @@ var Jbig2Image = (function Jbig2ImageClosure() {
      { x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -2, y: 0, },
      { x: -1, y: 0, }],
     [{ x: -3, y: -1, }, { x: -2, y: -1, }, { x: -1, y: -1, }, { x: 0, y: -1, },
-     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, { x: -1, y: 0, }]
+     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, },
+     { x: -1, y: 0, }]
   ];

   var RefinementTemplates = [
     {
       coding: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
-                  { x: 1, y: 0, }, { x: -1, y: 1, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
+      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, },
+                  { x: 0, y: 0, }, { x: 1, y: 0, }, { x: -1, y: 1, },
+                  { x: 0, y: 1, }, { x: 1, y: 1, }],
     },
     {
-      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, { x: 1, y: 0, },
-                  { x: 0, y: 1, }, { x: 1, y: 1, }],
+      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, },
+               { x: -1, y: 0, }],
+      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
+                  { x: 1, y: 0, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
     }
   ];
```
2017-06-02 11:20:19 +02:00
Jonas Jenwald
982b6aa65b Convert the files in the /src/core folder to ES6 modules
Please note that the `glyphlist.js` and `unicode.js` files are converted to CommonJS modules instead, since Babel cannot handle files that large and they are thus excluded from transpilation.
2017-05-30 22:06:21 +02:00
Jonas Jenwald
4ce5e520fb Add different code-paths to {CMap, ToUnicodeMap}.charCodeOf depending on length, since Array.prototype.indexOf can be extremely inefficient for very large arrays (issue 8372)
Fixes 8372.
2017-05-24 19:47:04 +02:00
Jonas Jenwald
31c24ed631 Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424)
*This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations.*

According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special".

Fixes 8424.
2017-05-23 16:12:45 +02:00