317 Commits

Author SHA1 Message Date
Jonas Jenwald
81b9d553cf Add TeX-specific glyph names to glyphlist.js to improve both glyph mapping and text selection for mathematic fonts (issue 2594) 2016-10-26 16:39:58 +02:00
Brendan Dahl
8d036faf40 Move symbolic font glyphs to private use area if they don't have unicode mappings. 2016-10-26 16:39:21 +02:00
Jonas Jenwald
1da59bec9b Remove a remaining old-style preprocessor from src/core/fonts.js (PR 7322 follow-up)
Note that this code was added *after* PR 7322 was opened, which thus explains why it was missed during rebasing.
2016-10-15 11:33:09 +02:00
Jonas Jenwald
aadcbe98c8 Replace empty CharStrings with '.notdef' in Type1Font_wrap to prevent OTS from rejecting the font (bug 1252420)
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1252420.
2016-09-17 14:39:10 +02:00
Jonas Jenwald
325f7afcca For embedded Type1 fonts without included ToUnicode/Encoding data, attempt to improve text selection by using the builtInEncoding to amend the toUnicode map (issue 6901, issue 7182, issue 7217, bug 917796, bug 1242142)
Note that in order to prevent any possible issues, this patch does *not* try to amend the `toUnicode` data for Type1 fonts that contain either `ToUnicode` or `Encoding` entries in the font dictionary.

Fixes, or at least improves, issues/bugs such as e.g. 6658, 6901, 7182, 7217, bug 917796, bug 1242142.
2016-09-11 20:54:10 +02:00
Jonas Jenwald
0b75f63c03 Don't duplicate the first entry in the charCodeToGlyphId map for CIDFontType2 fonts with a CIDToGIDMap that already mapped the first entry to a non-zero glyphId (issue 7544)
Fixes 7544.
2016-09-09 22:33:41 +02:00
Jonas Jenwald
44b75c01a1 Check that Type1C fonts do not actually contain OpenType font files (issue 7598)
This patch is yet another instalment in the (never ending) series of patches for PDF files that specify completely incorrect Type/Subtype for their fonts. In this case Type1/Type1C, when in fact OpenType would have been correct.

Fixes 7598.
2016-09-06 10:13:11 +02:00
Jonas Jenwald
088ce6c009 Add a unit-test to check that ProblematicCharRanges contains valid entries
When adding new entries to `ProblematicCharRanges`, you have to be careful to not make any mistakes since that could cause glyph mapping issues.
Currently the existing reference tests should probably help catch any errors, but based on experience I think that having a unit-test which specifically checks `ProblematicCharRanges` would be both helpful and timesaving when modifying/reviewing changes to this code.

Hence this patch which adds a function (and unit-test) that is used to validate the entries in `ProblematicCharRanges`, and also checks that we don't accidentally add more character ranges than the Private Use Area can actually contain.
The way that the validation code, and thus the unit-test, is implemented also means that we have an easy way to tell how much of the Private Use Area is potentially utilized by re-mapped characters.
2016-08-27 11:56:00 +02:00
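Illustration (not part of the commit): a minimal sketch of the kind of validation described above. The function name, the flat `[start, end)` pair layout, and the sample data are assumptions made for this example, not the actual `ProblematicCharRanges` code.

```javascript
// Hypothetical sketch; names and data layout are assumptions.
// The BMP Private Use Area spans U+E000..U+F8FF, i.e. 6400 code points.
const PUA_SIZE = 0xf8ff - 0xe000 + 1;

function validateCharRanges(ranges) {
  if (ranges.length % 2 !== 0) {
    throw new Error("Ranges must be listed as [start, end) pairs.");
  }
  let previousEnd = 0;
  let total = 0;
  for (let i = 0; i < ranges.length; i += 2) {
    const start = ranges[i];
    const end = ranges[i + 1];
    if (start >= end) {
      throw new Error(`Empty or inverted range at index ${i}.`);
    }
    if (start < previousEnd) {
      throw new Error(`Range at index ${i} overlaps or is out of order.`);
    }
    previousEnd = end;
    total += end - start;
  }
  if (total > PUA_SIZE) {
    throw new Error("More re-mapped characters than the Private Use Area holds.");
  }
  // The total doubles as a measure of how much of the PUA may be used.
  return { total, percentage: (100 * total) / PUA_SIZE };
}
```

Returning the total alongside the pass/fail result mirrors the point made in the message: the same walk over the ranges also reports how much of the Private Use Area could be consumed.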
Tim van der Meij
10f9f11ec4 Merge pull request #7490 from Snuffleupagus/issue-7426
Don't map glyphs to the Lepcha Unicode block (issue 7426)
2016-07-21 14:39:19 +02:00
Jonas Jenwald
64783c8b6e Don't map glyphs to the Lepcha Unicode block (issue 7426)
In the PDF file in the issue, some of the glyphs end up being mapped to the Lepcha Unicode block; see https://en.wikipedia.org/wiki/Lepcha_(Unicode_block).
This didn't use to matter, but after HarfBuzz updates that improved support for Lepcha fonts, in particular https://bugzilla.mozilla.org/show_bug.cgi?id=1249861, some glyphs are now moved horizontally.
To avoid that, this patch adds the Lepcha block to the list of Unicode ranges that we skip when building the glyph mapping.

Fixes 7426.
2016-07-17 16:53:36 +02:00
klemens
6f03f62327 trivial spelling fixes 2016-07-17 14:33:41 +02:00
Jonas Jenwald
51e46fa1a7 Change the warn to info in recoverGlyphName to reduce the console spam
After PR 7441, where `recoverGlyphName` is used a lot more than before, many PDF files will generate a lot of warnings in the console. For normal usage, compared to debugging/development, this is probably more annoying than helpful.
2016-07-09 12:08:41 +02:00
Brendan Dahl
1f3f4a8dd7 Merge pull request #7441 from Snuffleupagus/issue-7439
Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439)
2016-07-06 13:02:21 -07:00
Brendan Dahl
e2e657e44f Merge pull request #7390 from Snuffleupagus/issue-7180
Add upper-case `I` as a possible space replacement fallback in `Font.spaceWidth` to improve text-selection (issue 7180)
2016-06-29 15:11:19 -07:00
Jonas Jenwald
7866109af9 Fallback to attempt to recover standard glyph names when amending the charCodeToGlyphId with entries from the differences array in type1FontGlyphMapping (issue 7439)
Fixes 7439.
2016-06-25 14:54:34 +02:00
Jonas Jenwald
c1ca268ef3 Skip mapping of glyphs to Unicode "Ideographic space" (issue 7416)
Fixes 7416, which is an IE specific issue.
2016-06-22 08:58:00 +02:00
Jonas Jenwald
6a0b047bfa Add upper-case I as a possible space replacement fallback in Font.spaceWidth to improve text-selection (issue 7180)
In fonts with only upper-case glyphs, that are also missing a space glyph, `get spaceWidth` won't be able to return anything useful.
By adding upper-case `I` as a fallback, we can thus improve text-selection in some PDF files.
Note that locally, the patch causes slight movement in a few existing `text` tests, but in my opinion this actually looks like slight improvements.

Fixes 7180.
2016-06-07 22:55:25 +02:00
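Illustration (not part of the commit): a minimal sketch of a space-width fallback chain. The candidate list, the function name, and the width table are assumptions for this example; the real `Font.spaceWidth` getter is more involved.

```javascript
// Hypothetical candidate list and helper, for illustration only.
// Candidates are tried in order; upper-case "I" is the last resort.
const SPACE_CANDIDATES = ["space", "minus", "one", "i", "I"];

function guessSpaceWidth(widthsByGlyphName, defaultWidth) {
  for (const name of SPACE_CANDIDATES) {
    const width = widthsByGlyphName[name];
    // Missing glyphs give `undefined`, which fails the comparison.
    if (width > 0) {
      return width;
    }
  }
  return defaultWidth;
}
```

With a font that only contains upper-case glyphs, e.g. `{ A: 700, I: 280 }`, the sketch returns the width of `I` instead of falling through to the default.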
Jonas Jenwald
a36a946976 Move the isSpace utility function from core/parser.js to shared/util.js
Currently the `isSpace` utility function is a member of `Lexer`, which seems suboptimal, given that it's placed in `core/parser.js`. In practice, this means that in a number of `core/*.js` files we thus have an *otherwise* completely unnecessary dependency on `core/parser.js` for a one-line function.

Instead, this patch moves `isSpace` into `shared/util.js` which seems more appropriate for this kind of utility function. Not to mention that since all the affected `core/*.js` files already depend on `shared/util.js`, this doesn't incur any more file dependencies.
2016-06-06 09:11:33 +02:00
Yury Delendik
32ce369d88 Fixes some static analysis warnings and recommendations
* Useless conditional
* Superfluous trailing arguments
* Useless assignment to local variable
* Misspelled identifier
* JSDoc tag for non-existent parameter
2016-05-02 17:34:58 -05:00
Yury Delendik
118b71925c Forces UMD header to have relative path and extension for CommonJS. 2016-04-02 11:10:36 -05:00
Jonas Jenwald
ef551e8266 Extract Type1Parser from fonts.js 2016-04-01 23:38:53 +02:00
Jonas Jenwald
b961e1d21b Extract CFFParser from fonts.js (issue 6777) 2016-04-01 22:32:39 +02:00
Brendan Dahl
13d440df61 Merge pull request #7078 from Snuffleupagus/refactor-toFontChar-without-file
Refactor the building of `toFontChar` for non-embedded fonts
2016-03-31 10:43:11 -07:00
Jonas Jenwald
05cf709f8e Parse Type1 font files to determine the various Length{n} properties, instead of trusting the PDF file (issue 5686, issue 3928)
Fixes 5686.
Fixes 3928.
2016-03-31 11:08:12 +02:00
Jonas Jenwald
c40df8a393 Make Type1Font more class-like, by adding closure
*Note:* Ignoring whitespace should simplify reviewing a great deal.
2016-03-31 11:00:27 +02:00
Brendan Dahl
df7afcf004 Merge pull request #7053 from yurydelendik/rm-pdfjs-core
Removes global PDFJS usage from the src/core/.
2016-03-25 13:19:43 -07:00
Yury Delendik
bda5e6235e Removes global PDFJS usage from the src/core/. 2016-03-23 19:24:37 -05:00
Jonas Jenwald
d78fae0181 Ensure that TrueType font tables have uint32 checksums
According to "The table directory" under https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#Directory, TrueType font tables should have `uint32` checksums.

This is something that I noticed, and was initially confused about, while debugging a TrueType issue.
As far as I can tell, the current (`int32`) checksums we use don't cause any issues in practice. However, I do think that this should be addressed to agree with the specification, and to reduce possible confusion when reading the font code.
2016-03-22 13:40:50 +01:00
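Illustration (not part of the commit): a minimal sketch of a table checksum kept in the `uint32` range. The function name is an assumption; the technique is the standard sum of big-endian 32-bit words, with `>>> 0` converting JavaScript's signed 32-bit bitwise results to unsigned values.

```javascript
// Hypothetical helper; sums big-endian 32-bit words modulo 2^32.
function tableChecksum(bytes) {
  let sum = 0;
  for (let i = 0; i < bytes.length; i += 4) {
    // Bytes past the end of the table are treated as zero padding.
    const word =
      ((bytes[i] << 24) |
        ((bytes[i + 1] || 0) << 16) |
        ((bytes[i + 2] || 0) << 8) |
        (bytes[i + 3] || 0)) >>> 0;
    // `>>> 0` wraps the running total into the unsigned 32-bit range.
    sum = (sum + word) >>> 0;
  }
  return sum;
}
```

Without the `>>> 0` steps, a word such as `0xFFFFFFFF` would be produced as `-1`, which is the `int32` behaviour the message refers to.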
Manas
f6d28ca323 Refactors CMapFactory.create to make it async 2016-03-21 23:08:19 +05:30
Jonas Jenwald
cd2bd057ab Refactor the building of toFontChar for non-embedded fonts
Currently there's a lot of duplicate code for non-embedded `toFontChar`, which this patch simplifies by extracting the code into a helper function instead.
2016-03-10 21:25:39 +01:00
Jonas Jenwald
dfe9015a43 Convert uniXXXX glyph names to proper ones when building the charCodeToGlyphId map for TrueType fonts (bug 1132849, issue 6893, issue 6894)
This patch adds a `getUnicodeForGlyph` helper function, which is used to recover Unicode values for non-standard glyph names.

Some PDF generators, e.g. Scribus PDF, use improper `uniXXXX` glyph names which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849.
Fixes 6893.
Fixes 6894.
2016-03-09 19:37:15 +01:00
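Illustration (not part of the commit): a minimal sketch of recovering a Unicode value from a `uniXXXX` glyph name. The function name and the exact rules (four upper-case hex digits, surrogates rejected) are assumptions for this example and may differ from the `getUnicodeForGlyph` helper added by the patch.

```javascript
// Hypothetical helper; returns the code point, or -1 when the name
// does not follow the "uni" + four upper-case hex digits pattern.
function unicodeFromUniName(name) {
  const match = /^uni([0-9A-F]{4})$/.exec(name);
  if (!match) {
    return -1;
  }
  const code = parseInt(match[1], 16);
  // Surrogate code units do not denote characters on their own.
  if (code >= 0xd800 && code <= 0xdfff) {
    return -1;
  }
  return code;
}
```

The recovered code point can then be looked up in a glyph list to find the standard glyph name, which is the conversion the message describes.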
Preetham Mysore
be1e12dbcb Fix for descent calculation while reading font hhea headers 2016-03-03 08:51:41 -05:00
Jonas Jenwald
8402c79171 Merge pull request #7050 from brendandahl/issue4402
For CIDFontType2 use CID as glyph ID when missing CID to GID map.
2016-03-02 10:11:42 +01:00
Brendan Dahl
a6acf74b54 Merge pull request #7023 from brendandahl/issue6721
Only draw glyphs on canvas if they are in the font or the font file is missing.
2016-03-01 18:03:37 -08:00
Brendan Dahl
6e1d131384 For CIDFontType2 use CID as glyph ID when missing CID to GID map. 2016-03-01 17:05:33 -08:00
Brendan Dahl
ff87f3fb86 Only draw glyphs on canvas if they are in the font or the font file is missing. 2016-03-01 13:24:58 -08:00
Jonas Jenwald
505f15f221 Avoid accidentally getting the entire font file in readNameTable (issue 7020)
In the PDF file in question, some of the 'name' table entries have `record.length === 0`. This becomes problematic in the non-unicode case, since `font.getBytes(0)` will fetch the *entire* stream.
Given that OTS rejects 'name' entries larger than `2^16`, this explains the sanitizer errors.

Fixes 7020.
2016-03-01 21:59:49 +01:00
Tim van der Meij
02b161d432 Merge pull request #6933 from brendandahl/faster-decrypt
Make type 1 font program decryption faster.
2016-02-09 23:41:22 +01:00
Brendan Dahl
02331f6e33 Make type 1 font program decryption faster.
Discard the values first so we don't have to slice the array.
2016-01-29 11:10:30 -08:00
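Illustration (not part of the commit): a minimal sketch of Type 1 decryption that consumes the discarded leading bytes before allocating the output, so no slice is needed afterwards. The function name and signature are assumptions; the constants 52845 and 22719 and the eexec key 55665 come from the Adobe Type 1 font format.

```javascript
// Constants from the Adobe Type 1 font format.
const EEXEC_KEY = 55665;
const C1 = 52845;
const C2 = 22719;

// Hypothetical helper: decrypts `data`, dropping the first
// `discardNumber` decrypted bytes without an intermediate array.
function decryptType1(data, key, discardNumber) {
  if (discardNumber >= data.length) {
    return new Uint8Array(0);
  }
  let r = key;
  // Run the key schedule over the discarded bytes; nothing is stored.
  for (let i = 0; i < discardNumber; i++) {
    r = ((data[i] + r) * C1 + C2) & 0xffff;
  }
  const count = data.length - discardNumber;
  const decrypted = new Uint8Array(count);
  for (let i = discardNumber, j = 0; j < count; i++, j++) {
    const value = data[i];
    decrypted[j] = value ^ (r >> 8);
    r = ((value + r) * C1 + C2) & 0xffff;
  }
  return decrypted;
}
```

For the eexec section the call would look like `decryptType1(bytes, EEXEC_KEY, 4)`; the output array is sized exactly once.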
Yury Delendik
2edf2792dc Replaces literal {} created lookup tables with Object.create 2016-01-28 12:18:38 -06:00
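Illustration (not part of the commit): a minimal sketch of why a lookup table created with `Object.create(null)` is safer than a `{}` literal. The table contents are invented for this example.

```javascript
// Invented sample data; only the lookup behaviour matters here.
const literalTable = { Helvetica: "sans-serif" };
const bareTable = Object.create(null);
bareTable.Helvetica = "sans-serif";

function lookup(table, key) {
  return table[key];
}

// A `{}` literal inherits from Object.prototype, so keys such as
// "constructor" or "toString" appear to exist in the table.
const inherited = lookup(literalTable, "constructor");
// A prototype-less object only ever returns its own entries.
const missing = lookup(bareTable, "constructor");
```

This matters for font code because glyph and font names come from untrusted PDF data and can collide with inherited property names.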
Yury Delendik
55a201d92d Lazify NormalizedUnicodes 2016-01-28 11:56:42 -06:00
Yury Delendik
d0738d7e24 Lazify stdFontMap, serifFonts, GlyphMapForStandardFonts 2016-01-28 11:51:54 -06:00
Yury Delendik
1a9a665adf Refactor Encodings 2016-01-28 11:32:59 -06:00
Yury Delendik
4ef20de429 Lazify GlyphsUnicode. 2016-01-28 11:32:59 -06:00
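Illustration (not part of the commit): a minimal sketch of lazily built lookup tables, as in the "Lazify" commits above. The helper name `lazyTable` and the sample entries are assumptions for this example.

```javascript
// Hypothetical helper: the table is built on first access only,
// and the same object is returned on every later call.
function lazyTable(initializer) {
  let table = null;
  return function getTable() {
    if (table === null) {
      table = Object.create(null);
      initializer(table);
    }
    return table;
  };
}

// Invented sample data, standing in for a large glyph table.
const getSampleGlyphs = lazyTable(function (t) {
  t.A = 0x0041;
  t.B = 0x0042;
});
```

Large tables then cost nothing at load time for documents that never need them.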
Yury Delendik
0aa373cdf3 Merge pull request #6891 from Snuffleupagus/issue-6889
Map missing glyphs to the `notdef` glyph for TrueType (3, 1) fonts regardless if the 'post' table is defined or not (issue 6889)
2016-01-20 13:14:47 -06:00
Jonas Jenwald
4855d4cc9f Map missing glyphs to the notdef glyph for TrueType (3, 1) fonts regardless if the 'post' table is defined or not (issue 6889) 2016-01-17 22:58:00 +01:00
Jonas Jenwald
d52495a9c8 [TrueType] Recover from a missing "glyf" table by replacing it with dummy data, utilizing the existing code in sanitizeGlyphLocations
It seems to be fairly common for OCR software to include incomplete TrueType fonts, notably missing the "glyf" table, in PDF files. Since we currently reject such fonts, the result is that text-selection/copying is broken.

This patch contains a suggested approach to try and use these kinds of broken fonts, by using existing code in `sanitizeGlyphLocations` to replace a missing "glyf" table with dummy data.

Fixes 4684.
Fixes 6007.
Fixes 6829.
2016-01-15 21:44:59 +01:00
Jonas Jenwald
896e390285 Check that CIDFontType0 fonts do not actually contain OpenType font files (issue 6782)
*This patch follows a similar idea as PR 5756.*

The patch is based on the nice debugging done by Brendan in the referenced issue 6782.
A better way to handle this, and similar issues, would probably be to completely ignore what the PDF file claims about font type/subtype, and just check the actual data. But until that kind of rewrite happens, this patch should help.

Fixes 6782.
2016-01-06 02:19:02 +01:00
Brendan Dahl
eb7c36beb6 Add validation for callsubr and callgsubr for type 2 charstrings. 2016-01-05 09:54:25 -08:00
Yury Delendik
6b60c8f4db Adds UMD headers to core, display and shared files. 2015-12-15 13:24:39 -06:00
Jonas Jenwald
ee0d522187 Use adjustWidths for TrueType fonts if we handle them as OpenType (issue 5027, issue 5084, issue 6556, bug 1204903)
In `Font_checkAndRepair` we can decide that a font isn't TrueType, and instead parse it as CFF. In that case it's quite possible that the `fontMatrix` will be changed, and without calling `adjustWidths` we're failing to update the glyph widths correctly.

Fixes 5027.
Fixes 5084.
Fixes 6556.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1204903.
2015-12-08 00:49:22 +01:00
Jonas Jenwald
4810b7b8fc Fix the charCodeOf method in IdentityToUnicodeMap in order to prevent text selection from breaking
After PR 6590, `font.spaceWidth` is now called in more cases than before (in `PartialEvaluator_getTextContent`), which exposed an underlying issue with `IdentityToUnicodeMap_charCodeOf` throwing an error.
This breaks text-selection in some PDF files found in the wild, hence this patch replaces the `error` with an actual function instead (modelled after `IdentityCMap_charCodeOf`).
2015-12-05 13:15:55 +01:00
Brendan Dahl
87762afec4 Remove glyph id's outside the range of valid glyphs.
OTS does not like invalid glyph ids in a cmap table.
2015-12-03 11:53:06 -08:00
Manas
a2ba1b8189 Uses editorconfig to maintain consistent coding styles
Removes the following as they are unnecessary
/* -*- Mode: Java; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set shiftwidth=2 tabstop=2 autoindent cindent expandtab: */
2015-11-14 07:32:18 +05:30
Jonas Jenwald
ff64ef0243 Prevent readCmapTable from failing if the cmap is missing in TrueType fonts
Fixes http://arrow.dit.ie/cgi/viewcontent.cgi?article=1000&context=aaschadpoth#page=3.
2015-11-08 16:48:37 +01:00
Yury Delendik
cc5bc18728 Fixes incorrect PDF file font metrics. 2015-11-06 14:47:10 -06:00
Yury Delendik
fa46b73c47 Better spacing in text layer. 2015-11-02 08:54:15 -06:00
Jonas Jenwald
29a1cdb6a6 Only choose a (3, 1) cmap table for TrueType fonts that have an encoding specified (issue 6410)
For (1, 0) cmaps, we have two different codepaths depending on whether the font has/hasn't got an encoding. But with (3, 1) cmaps we don't have a good fallback when the encoding is missing, hence this patch changes `readCmapTable` to only choose a (3, 1) cmap table if the font is non-symbolic *and* an encoding exists. Without this, we'll not be able to successfully create a working glyph map for some TrueType fonts with (3, 1) cmap tables.

Fixes 6410.
2015-09-07 16:56:05 +02:00
Jonas Jenwald
0fb31a4a9e Fallback in readCmapTable, instead of using error, for TrueType fonts with unsupported cmap formats (bug 1200096)
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1200096.

The problematic font has a `format 2` cmap, which we've never supported properly. Prior to PR 2606, we were able to fallback to a working state, despite not having proper support for that cmap format.

Obviously the best/correct solution would be to implement actual support for more cmap formats[1]. However, I'm hoping that a simple patch will be OK for now, given that:
 - `format 2` cmaps seem to be quite rare in practice, since this has been broken for 2.5 years before anyone noticed.
 - Having a simple patch will make potential uplifts a lot easier.

[1] See the specification at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
2015-09-01 14:01:19 +02:00
Jonas Jenwald
99d29487ab Adjust which TrueType (3, 1) glyphs we attempt to skip mapping of (issue 6336)
Fixes 6336.
2015-08-09 12:51:43 +02:00
Jonas Jenwald
0a024b5051 Adjust the heuristics used to detect OpenType font file with CFF data (bug 1186827, bug 1182130, issue 6264)
*This is a tentative patch.*

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1186827.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1182130.
Fixes 6264.
2015-07-25 12:26:36 +02:00
Tim van der Meij
5af49f8bbb Merge pull request #6166 from Snuffleupagus/issue-5801-2
Add a supplemental glyph map for non-embedded ArialBlack fonts (issue 5801)
2015-07-10 22:29:50 +02:00
Yury Delendik
0787182e6f Adds more characters to the PUA range 2015-07-02 16:47:47 -05:00
Jonas Jenwald
d0477302be Add a supplemental glyph map for non-embedded ArialBlack fonts (issue 5801)
This should, hopefully, finally fix 5801.
2015-07-01 22:16:52 +02:00
Jonas Jenwald
aa3a64e975 Skip mapping of CIDFontType2 glyphs when the font either has a |IdentityToUnicodeMap| or a |toUnicodeMap| with 65536 elements (issue 5677)
This patch slightly extends the heuristics used when trying to skip mapping of missing glyphs.

Fixes 5677.
2015-06-18 21:53:15 +02:00
Jonas Jenwald
bf20334bea Merge pull request #6090 from Snuffleupagus/issue-6068
Map missing glyphs to the notdef glyph for TrueType (3, 1) fonts (issue 6068)
2015-06-13 00:29:08 +02:00
Jonas Jenwald
5eae3e29c5 Map missing glyphs to the notdef glyph for TrueType (3, 1) fonts (issue 6068)
Fixes 6068.

The most notable issue with the font in question is that the `differences` array contains lots of strange entries (of the type `uniXXXX`, instead of proper glyph names).
2015-06-06 18:28:16 +02:00
Jonas Jenwald
6f2f0700b7 Don't map glyphs to certain problematic Thai/Lao Unicode locations (issue 5994)
*This patch depends on PR 5990.*

According to https://dxr.mozilla.org/mozilla-central/source/gfx/harfbuzz/src/hb-ot-shape-fallback.cc#38, certain Thai/Lao characters are treated as special by the font shaping code in Firefox.
Further down in that file, https://dxr.mozilla.org/mozilla-central/source/gfx/harfbuzz/src/hb-ot-shape-fallback.cc#216, the vertical position of glyphs is modified, which should thus explain why some glyphs end up in the wrong position in the PDF file.

Fixes 5994.
2015-06-05 23:53:22 +02:00
Brendan Dahl
749a60a0b7 Merge pull request #5990 from Snuffleupagus/missing-glyphs-identityUnicode
Skip mapping of CIDFontType2 glyphs in fonts with a |IdentityToUnicodeMap|, unless |properties.widths| is defined for the glyph
2015-06-05 14:50:02 -07:00
Jonas Jenwald
6fbc5428bd Skip mapping of CIDFontType2 glyphs in fonts with a |IdentityToUnicodeMap|, unless |properties.widths| is defined for the glyph
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1142033.
Also fixes issue 5874.
2015-05-14 22:38:04 +02:00
Jonas Jenwald
0365baf5ab Fall back to the |defaultEncoding| when no valid "post" table is found in TrueType fonts (bug 1050040)
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1050040.

With this patch the file is completely readable, but given that the font is broken enough to be rejected by OTS the rendering differs slightly from Adobe Reader.

*Note:* the PDF file is sufficiently broken that even Adobe Reader complains about the font, *and* also about another more general issue.
2015-05-14 13:16:14 +02:00
Jonas Jenwald
70b839386a Ensure that the cmap position is within the bounds of the font file in |readCmapTable| 2015-05-14 13:16:09 +02:00
Tim van der Meij
b34366d2fc Merge pull request #5898 from stri8ed/master
Extract more accurate glyph heights from type3 fonts
2015-05-13 21:07:17 +02:00
Tim van der Meij
48b2f6d023 Merge pull request #5756 from Snuffleupagus/issue-5751
Guess CIDFontType0 subtype based on font file contents (issue 5751)
2015-04-24 23:50:07 +02:00
Jonas Jenwald
fda858ae33 Don't map glyphs to certain problematic General Punctuation Unicode locations (bug 911034)
Fixes the remaining missing characters in https://bugzilla.mozilla.org/show_bug.cgi?id=911034.

For reference, see http://www.unicode.org/charts/PDF/U2000.pdf (and also http://en.wikipedia.org/wiki/General_Punctuation_%28Unicode_block%29).
2015-04-09 17:27:03 +02:00
Jonas Jenwald
a54ec673c5 Move the checks for problematic Unicode locations from |adjustMapping| to a separate helper function 2015-04-09 12:56:29 +02:00
Levi Melamed
a5159a7942 extract more accurate glyph heights from type-3 fonts 2015-04-03 08:49:06 -05:00
Jonas Jenwald
2b1a13ba28 Don't map glyphs to Unicode position 0x0E33, i.e. Thai character SARA AM (bug 1046314)
*A similar approach as in PR 5705.*

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1046314.

According to https://dxr.mozilla.org/mozilla-central/source/gfx/harfbuzz/src/hb-ot-shape-complex-thai.cc#270-365, `0x0E33` is treated as a special case (by the font shaping code in Firefox). Hence it seems reasonable to skip it when adjusting the font mapping.
2015-03-26 13:22:45 +01:00
Brendan Dahl
3a8d4a7d72 Merge pull request #5713 from Snuffleupagus/evaluator-IdentityToUnicodeMap
Create a IdentityToUnicodeMap in evaluator.js when toUnicode contains IdentityH/IdentityV
2015-03-25 10:33:29 -07:00
Brendan Dahl
519b6669f0 Merge pull request #5705 from Snuffleupagus/bug-1108301
Don't map glyphs to Unicode "Dotted circle" combining mark (bug 1108301)
2015-03-24 16:33:04 -07:00
Jonas Jenwald
e894a0a4c6 Guess CIDFontType0 subtype based on font file contents (issue 5751) 2015-03-15 13:35:48 +01:00
Jonas Jenwald
f81fc9091a Correctly detect OpenType font files with CFF data
Fixes 5334.
Fixes 215.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1125614.

According to the specification, http://www.microsoft.com/typography/otspec/otff.htm, OpenType font files with CFF data should have `OTTO` in the header.
2015-02-28 13:43:53 +01:00
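Illustration (not part of the commit): a minimal sketch of checking a font file header for the `OTTO` tag mentioned above. The function name is an assumption for this example.

```javascript
// Hypothetical helper: true when the file starts with the ASCII
// bytes "OTTO", the sfnt version tag for OpenType fonts with CFF data.
function hasOttoHeader(bytes) {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x4f && // "O"
    bytes[1] === 0x54 && // "T"
    bytes[2] === 0x54 && // "T"
    bytes[3] === 0x4f // "O"
  );
}
```

A TrueType-flavoured file starts with `0x00010000` instead, so it fails this check.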
Jonas Jenwald
0a3341dadc Don't map glyphs to Unicode "Dotted circle" combining mark (bug 1108301)
It seems that `0x25CC` is another bad spot for charCodes.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1108301.
2015-02-27 00:20:38 +01:00
Jonas Jenwald
417800a1b5 Only skip the |!isSymbolicFont| check for TrueType (3, 1) cmap tables if no previous cmap table was found (PR 5703 followup)
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=894572.
2015-02-19 13:58:03 +01:00
Jonas Jenwald
592890a758 Relax the |isSymbolicFont| check for TrueType (3, 1) cmap tables (issue 5701) 2015-02-13 01:03:10 +01:00
Brendan Dahl
394b38b22f Merge pull request #5651 from Snuffleupagus/missing-glyphs
Try to skip mapping of missing TrueType and CIDFontType2 glyphs
2015-02-11 19:31:22 -08:00
Brendan Dahl
fb8200096b Merge pull request #5634 from Snuffleupagus/cmap-0,0
Add support for TrueType (0, 0) cmap tables (issue 5501, issue 5574, and bug 1037973)
2015-02-11 15:04:03 -08:00
Jonas Jenwald
f19a1db414 Create a IdentityToUnicodeMap in evaluator.js when toUnicode contains IdentityH/IdentityV
Currently if a font contains a `toUnicode` entry, we always create a new `ToUnicodeMap` in evaluator.js. This is done even for `IdentityV/IdentityH`, despite the possibility to use the much more compact `IdentityToUnicodeMap` representation.
This patch refactors the `IdentityH/IdentityV` cases, to:
 - Avoid calling `IdentityCMap.getMap`, since this prevents allocating and iterating through an array with 65536 elements.

 - Ensure that the handling of `toUnicode` is actually correct in fonts.js.
We rely on `toUnicode instanceof IdentityToUnicodeMap` in a few places, and currently this does not work correctly for `IdentityH/IdentityV`.
2015-02-09 16:52:31 +01:00
Jonas Jenwald
01e6565dd4 Try to skip mapping of missing TrueType glyphs
Also don't skip mapping of glyphs which are empty, if the corresponding charCode is included in toUnicode.
2015-02-07 12:19:38 +01:00
Jonas Jenwald
8174da61fb Don't skip mapping of glyphs for CIDFontType2 fonts with a CIDToGIDMap
Also don't skip mapping of glyphs which are empty, if the corresponding charCode is included in toUnicode.
2015-02-07 12:19:37 +01:00
Brendan Dahl
cb27707277 Try to skip mapping of missing glyphs. 2015-02-07 12:19:37 +01:00
Jonas Jenwald
c2c54257f2 Prevent setting |isStandardFont| to |undefined| for non-embedded fonts
This is a very small follow-up to PR 5536, which sets `isStandardFont` to `false` instead of `undefined` (as currently happens for some font names).

Since the patch is so small, I hope it's OK to also fix an unrelated copy-and-paste error in a comment that was added in PR 5260.
2015-01-13 15:44:34 +01:00
Jonas Jenwald
ad41a2d574 Add support for TrueType (0, 0) cmap tables (issue 5501 and 5574) 2015-01-11 14:54:12 +01:00
Jonas Jenwald
d8b905048b Add fallback for non-embedded "Century Gothic" CIDFontType2 font (issue 4722 and bug 879561)
According to practical experiments, falling back to "Helvetica" when we encounter a non-embedded "[Century Gothic](http://en.wikipedia.org/wiki/Century_Gothic)" `CIDFontType2` font seems to work well.
(Also, the section on Wikipedia about "Printer ink usage" *might* provide some anecdotal evidence that Century Gothic is a fairly standard sans-serif font.)

Obviously this patch doesn't make "Century Gothic" fonts render perfectly, as is often the case with non-embedded fonts, but all the text is now legible in the referenced issues.

Fixes 4722.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=879561.
2014-12-18 23:19:34 +01:00
Yury Delendik
31ae5f2a3d Merge pull request #5379 from brendandahl/nbsp
Don't map glyphs to unicode non breaking space.
2014-12-18 13:38:03 -06:00
Jonas Jenwald
96a77e9d6a Add basic support for non-embedded Wingdings fonts
This is a tentative patch that adds *very* basic support for non-embedded Wingdings fonts (a Windows version of Dingbats), by falling back to the ZapfDingbats encoding. Obviously this approach will not work perfectly, but in my opinion it seems to work reasonably well in practice.

Instead of this very simple patch, another option would be to try and include more complete glyph data for Wingdings, e.g. a Unicode map and glyph widths, similar to what was done for ZapfDingbats.
However there is, in my opinion, one important difference between Wingdings and ZapfDingbats: ZapfDingbats is part of the 14 standard fonts, which in previous versions of the PDF specification was assumed to be available in PDF readers. To improve compatibility with older files, it thus makes sense for us to include data for ZapfDingbats.
However Wingdings has never been a standard font in PDF files, hence PDF files using it *should* thus contain all the necessary font data.

Given the above, I thus believe that it should be OK to fall back to ZapfDingbats for now. If non-embedded Wingdings fonts turn out to be *a lot* more common, then we can revisit this later.

Fixes 4301 completely.
Fixes 4837 almost completely. With this patch the bullets are displayed correctly, but the arrows are not of the correct type.
Fixes `artofwar.pdf`, pages 14 and 15.
2014-12-09 00:28:22 +01:00
Brendan Dahl
8a536ac346 Map missing glyphs in encoding to notdef glyph. 2014-10-03 12:11:20 -07:00
Brendan Dahl
2fc5e6a9ad Don't map glyphs to unicode non breaking space. 2014-10-02 10:58:56 -07:00
Jonas Jenwald
df2a4afd36 Use |toUnicode| when creating the glyph map for standard CIDFontType2 fonts without embedded font file 2014-09-27 13:20:04 +02:00
Yury Delendik
744c8e8d7e Merge pull request #5250 from Snuffleupagus/issue-5238
Fix Symbol fonts without font file but with Encoding dictionary (issue 5238)
2014-09-26 15:18:33 -05:00
Jonas Jenwald
3c759e296a Add support for MMType1 fonts with embedded font files 2014-09-18 16:10:46 +02:00
Jonas Jenwald
b16c973d9d Fix Symbol fonts without font file but with Encoding dictionary (issue 5238) 2014-09-16 21:38:53 +02:00
Yury Delendik
15681adbb9 Merge pull request #5245 from Snuffleupagus/issue-5244
Further amend GlyphMapForStandardFonts (issue 5244)
2014-09-16 10:12:07 -05:00
Brendan Dahl
403b7df6e7 Merge pull request #5233 from Snuffleupagus/bug-1057544
Workaround for TrueType fonts with exotic cmap tables (bug 1057544)
2014-09-15 14:47:31 -07:00
Jonas Jenwald
7b3f222787 Add |SpecialPUASymbols| map and refactor |mapSpecialUnicodeValues| 2014-09-04 13:41:15 +02:00
Jonas Jenwald
2d5596172c Add more cases to |mapSpecialUnicodeValues| to fix the rendering of various Symbol encoded brackets 2014-09-04 12:40:15 +02:00
Jonas Jenwald
4bda6ba1b8 Add basic support for ZapfDingbats 2014-09-03 21:54:04 +02:00
Jonas Jenwald
be595d0721 Further amend GlyphMapForStandardFonts (issue 5244) 2014-09-01 10:51:22 +02:00
Jonas Jenwald
cc8710acbf Workaround for TrueType fonts with exotic cmap tables (bug 1057544) 2014-08-23 11:27:41 +02:00
Jonas Jenwald
ae896fc071 Avoid creating intermediate strings in sanitizeMetrics
This patch avoids creating many intermediate strings, when adding dummy width/lsb entries for glyphs where those are missing.
For the relevant PDF files in our test suite, the average number of intermediate strings is well over 1000.
2014-08-20 23:55:57 +02:00
Yury Delendik
a2c2f81167 Use cff glyph width in the hmtx table 2014-08-14 16:11:09 -05:00
Yury Delendik
0ad323f621 Adds width at the beginning of the Type2 charstring 2014-08-13 21:15:40 -05:00
Nicholas Nethercote
61e6b576d4 Avoid an allocation in readCharCode().
readCharCode() returns two values, and currently allocates a length-2
array on every call to do so. This change makes it instead use a
passed-in object which can be reused.

This tiny change reduces the total JS allocations done for the document
in Mozilla bug 992125 by 4.2%.
2014-08-12 16:12:58 -07:00
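Illustration (not part of the commit): a minimal sketch of returning two values through a reusable passed-in object instead of a fresh array. The decoding rule (one byte below `0x80`, otherwise two bytes) and all names are assumptions for this example, not the real CMap logic.

```javascript
// Hypothetical decoder: writes its two results into `out` rather
// than allocating a new [charcode, length] array on every call.
function readCharCode(bytes, offset, out) {
  const first = bytes[offset];
  if (first < 0x80 || offset + 1 >= bytes.length) {
    out.charcode = first;
    out.length = 1;
  } else {
    out.charcode = (first << 8) | bytes[offset + 1];
    out.length = 2;
  }
}

function decodeAll(bytes) {
  const codes = [];
  // Allocated once and reused for every character in the string.
  const out = { charcode: 0, length: 0 };
  for (let i = 0; i < bytes.length; i += out.length) {
    readCharCode(bytes, i, out);
    codes.push(out.charcode);
  }
  return codes;
}
```

The caller owns the scratch object, so a loop over a long string performs one allocation instead of one per character.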
Yury Delendik
ab8270ae3a Fixes searchRange calculation 2014-08-10 14:11:04 -05:00
Yury Delendik
42771159ca Removes stringToArray 2014-08-10 14:11:04 -05:00
Yury Delendik
350556f085 Removes bytesToString/stringToArray conversions in the font.js 2014-08-10 14:11:04 -05:00
Nicholas Nethercote
f82977caf9 Simplify isIdentityUnicode detection. 2014-08-08 02:02:42 -07:00
Nicholas Nethercote
6c8cca1284 Add IdentityToUnicodeMap class.
When loading the PDF from issue #4935, this change reduces peak RSS from
~2400 to ~300 MiB, and improves overall speed by ~81%, from 6336 ms to
1222 ms.
2014-08-07 20:45:11 -07:00
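Illustration (not part of the commit): a minimal sketch of why an identity map can be so much smaller than a materialized table. The class name and method set are assumptions for this example and do not reproduce the real `IdentityToUnicodeMap` interface.

```javascript
// Hypothetical class: stores only the two bounds of the range and
// computes each entry on demand, instead of holding 65536 strings.
class IdentityRangeMap {
  constructor(firstChar, lastChar) {
    this.firstChar = firstChar;
    this.lastChar = lastChar;
  }

  has(code) {
    return this.firstChar <= code && code <= this.lastChar;
  }

  get(code) {
    return this.has(code) ? String.fromCharCode(code) : undefined;
  }

  // Inverse of `get`: maps a single-character string back to its code.
  charCodeOf(value) {
    if (typeof value !== "string" || value.length !== 1) {
      return -1;
    }
    const code = value.charCodeAt(0);
    return this.has(code) ? code : -1;
  }
}
```

Memory use is constant regardless of the size of the range, which is consistent with the RSS drop reported in the message.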
Nicholas Nethercote
9576047f0d Add ToUnicodeMap class. 2014-08-07 20:05:24 -07:00
Yury Delendik
46a9a35ddc Merge pull request #5071 from nnethercote/font-savings
Optimize a font-heavy document
2014-08-05 18:57:46 -05:00
Yury Delendik
fa53fcbf57 Merge pull request #5095 from Snuffleupagus/issue-5070
Adjust the heuristics to recognize more cases of unknown glyphs for |toUnicode| (issue 5070)
2014-08-05 17:41:38 -05:00
Yury Delendik
6865c284a7 Merge pull request #5111 from nnethercote/better-cidchars
Represent cid chars using integers, not strings.
2014-08-04 22:26:55 -05:00
Jonas Jenwald
8ecbb4da05 Adjust the heuristics to recognize more cases of unknown glyphs for |toUnicode| (issue 5070) 2014-08-03 21:18:23 +02:00
Jonas Jenwald
b918df3547 Re-factor heuristics to recognize unknown glyphs for |toUnicode| 2014-08-03 21:12:36 +02:00
Jonas Jenwald
97b3eadbc4 Add strict equalities in src/core/fonts.js 2014-08-01 21:56:03 +02:00
Nicholas Nethercote
adf58ed687 Represent cid chars using integers, not strings.
cid chars are 16-bit unsigned integers. Currently we convert them to
single-char strings when inserting them into the CMap, and then convert
them back to integers when extracting them from the CMap. This patch
changes CMap so that cid chars stay in integer format throughout, saving
both time and space.

When loading the PDF from issue #4580, this change reduces peak RSS from
~600 to ~370 MiB. It also improves overall speed on that PDF by ~26%,
going from 724 ms to 533 ms.
2014-08-01 02:35:17 -07:00
Nicholas Nethercote
b86daed29d Make CMap.map quasi-private.
This makes it easier for the representation to be improved.
2014-07-30 06:26:35 -07:00
Jonas Jenwald
c3c72948b9 Stop including cidmaps.js
In b5b94a4af3, i.e. PR #4259, we stopped using cidmaps.js. Despite that, it's still included when PDF.js is built. At almost 0.5 MB (and approx. 7000 lines), this is currently the single largest file in the codebase.
Including such a large file in the builds, when it is not actually used, seems extremely wasteful; hence this patch.
2014-07-25 21:53:09 +02:00
Tim van der Meij
62e6265fb3 Merge pull request #5074 from nnethercote/readPostScriptTable-join
Use Array.join to build up strings in readPostScriptTable().
2014-07-25 21:26:54 +02:00
Nicholas Nethercote
1039791472 Use Array.join to build up strings in readPostScriptTable().
This avoids about 5 MiB of string allocations on one test case.
2014-07-24 16:12:08 -07:00
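Illustration (not part of the commit): a minimal sketch of building a string by collecting pieces in an array and joining once, rather than repeated `+=` concatenation. The function name and parameters are assumptions for this example.

```javascript
// Hypothetical helper: reads `length` bytes starting at `start`
// and returns them as a string, joining the pieces a single time.
function bytesToName(bytes, start, length) {
  const chars = [];
  for (let i = 0; i < length; i++) {
    chars.push(String.fromCharCode(bytes[start + i]));
  }
  return chars.join("");
}
```

Whether this saves memory depends on the JavaScript engine; the message reports the saving measured for one test case at the time.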
Nicholas Nethercote
c7f02d2c8e Minimize memory usage of font-related arrays.
This patch replaces some vanilla arrays with typed arrays, and avoids
some array copying.

It reduces the peak RSS when viewing
http://www.dynacw.co.jp/Portals/3/fontsamplepdf/sample_4942546800828.pdf
from ~940 MiB to ~750 MiB, and reduces its load time from 83 to 76 ms.
2014-07-22 22:47:45 -07:00
Jonas Jenwald
f13c217b25 Fix another seac regression (issue 4801) 2014-07-22 21:44:13 +02:00
Jonas Jenwald
a7c786775d [CIDFontType2] Map characters missing in toUnicode to the private use area (bug 1028735 and issue 4881) 2014-07-05 00:18:51 +02:00
Jonas Jenwald
c5f4051a75 A few small optimizations of adjustMapping
Replace a couple of |in| checks with comparisons against undefined.
2014-06-27 00:59:42 +02:00
Jonas Jenwald
c121def806 A few small optimizations for CIDFontType2 fonts
Cache a constant length and replace one usage of |in| with a comparison against undefined.
2014-06-27 00:52:54 +02:00
Yury Delendik
10db93be29 Merge pull request #4980 from Snuffleupagus/bug-1027533
Additional heuristics to recognize unknown glyphs for toUnicode (bug 1027533)
2014-06-23 21:56:13 -05:00
Yury Delendik
c28839b2f3 Merge pull request #4944 from Snuffleupagus/issue-4934
Don't blindly trust toUnicode when building toFontChar for non-standard fonts without a font file (issue 4934)
2014-06-23 21:49:24 -05:00
Jonas Jenwald
b19bb74813 Additional heuristics to recognize unknown glyphs for toUnicode (bug 1027533) 2014-06-20 09:57:16 +02:00
Yury Delendik
0cd28ebfa3 Telemetry for used stream and font types 2014-06-16 16:41:04 -05:00
Jonas Jenwald
158790981c Don't blindly trust toUnicode when building toFontChar for non-standard fonts without a font file (issue 4934) 2014-06-14 22:59:08 +02:00
Jonas Jenwald
3c5dedf60d Prevent font error when no preferred cmap table is found (workaround for issue 4800) 2014-05-27 17:30:11 +02:00
Yury Delendik
e5a0d89da9 Refactors loadFont for translateFont be async; fixes type3 dup data 2014-05-19 16:27:54 -05:00
Jonas Jenwald
3e1db41ddd Fix loading of fonts with empty font files (bug 866395 and issue 3522) 2014-05-18 21:41:06 +02:00
Jonas Jenwald
0fa154be4e Amend GlyphMapForStandardFonts to fix issue 4276 2014-04-30 15:56:40 +02:00
Jonas Jenwald
747dec16b2 Prevent trying to map characters to the specials unicode block in adjustMapping (issue 4650) 2014-04-28 23:33:54 +02:00
Yury Delendik
98e023e464 Guesses Type1C font type based on file content 2014-04-24 11:48:18 -05:00
Yury Delendik
9a5c121e4d Fixes invalid CFF name for Mac OSX 2014-04-17 10:50:06 -05:00
Yury Delendik
a22258a6b3 Merge pull request #4638 from yurydelendik/issue4630
Recognizes ASCII type1 encoding
2014-04-17 08:39:31 -05:00
Yury Delendik
bf3a2488df Recognizes ascii type1 encoding 2014-04-17 07:52:33 -05:00
fkaelberer
b06c10cbbd rename getUint32 to getInt32 and collect readInt*() in util.js 2014-04-16 21:31:16 +02:00