99 Commits

Author SHA1 Message Date
Nicholas Nethercote
9576047f0d Add ToUnicodeMap class. 2014-08-07 20:05:24 -07:00
Yury Delendik
46a9a35ddc Merge pull request #5071 from nnethercote/font-savings
Optimize a font-heavy document
2014-08-05 18:57:46 -05:00
Yury Delendik
fa53fcbf57 Merge pull request #5095 from Snuffleupagus/issue-5070
Adjust the heuristics to recognize more cases of unknown glyphs for |toUnicode| (issue 5070)
2014-08-05 17:41:38 -05:00
Yury Delendik
6865c284a7 Merge pull request #5111 from nnethercote/better-cidchars
Represent cid chars using integers, not strings.
2014-08-04 22:26:55 -05:00
Jonas Jenwald
8ecbb4da05 Adjust the heuristics to recognize more cases of unknown glyphs for |toUnicode| (issue 5070) 2014-08-03 21:18:23 +02:00
Jonas Jenwald
b918df3547 Re-factor heuristics to recognize unknown glyphs for |toUnicode| 2014-08-03 21:12:36 +02:00
Jonas Jenwald
97b3eadbc4 Add strict equalities in src/core/fonts.js 2014-08-01 21:56:03 +02:00
Nicholas Nethercote
adf58ed687 Represent cid chars using integers, not strings.
cid chars are 16-bit unsigned integers. Currently we convert them to
single-char strings when inserting them into the CMap, and then convert
them back to integers when extracting them from the CMap. This patch
changes CMap so that cid chars stay in integer format throughout, saving
both time and space.

When loading the PDF from issue #4580, this change reduces peak RSS from
~600 to ~370 MiB. It also improves overall speed on that PDF by ~26%,
going from 724 ms to 533 ms.
2014-08-01 02:35:17 -07:00
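The change described in adf58ed687 can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual PDF.js CMap code: `StringCMap`, `IntCMap`, `mapOne` and `lookup` are hypothetical names chosen for the example.

```javascript
// Hypothetical illustration only; not the actual PDF.js CMap implementation.

// Before: each 16-bit cid char is stored as a single-char string and
// converted back to an integer every time it is looked up.
function StringCMap() {
  this._map = [];
}
StringCMap.prototype.mapOne = function (src, cid) {
  this._map[src] = String.fromCharCode(cid);
};
StringCMap.prototype.lookup = function (src) {
  var s = this._map[src];
  return s === undefined ? undefined : s.charCodeAt(0);
};

// After: the cid char stays an integer from insertion to extraction,
// so no string is allocated per entry and no conversion is needed.
function IntCMap() {
  this._map = [];
}
IntCMap.prototype.mapOne = function (src, cid) {
  this._map[src] = cid;
};
IntCMap.prototype.lookup = function (src) {
  return this._map[src];
};

var before = new StringCMap();
var after = new IntCMap();
before.mapOne(0x41, 0x1234);
after.mapOne(0x41, 0x1234);
```

Both variants return the same integer from `lookup`; the difference is only in what is stored in between, which is where the commit reports its time and memory savings.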
Nicholas Nethercote
b86daed29d Make CMap.map quasi-private.
This makes it easier for the representation to be improved.
2014-07-30 06:26:35 -07:00
Jonas Jenwald
c3c72948b9 Stop including cidmaps.js
In b5b94a4af3, i.e. PR #4259, we stopped using cidmaps.js. Despite that, it's still included when PDF.js is built. At almost 0.5 MB (and approx. 7000 lines), this is currently the single largest file in the codebase.
Including such a large file in the builds, when it is not actually used, seems extremely wasteful; hence this patch.
2014-07-25 21:53:09 +02:00
Tim van der Meij
62e6265fb3 Merge pull request #5074 from nnethercote/readPostScriptTable-join
Use Array.join to build up strings in readPostScriptTable().
2014-07-25 21:26:54 +02:00
Nicholas Nethercote
1039791472 Use Array.join to build up strings in readPostScriptTable().
This avoids about 5 MiB of string allocations on one test case.
2014-07-24 16:12:08 -07:00
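A rough sketch of the Array.join technique named in 1039791472. The helper `bytesToString` and the sample data are assumptions made for illustration; the real readPostScriptTable() reads glyph names from a font stream.

```javascript
// Hypothetical helper; not the actual readPostScriptTable() code.
function bytesToString(bytes, start, length) {
  var chars = [];
  for (var i = 0; i < length; i++) {
    chars.push(String.fromCharCode(bytes[start + i]));
  }
  // A single join at the end replaces repeated `str += ch` concatenation,
  // which can create many intermediate strings.
  return chars.join('');
}

// Length-prefixed string: one length byte (5) followed by "space".
var data = new Uint8Array([5, 115, 112, 97, 99, 101]);
var glyphName = bytesToString(data, 1, data[0]);
```

Whether this saves memory depends on the JavaScript engine; the commit message only reports the effect measured on one test case.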
Nicholas Nethercote
c7f02d2c8e Minimize memory usage of font-related arrays.
This patch replaces some vanilla arrays with typed arrays, and avoids
some array copying.

It reduces the peak RSS when viewing
http://www.dynacw.co.jp/Portals/3/fontsamplepdf/sample_4942546800828.pdf
from ~940 MiB to ~750 MiB, and reduces its load time from 83 to 76 ms.
2014-07-22 22:47:45 -07:00
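The two ideas in c7f02d2c8e (typed arrays instead of vanilla arrays, and less array copying) might look like this in isolation. The function names and the offsets table are hypothetical and are not taken from the patch.

```javascript
// Hypothetical example: a table of 16-bit values.

// Vanilla array: each element is a general JavaScript value.
function buildOffsetsPlain(count) {
  var offsets = [];
  for (var i = 0; i < count; i++) {
    offsets.push(i * 2);
  }
  return offsets;
}

// Typed array: a fixed 2 bytes per element, allocated once up front.
function buildOffsetsTyped(count) {
  var offsets = new Uint16Array(count);
  for (var i = 0; i < count; i++) {
    offsets[i] = i * 2;
  }
  return offsets;
}

var typed = buildOffsetsTyped(8);
// subarray() returns a view on the same underlying buffer, so no
// elements are copied (unlike Array.prototype.slice on a plain array).
var view = typed.subarray(2, 5);
```

A typed array only fits when every value is known to stay within the element range, here 0 to 65535.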
Jonas Jenwald
f13c217b25 Fix another seac regression (issue 4801) 2014-07-22 21:44:13 +02:00
Jonas Jenwald
a7c786775d [CIDFontType2] Map characters missing in toUnicode to the private use area (bug 1028735 and issue 4881) 2014-07-05 00:18:51 +02:00
Jonas Jenwald
c5f4051a75 A few small optimizations of adjustMapping
Replace a couple of |in| checks with comparisons against undefined.
2014-06-27 00:59:42 +02:00
Jonas Jenwald
c121def806 A few small optimizations for CIDFontType2 fonts
Cache a constant length and replace one usage of |in| with a comparison against undefined.
2014-06-27 00:52:54 +02:00
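The micro-optimizations mentioned in c5f4051a75 and c121def806 can be illustrated as below. This is a sketch with made-up data, not code from src/core/fonts.js; `toUnicode`, `codes` and both function names are hypothetical.

```javascript
// Hypothetical data; not taken from src/core/fonts.js.
var toUnicode = [];
toUnicode[65] = 'A';
toUnicode[66] = 'B';

// Before: membership test with the `in` operator, and the loop
// re-reads `codes.length` on every iteration.
function countMappedWithIn(map, codes) {
  var n = 0;
  for (var i = 0; i < codes.length; i++) {
    if (codes[i] in map) {
      n++;
    }
  }
  return n;
}

// After: compare against undefined and cache the constant length.
function countMappedWithUndefined(map, codes) {
  var n = 0;
  for (var i = 0, ii = codes.length; i < ii; i++) {
    if (map[codes[i]] !== undefined) {
      n++;
    }
  }
  return n;
}

var codes = [65, 66, 67];
var withIn = countMappedWithIn(toUnicode, codes);
var withUndefined = countMappedWithUndefined(toUnicode, codes);
```

The two checks are only interchangeable when the map never stores `undefined` as a value: a key that is present but holds `undefined` passes the `in` test and fails the comparison.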
Yury Delendik
10db93be29 Merge pull request #4980 from Snuffleupagus/bug-1027533
Additional heuristics to recognize unknown glyphs for toUnicode (bug 1027533)
2014-06-23 21:56:13 -05:00
Yury Delendik
c28839b2f3 Merge pull request #4944 from Snuffleupagus/issue-4934
Don't blindly trust toUnicode when building toFontChar for non-standard fonts without a font file (issue 4934)
2014-06-23 21:49:24 -05:00
Jonas Jenwald
b19bb74813 Additional heuristics to recognize unknown glyphs for toUnicode (bug 1027533) 2014-06-20 09:57:16 +02:00
Yury Delendik
0cd28ebfa3 Telemetry for used stream and font types 2014-06-16 16:41:04 -05:00
Jonas Jenwald
158790981c Don't blindly trust toUnicode when building toFontChar for non-standard fonts without a font file (issue 4934) 2014-06-14 22:59:08 +02:00
Jonas Jenwald
3c5dedf60d Prevent font error when no preferred cmap table is found (workaround for issue 4800) 2014-05-27 17:30:11 +02:00
Yury Delendik
e5a0d89da9 Refactors loadFont for translateFont to be async; fixes type3 dup data 2014-05-19 16:27:54 -05:00
Jonas Jenwald
3e1db41ddd Fix loading of fonts with empty font files (bug 866395 and issue 3522) 2014-05-18 21:41:06 +02:00
Jonas Jenwald
0fa154be4e Amend GlyphMapForStandardFonts to fix issue 4276 2014-04-30 15:56:40 +02:00
Jonas Jenwald
747dec16b2 Prevent trying to map characters to the specials unicode block in adjustMapping (issue 4650) 2014-04-28 23:33:54 +02:00
Yury Delendik
98e023e464 Guesses Type1C font type based on file content 2014-04-24 11:48:18 -05:00
Yury Delendik
9a5c121e4d Fixes invalid CFF name for Mac OSX 2014-04-17 10:50:06 -05:00
Yury Delendik
a22258a6b3 Merge pull request #4638 from yurydelendik/issue4630
Recognizes ASCII type1 encoding
2014-04-17 08:39:31 -05:00
Yury Delendik
bf3a2488df Recognizes ascii type1 encoding 2014-04-17 07:52:33 -05:00
fkaelberer
b06c10cbbd rename getUint32 to getInt32 and collect readInt*() in util.js 2014-04-16 21:31:16 +02:00
Rob Wu
2e97c0d085 Remove some unused variables from src/
Only obviously useless, local variables have been removed.
2014-04-15 17:10:23 +02:00
Yury Delendik
65fa25ca36 Fixes number of glyphs in the generated font 2014-04-12 13:25:13 -05:00
Brendan Dahl
b242826d29 Fix seac regression. 2014-04-11 09:55:39 -07:00
Yury Delendik
88c1747cc3 Heuristics to recognize the unknown glyphs for toUnicode 2014-04-10 19:21:09 -05:00
Tim van der Meij
df91acf239 Fixes lint warning W004 in src/core 2014-04-11 00:41:08 +02:00
Brendan Dahl
5bd8a83c9b Build the text layer geometry on the worker. 2014-04-09 16:44:07 -07:00
Yury Delendik
9ccdbbcb55 Merge pull request #4574 from Snuffleupagus/bug-850854
Handle 'space' character correctly in WinAnsiEncoding (bug 850854)
2014-04-09 14:36:49 -05:00
Brendan Dahl
a6e5f31ca1 Merge pull request #4423 from chriskr/font-aliases
Treat fonts with the same font descriptor and encoding as aliases
2014-04-09 10:26:09 -07:00
Christian Krebs
79f34b183c Treat fonts with the same font descriptor, encoding and unicode map as aliases
Different fonts can point to the same font descriptor
(see https://github.com/mozilla/pdf.js/issues/4339 for details). With this
commit, such fonts are treated as aliases if they also have the same encoding
and the same toUnicode map. The corresponding info is stored on the font descriptor.
This change must also ensure that aliases always use the same font name,
because translated fonts can get cleared depending on the CLEANUP_TIMEOUT setting.
2014-04-08 20:45:21 +02:00
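One way to picture the aliasing described in 79f34b183c is a small lookup stored on the font descriptor, keyed by encoding and toUnicode map. Everything in this sketch is an assumption for illustration: `aliasKey`, `getFontName`, the `fontAliases` property and the generated names are hypothetical and do not reflect how PDF.js actually stores this information.

```javascript
// Hypothetical sketch; names and storage layout are not from PDF.js.

// Build a comparable key from the encoding and the toUnicode map.
function aliasKey(encoding, toUnicode) {
  return JSON.stringify([encoding, toUnicode]);
}

// Return the font name for this (descriptor, encoding, toUnicode)
// combination, creating one only the first time it is seen.
function getFontName(descriptor, encoding, toUnicode, nextName) {
  var key = aliasKey(encoding, toUnicode);
  if (!descriptor.fontAliases) {
    descriptor.fontAliases = {};
  }
  if (descriptor.fontAliases[key] === undefined) {
    descriptor.fontAliases[key] = nextName();
  }
  return descriptor.fontAliases[key];
}

var counter = 0;
function nextName() {
  counter++;
  return 'font_' + counter;
}

var descriptor = {};
var first = getFontName(descriptor, 'WinAnsiEncoding', [65, 66], nextName);
var alias = getFontName(descriptor, 'WinAnsiEncoding', [65, 66], nextName);
var other = getFontName(descriptor, 'MacRomanEncoding', [65, 66], nextName);
```

Because the name is remembered on the descriptor rather than on the translated font, an alias keeps resolving to the same name even if the translated font itself has been cleaned up and must be rebuilt.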
Jonas Jenwald
9e6c66be12 Handle 'space' character correctly in WinAnsiEncoding (bug 850854) 2014-04-08 13:07:29 +02:00
Jonas Jenwald
8fc4ebd5cb Handle 'space' character correctly in MacRomanEncoding (bug 878026) 2014-04-07 20:59:26 +02:00
fkaelberer
c978c026fa clean up string conversion functions 2014-03-27 13:01:43 +01:00
Jonas Jenwald
66e243f506 Fix coding style in src/core/fonts.js 2014-03-22 16:19:07 +01:00
Brendan Dahl
10deadd416 Merge pull request #4453 from nnethercote/charToGlyph
Add a cache for glyphs
2014-03-19 16:30:02 -07:00
Brendan Dahl
1802ffffb8 Merge pull request #4447 from nnethercote/object-reduction
Allocate fewer objects
2014-03-17 12:50:23 -07:00
Brendan Dahl
68be273c69 Merge pull request #4470 from yurydelendik/packcmap
CMaps binary packing
2014-03-17 12:27:35 -07:00
Jonas Jenwald
5f021b067c Prevent infinite loop in CFFParser_parseHeader 2014-03-17 11:47:14 +01:00
Yury Delendik
69efd9cb96 CMaps binary packing 2014-03-14 16:46:35 -05:00