Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Jani Pehkonen	05c527f035	Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream	2019-05-07 18:44:37 +03:00
Jani Pehkonen	49c6233fbc	Use CMap in Type0 fonts when CFF is not a CID font	2019-03-26 19:38:44 +02:00
Brendan Dahl	7d6ab081eb	Put the string name of the glyph in the charset array. Also, only warn once per font when missing a glyph name.	2019-03-01 18:03:51 -08:00
Brendan Dahl	34022d2fd1	Merge pull request #10591 from brendandahl/fix-charset Add unique glyph names for CFF fonts.	2019-02-28 17:22:29 -08:00
Brendan Dahl	8a596ef5d5	Add unique glyph names for CFF fonts. Printing on MacOS was broken with the previous approach of just mapping all the glyphs to notdef.	2019-02-27 15:00:29 -08:00
Jonas Jenwald	db5dc14158	Move worker-thread only functions from `src/shared/util.js` and into a new `src/core/core_utils.js` file The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated. Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used only on the worker-thread. Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus cannot possibly ever be used on the main-thread.	2019-02-24 00:35:39 +01:00
Jonas Jenwald	b6d090cc14	Fallback to the built-in font renderer when font loading fails After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines. This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2]. Hence this patch, which implements a general fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3]. The solution implemented in this patch does not in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load. Please note: This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does improve rendering in a number of cases; refer to this possibly incomplete list: [Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888) Issue 10175 Issue 10232 --- [1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly. [2] Glyphs fell back to some default font, which while not accurate was more useful than the current state. [3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.	2019-02-11 10:27:08 +01:00
Jonas Jenwald	24a688d6c6	Convert some usage of `indexOf` to `startsWith`/`includes` where applicable In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.	2019-01-18 17:57:41 +01:00
Brendan Dahl	32eace043b	Fix reading number of HTMX metrics. The length of the HHEA table can be incorrect, so it is better to read the number of metrics offset from beginning of table instead.	2019-01-04 15:13:13 -08:00
Jonas Jenwald	ad3e937816	Replace the `Font.loading` property with, the already existing, `Font.missingFile` property The `Font.loading` property is only ever used once in the code, whereas `Font.missingFile` is more widely used. Furthermore the name `loading` feels, at least to me, slight less clear than `missingFile`. Finally, note that these two properties are the inverse of each other.	2018-09-29 15:57:04 +02:00
Brendan Dahl	b76cf665ec	Map all glyphs to the private use area and duplicate the first glyph. There have been lots of problems with trying to map glyphs to their unicode values. It's more reliable to just use the private use areas so the browser's font renderer doesn't mess with the glyphs. Using the private use area for all glyphs did highlight other issues that this patch also had to fix: * small private use area - Previously, only the BMP private use area was used which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be used. * glyph zero not shown - Browsers will not use the glyph from a font if it is glyph id = 0. This issue was less prevalent when we mapped to unicode values since the fallback font would be used. However, when using the private use area, the glyph would not be drawn at all. This is illustrated in one of the current test cases (issue #8234) where there's an "ä" glyph at position zero. The PDF looked like it rendered correctly, but it was actually not using the glyph from the font. To properly show the first glyph it is always duplicated and appended to the glyphs and the maps are adjusted. * supplementary characters - The private use area PUP 16 is 4 bytes, so String.fromCodePoint must be used where we previously used String.fromCharCode. This is actually an issue that should have been fixed regardless of this patch. * charset - Freetype fails to load fonts when the charset size doesn't match number of glyphs in the font. We now write out a fake charset with the correct length. This also brought up the issue that glyphs with seac/endchar should only ever write a standard charset, but we now write a custom one. To get around this the seac analysis is permanently enabled so those glyphs are instead always drawn as two glyphs.	2018-09-05 14:04:54 -07:00
Jonas Jenwald	06d1ff5af4	Tweak the MMType1 font detection in `getFontFileType` to improve font telemetry (PR 9961 follow-up) Please note that this patch does not affect rendering in any way, however it's relevant for font telemetry[1]. According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904956, Type1C is a valid subtype for both Type1 and MMType1 fonts. --- [1] Refer to the font telemetry results in https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-06-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F62&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F59&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-05-07&table=0&trim=1&use_submission_date=0 See also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types for help with interpreting the data.	2018-08-08 12:18:37 +02:00
Tim van der Meij	4111871ac5	Merge pull request #9958 from brendandahl/always-fallback Always fallback to system font on font failure.	2018-08-05 19:58:48 +02:00
Jonas Jenwald	3177f6aa55	Parse the font file to determine the correct type/subtype, rather than relying on the (often incorrect) data in the font dictionary The current font type/subtype detection code is quite inconsistent/unwieldy. In some cases it will simply assume that the font dictionary is correct, in others it will somewhat "arbitrarily" check the actual font file (more of these cases have been added over the years to fix specific bugs). As is evident from e.g. issue 9949, the font type/subtype detection code is continuing to cause issues. In an attempt to get rid of these hacks once and for all, this patch instead re-factors the type/subtype detection to always parse the font file. Please note that, as far as I can tell, we still appear to need to rely on the composite font detection based on the font dictionary. However, even if the composite/non-composite detection would get it wrong, that shouldn't really matter too much given that there's basically only two different code-paths (for "TrueType-like" vs "Type1-like" fonts).	2018-08-05 11:13:16 +02:00
Jonas Jenwald	9bbca04579	Add a (basic) `isCFFFile` helper function to detect CFF font files Compared to most other font formats, the CFF doesn't have a constant header which makes is slightly more difficult to detect such font files. Please refer to the Compact Font Format specification: https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5176.CFF.pdf#G3.32094	2018-08-05 11:13:14 +02:00
Jonas Jenwald	f4db38aadf	Update the TrueType font file detection to also recognize the Mac specific header 'true' Please refer to the TrueType specification: https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#ScalerTypeNote	2018-08-05 10:33:56 +02:00
Brendan Dahl	5f67a6a237	Always fallback to system font on font failure. The font in the PDF is marked as a CIDFontType0, but the font file is actually a true type font. To fully address this issue we should really peek into the font file and try to determine what it is. However, this is the first case of this issue, so I think this solution is acceptable for now.	2018-08-03 16:49:22 -07:00
Jonas Jenwald	17eac2d48a	Ensure that Type0, i.e. composite, OpenType fonts with `CFF` tables are not treated as CFF fonts if their glyph mapping is non-default (issue 9915) This particular code-path has been the source of numerous regressions to date, so hopefully this patch won't cause any more of those. Fixes 9915.	2018-07-29 23:06:15 +02:00
Jonas Jenwald	2b25deb84c	Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809) I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug. The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a negative length when parsing the CALL functions. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.	2018-07-10 09:45:55 +02:00
Wojciech Maj	ea2850e9a7	Fix typos	2018-04-01 23:20:41 +02:00
Jonas Jenwald	a18c65ae9f	Use the correct stream position when reading `maxSizeOfInstructions` from the `maxp` table (issue 9458) Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html. Fixes 9458.	2018-02-07 21:57:43 +01:00
Jonas Jenwald	712090eff8	Upstream the changes from: Bug 1339461 - Convert foo.indexOf(...) == -1 to foo.includes() and implement an eslint rule to enforce this Yet another case where PDF.js code was modified in `mozilla-central` without the changes happening in the GitHub repo first; sigh. If we don't upstream at least the changes in `extensions/firefox/`, any future update of PDF.js in `mozilla-central` will be blocked. Please see: - https://bugzilla.mozilla.org/show_bug.cgi?id=1339461 - https://hg.mozilla.org/mozilla-central/rev/d5a5ad1dbbf2	2018-02-04 14:59:27 +01:00
Jonas Jenwald	d6c028b946	Add support for TrueType Collection fonts (issue 9262) The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading. Fixes 9262.	2018-01-08 22:31:08 +01:00
Tim van der Meij	6bbe91079b	Merge pull request #9272 from nveenjain/fix/8846 Replaced occurence of `throw new Error` with `unreachable`	2017-12-15 22:11:32 +01:00
Brendan Dahl	9b51cea724	Fix loca table when offsets aren't in ascending order.	2017-12-15 11:20:28 -06:00
Naveen Jain	1135674647	Replaced occurence of `throw new Error` with `unreachable` where applicable	2017-12-14 12:58:50 +05:30
Brendan Dahl	af1d80d45e	Merge pull request #9230 from Snuffleupagus/issue-9195 Add basic support for non-embedded Calibri fonts (issue 9195)	2017-12-08 10:15:43 -08:00
Jonas Jenwald	a5e3261b48	Merge pull request #9062 from mozilla/no_high Move char codes from high surrogate pair range into private use.	2017-12-08 12:31:22 +01:00
Brendan Dahl	306999c325	Move char codes from high surrogate pair range into private use. Fixes #2884	2017-12-07 10:35:50 -08:00
Jonas Jenwald	08de655177	Add basic support for non-embedded Calibri fonts (issue 9195) There's a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but a `IdentityH` CMap which won't help provide a proper glyph mapping. The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correct. To work around this, I'm thus proposing that we include a (incomplete) glyph map for Calibri, and fallback to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data. Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data). Fixes 9195. --- [1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.	2017-12-03 17:23:33 +01:00
Jonas Jenwald	61e19bee43	Build a fallback `ToUnicode` map for simple fonts (issue 8229) In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things. Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case. Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data. According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172): > A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value. > ... Reading that paragraph literally, it doesn't seem too unreasonable to use different methods for different charcodes. Fixes 8229.	2017-11-26 14:45:15 +01:00
Max Schaefer	3ae37d1b06	Remove a few useless assignments.	2017-11-03 11:36:48 +00:00
Brendan Dahl	17037b5e51	Overwrite glyphs contour count if it's less than -1. The test pdf has a contour count of -70, but OTS doesn't like values less than -1. Fixes issue #7562.	2017-10-30 09:16:51 -07:00
Brendan Dahl	18e2321845	Overwrite maxSizeOfInstructions in maxp with computed value. In issue #7507 the value is less than the actuall max size of the glyph instructions causing OTS to fail the font.	2017-09-25 17:53:26 -07:00
Jonas Jenwald	cfb4955a92	Replace the `isArray` helper function with the native `Array.isArray` function Follow-up to PR 8813.	2017-09-01 20:27:13 +02:00
Jonas Jenwald	11408da340	Replace the `isInt` helper function with the native `Number.isInteger` function Follow-up to PR 8643.	2017-09-01 16:52:50 +02:00
Brendan Dahl	0bef50d56d	Fix two cmap related issues. In issue #8707, there's a char code mapped to a non- existing glyph which shouldn't be drawn. However, we saw it was missing and tried to then use the post table and end up mapping it incorrectly. This illuminated a problem with issue #5704 and bug 893730 where glyphs disappeared after above fix. This was from the cmap returning the wrong glyph id. Which in turn was caused because the font had multiple of the same type of cmap table and we were choosing the last one. Now, we instead default to the first one. I'm unsure if we should instead be merging the multiple cmaps, but using only the first one works.	2017-08-03 22:19:36 -07:00
Jonas Jenwald	e20d4a9c21	Merge pull request #8681 from brendandahl/glyph-ids Fix several issues with glyph id mappings (issue 8668, bug 1383504)	2017-08-03 14:25:34 +02:00
Brendan Dahl	ac33358e1f	Fix several issues with glyph id mappings. The initial issue with #8255 was I added a missing glyphs check to adjustMapping, but this caused us to skip re-mapping a glyph if the fontCharCode was a missingGlyph which in turn caused us to overwrite a valid glyph id with an invalid one. While fixing this, I also added a warning if the private use area is full since this also accidentally happened when I made a different mistake. This brought to light a number of issues where we map missing glyphs to notdef, but often the notdef is actually defined and then ends up being drawn. Now the glyphs don't get mapped in toFontChar and so they are not drawn by the canvas. Fixing the above brought up another issue though in bug1050040.pdf. In this PDF, the font fails to load by the browser and before we were still drawing the glyphs because it looked like the font had them, but with the fixes above the glyphs showed up as missing so we didn't attempt draw them. To fix this, I now throw an error when the loca table is in really bad shape and we fall back to trying to use a system font. We now also use this fall back if there are any format errors during converting fonts.	2017-07-26 13:00:55 -07:00
Jonas Jenwald	814fa1dee3	Remove most `assert()` calls (issue 8506) This replaces `assert` calls with `throw new FormatError()`/`throw new Error()`. In a few places, throwing an `Error` (which is what `assert` meant) isn't correct since the enclosing function is supposed to return a `Promise`, hence some cases were changed to `Promise.reject(...)` and similarily for `createPromiseCapability` instances.	2017-07-21 18:51:02 +02:00
Yury Delendik	d028c26210	Removes error()	2017-07-07 09:40:24 -05:00
Jonas Jenwald	eff257b820	Merge pull request #8580 from brendandahl/missing-glyf Fix how we detect and handle missing glyph data.	2017-07-04 12:16:07 +02:00
Brendan Dahl	efbbd8533f	Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0xF0FF.	2017-07-03 13:13:46 -07:00
Brendan Dahl	6d4f748fb1	Fix how we detect and handle missing glyph data.	2017-07-03 13:06:06 -07:00
Jonas Jenwald	8b4a42e5b8	Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480) As mentioned the last time that I touched this particular part of the font code, I'm sincerely hope that this doesn't cause any regressions! However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues. Fixes 8480.	2017-06-09 21:15:39 +02:00
Jonas Jenwald	999e30723d	Reduce the duplication slightly when detecting an OpenType font (in the `Font` constructor)	2017-06-09 18:26:57 +02:00
Jonas Jenwald	a8c87f8019	Fix inconsistent spacing and trailing commas in objects in `src/core/` files, so we can enable the `comma-dangle` and `object-curly-spacing` ESLint rules later on Unfortunately this patch is fairly big, even though it only covers the `src/core` folder, but splitting it even further seemed difficult. http://eslint.org/docs/rules/comma-dangle http://eslint.org/docs/rules/object-curly-spacing Given that we currently have quite inconsistent object formatting, fixing this in one big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead. Please note: This patch was created automatically, using the ESLint --fix command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch. ```diff diff --git a/src/core/evaluator.js b/src/core/evaluator.js index abab9027..dcd3594b 100644 --- a/src/core/evaluator.js +++ b/src/core/evaluator.js @@ -2785,7 +2785,8 @@ var EvaluatorPreprocessor = (function EvaluatorPreprocessorClosure() { t['Tz'] = { id: OPS.setHScale, numArgs: 1, variableArgs: false, }; t['TL'] = { id: OPS.setLeading, numArgs: 1, variableArgs: false, }; t['Tf'] = { id: OPS.setFont, numArgs: 2, variableArgs: false, }; - t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false, }; + t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, + variableArgs: false, }; t['Ts'] = { id: OPS.setTextRise, numArgs: 1, variableArgs: false, }; t['Td'] = { id: OPS.moveText, numArgs: 2, variableArgs: false, }; t['TD'] = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false, }; diff --git a/src/core/jbig2.js b/src/core/jbig2.js index 5a17d482..71671541 100644 --- a/src/core/jbig2.js +++ b/src/core/jbig2.js @@ -123,19 +123,22 @@ var Jbig2Image = (function Jbig2ImageClosure() { { x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -2, y: 0, }, { x: -1, y: 0, }], [{ x: -3, y: -1, }, { x: -2, y: -1, }, { x: -1, y: -1, }, { x: 0, y: -1, }, - { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, { x: -1, y: 0, }] + { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, + { x: -1, y: 0, }] ]; var RefinementTemplates = [ { coding: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }], - reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, - { x: 1, y: 0, }, { x: -1, y: 1, }, { x: 0, y: 1, }, { x: 1, y: 1, }], + reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, + { x: 0, y: 0, }, { x: 1, y: 0, }, { x: -1, y: 1, }, + { x: 0, y: 1, }, { x: 1, y: 1, }], }, { - coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }], - reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, { x: 1, y: 0, }, - { x: 0, y: 1, }, { x: 1, y: 1, }], + coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, + { x: -1, y: 0, }], + reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, + { x: 1, y: 0, }, { x: 0, y: 1, }, { x: 1, y: 1, }], } ]; ```	2017-06-02 11:20:19 +02:00
Jonas Jenwald	982b6aa65b	Convert the files in the `/src/core` folder to ES6 modules Please note that the `glyphlist.js` and `unicode.js` files are converted to CommonJS modules instead, since Babel cannot handle files that large and they are thus excluded from transpilation.	2017-05-30 22:06:21 +02:00
Jonas Jenwald	4ce5e520fb	Add different code-paths to `{CMap, ToUnicodeMap}.charCodeOf` depending on length, since `Array.prototype.indexOf` can be extremely inefficient for very large arrays (issue 8372) Fixes 8372.	2017-05-24 19:47:04 +02:00
Jonas Jenwald	31c24ed631	Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424) This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations. According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special". Fixes 8424.	2017-05-23 16:12:45 +02:00

1 2 3 4 5 ...