pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	ecbcde7ff3	Fail early, in modern `GENERIC` builds, if certain required browser functionality is missing (PR 11771 follow-up) With two kinds of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one. In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features). However in some browsers/environments, a modern PDF.js build may load correctly and thus appear to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds additional checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for `Promise.allSettled`.[1] Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead. Please note: While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that may warrant being `git cherry-pick`ed onto the current beta version (v2.4.456). --- [1] This was a fairly recent addition to the web platform, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled#Browser_compatibility	2020-04-11 13:42:03 +02:00
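A minimal sketch of the kind of early check described in the commit above, assuming a feature test on `Promise.allSettled`. The function name, its parameter, and the error text are hypothetical and not taken from the PDF.js sources.

```javascript
// Hypothetical helper: throws right away when the environment lacks
// native `Promise.allSettled`, instead of failing later in some API call.
// `scope` is injectable only so the check can be exercised in isolation.
function assertModernBuildSupported(scope = globalThis) {
  const hasAllSettled =
    typeof scope.Promise === "function" &&
    typeof scope.Promise.allSettled === "function";

  if (!hasAllSettled) {
    throw new Error(
      "This build requires native `Promise.allSettled` support; " +
        "please use an ES5-compatible build instead."
    );
  }
}

// Typically such a check would run once, when the library is loaded.
assertModernBuildSupported();
```

Running the check at load time is what turns a confusing late failure into an immediate, actionable error.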
Tim van der Meij	70c54ab9d9	Merge pull request #11746 from Snuffleupagus/issue-11740 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)	2020-04-07 00:10:12 +02:00
Jonas Jenwald	2d46230d23	[api-minor] Change `Font.exportData` to, by default, stop exporting properties which are completely unused on the main-thread and/or in the API (PR 11773 follow-up) For years now, the `Font.exportData` method has (because of its previous implementation) been exporting many properties despite them being completely unused on the main-thread and/or in the API. This is unfortunate, since among those properties there's a number of potentially very large data-structures, containing e.g. Arrays and Objects, which thus have to be first structured cloned and then stored on the main-thread. With the changes in this patch, we'll thus by default save memory for every `Font` instance created (there can be a lot in longer documents). The memory savings obviously depends a lot on the actual font data, but some approximate figures are: For non-embedded fonts it can save a couple of kilobytes, for simple embedded fonts a handful of kilobytes, and for composite fonts the size of this auxiliary can even be larger than the actual font program itself. All-in-all, there's no good reason to keep exporting these properties by default when they're unused. However, since we cannot be sure that every property is unused in custom implementations of the PDF.js library, this patch adds a new `getDocument` option (named `fontExtraProperties`) that still allows access to the following properties: - "cMap": An internal data structure, only used with composite fonts and never really intended to be exposed on the main-thread and/or in the API. Note also that the `CMap`/`IdentityCMap` classes are a lot more complex than simple Objects, but only their "internal" properties survive the structured cloning used to send data to the main-thread. Given that CMaps can often be very large, not exporting them can also save a fair bit of memory. - "defaultEncoding": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. 
Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful. - "differences": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful. - "isSymbolicFont": An internal property, used during font parsing and building of the glyph mapping on the worker-thread. - "seacMap": An internal map, only potentially used with some Type1/CFF fonts and never intended to be exposed in the API. The existing `Font.{charToGlyph, charToGlyphs}` functionality already takes this data into account when handling text. - "toFontChar": The glyph map, necessary for mapping characters to glyphs in the font, which is built upon the various encoding information contained in the font dictionary and/or font program. This is not directly used on the main-thread and/or in the API. - "toUnicode": The unicode map, necessary for text-extraction to work correctly, which is built upon the ToUnicode/CMap information contained in the font dictionary, but not directly used on the main-thread and/or in the API. - "vmetrics": An array of width data used with fonts which are composite and vertical, but not directly used on the main-thread and/or in the API. - "widths": An array of width data used with most fonts, but not directly used on the main-thread and/or in the API.	2020-04-06 11:47:09 +02:00
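The default/opt-in split described in the commit above can be sketched roughly as follows. This is only an illustration: the two lists are tiny subsets picked from property names mentioned in these commit messages, and `exportFontData` is a hypothetical stand-in for `Font.exportData`, not its real implementation.

```javascript
// Illustrative subsets only; the real white-lists are longer.
const EXPORT_DATA_PROPERTIES = ["fontType", "vertical"];
const EXPORT_DATA_EXTRA_PROPERTIES = ["cMap", "toUnicode", "widths"];

// Hypothetical stand-in for `Font.exportData`: copies white-listed
// properties, and only those that are explicitly set on the font.
function exportFontData(font, extraProperties = false) {
  const names = extraProperties
    ? [...EXPORT_DATA_PROPERTIES, ...EXPORT_DATA_EXTRA_PROPERTIES]
    : EXPORT_DATA_PROPERTIES;

  const data = Object.create(null);
  for (const name of names) {
    const value = font[name];
    if (value !== undefined) {
      data[name] = value; // skip unset properties instead of exporting `undefined`
    }
  }
  return data;
}

const font = { fontType: "Type1", vertical: false, widths: [500, 600], charsCache: {} };
console.log(Object.keys(exportFontData(font)));       // default export
console.log(Object.keys(exportFontData(font, true))); // with extra properties
```

In the library itself the opt-in is controlled through the `fontExtraProperties` option of `getDocument`, as stated in the commit message; the boolean parameter here merely plays that role.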
Jonas Jenwald	8770ca3014	Make the `decryptAscii` helper function, in `src/core/type1_parser.js`, slightly more efficient By slicing the Uint8Array directly, rather than using the prototype and a `call` invocation, the runtime of `decryptAscii` is decreased slightly (~30% based on quick logging). The `decryptAscii` function is still less efficient than `decrypt`, however ASCII encoded Type1 font programs are sufficiently rare that it probably doesn't matter much (we've only seen two examples, issue 4630 and 11740).	2020-04-06 11:21:02 +02:00
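For reference, the two slicing styles mentioned in the commit above look roughly like this. The sample bytes and the `discardNumber` value are made up for illustration, and no timing claim is made here beyond what the commit message reports.

```javascript
// Made-up sample data; in `decryptAscii` this would be the decrypted bytes.
const decrypted = new Uint8Array([10, 20, 30, 40, 50, 60]);
const discardNumber = 4; // illustrative number of leading bytes to drop

// Old style: generic `Array.prototype.slice` via `call`, which walks the
// typed array as an array-like and produces a plain Array.
const viaPrototype = Array.prototype.slice.call(decrypted, discardNumber);

// New style: the typed array's own `slice`, which returns a Uint8Array.
const direct = decrypted.slice(discardNumber);

console.log(Array.isArray(viaPrototype), direct instanceof Uint8Array);
```

Both calls keep the same bytes; they differ in the type of the result and in how the copy is made.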
Jonas Jenwald	938d519192	Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740) This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.	2020-04-06 11:21:02 +02:00
Jonas Jenwald	6a8c591301	Improve detection of binary/ASCII `eexec` encrypted Type1 font programs in `Type1Parser` (issue 11740) The PDF document, in the referenced issue, actually contains ASCII-encoded Type1 data which we currently incorrectly identify as binary. According to the specification, see https://www-cdf.fnal.gov/offline/PostScript/T1_SPEC.PDF#[{%22num%22%3A203%2C%22gen%22%3A0}%2C{%22name%22%3A%22XYZ%22}%2C87%2C452%2Cnull], the current checks are insufficient to decide between binary/ASCII encoded Type1 font programs.	2020-04-06 11:21:02 +02:00
Jonas Jenwald	2619272d73	Change the signature of `TranslatedFont`, and convert it to a proper class In preparation for the next patch, this changes the signature of `TranslatedFont` to take an object rather than individual parameters. This also, in my opinion, makes the call-sites easier to read since it essentially provides a small bit of documentation of the arguments. Finally, since it was necessary to touch `TranslatedFont` anyway it seemed like a good idea to also convert it to a proper `class`.	2020-04-05 20:53:48 +02:00
Tim van der Meij	0400109b87	Merge pull request #11773 from Snuffleupagus/Font-exportData-1 [api-minor] Change `Font.exportData` to use an explicit white-list of exportable properties, and stop exporting internal/unused properties	2020-04-05 20:50:33 +02:00
Jonas Jenwald	59f54b946d	Ensure that all `Font` instances have the `vertical` property set to a boolean Given that the `vertical` property is always accessed on the main-thread, ensuring that the property is explicitly defined seems like the correct thing to do since it also avoids boolean casting elsewhere in the code-base.	2020-04-05 16:27:50 +02:00
Jonas Jenwald	c5e1fd3fde	Use "standard" shadowing in the `Font.spaceWidth` method With `Font.exportData` now only exporting white-listed properties, there should no longer be any reason to not use standard shadowing in the `Font.spaceWidth` method. Furthermore, considering the amount of other changes to the code-base over the years it's not even clear to me that the special-case was necessary any more (regardless of the preceding patches).	2020-04-05 16:27:50 +02:00
Jonas Jenwald	a5e4cccf13	[api-minor] Prevent `Font.exportData` from exporting internal/unused properties A number of internal font properties, which only make sense on the worker-thread, were previously exported. Some of these properties could also contain potentially large Arrays/Objects, which thus unnecessarily increases memory usage since we're forced to copy these to the main-thread and also store them there. This patch stops exporting the following font properties: - "_shadowWidth": An internal property, which was never intended to be exported. - "charsCache": An internal cache, which was never intended to be exported and doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually `undefined` or a mostly empty Object as well. - "cidEncoding": An internal property used with (some) composite fonts. As can be seen in the `PartialEvaluator.translateFont` method, `cidEncoding` will only be assigned a value when the font dictionary has an "Encoding" entry which is a `Name` (and not in the `Stream` case, since those obviously cannot be cloned). All-in-all this property doesn't really make sense on the main-thread and/or in the API, and note also that the resulting `cMap` property is (partially) available already. - "fallbackToUnicode": An internal map, part of the heuristics used to improve text-selection in (some) badly generated PDF documents with simple fonts. This was never intended to be exposed on the main-thread and/or in the API. - "glyphCache": An internal cache, which was never intended to be exported and which doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually a mostly empty Object as well. - "isOpenType": An internal property, used only during font parsing on the worker-thread. In the very unlikely event that an API consumer actually needs that information, then `fontType` should be a (generally) much better property to use. 
Finally, in the (hopefully) unlikely event that any of these properties become necessary on the main-thread, re-adding them to the white-list is easy to do.	2020-04-05 16:27:50 +02:00
Jonas Jenwald	664f7de540	Change `Font.exportData` to use an explicit white-list of exportable properties This patch addresses an existing, and very long standing, TODO in the code such that it's no longer possible to send arbitrary/unnecessary font properties to the main-thread. Furthermore, by having a white-list it's also very easy to see exactly which font properties are being exported. Please note that in its current form, the list of exported properties contains every possible enumerable property that may exist in a `Font` instance. In practice no single font will contain all of these properties, and e.g. embedded/non-embedded/Type3 fonts will all differ slightly with respect to what properties are being defined. Hence why only explicitly set properties are included in the exported data, to avoid half of them being `undefined`, which however should not be a problem for any existing consumer (since they'd already need to handle those cases). Since a fair number of these font properties are completely internal functionality, and doesn't make any sense to expose on the main-thread and/or in the API, follow-up patch(es) will be required to trim down the list. (I purposely included all properties here for brevity and future documentation purposes.)	2020-04-05 16:27:48 +02:00
Jonas Jenwald	87142a635e	Ensure that `Font.charToGlyph` won't fail because `String.fromCodePoint` is given an invalid code point (issue 11768) Please note: This patch on its own is not sufficient to address the underlying problem in the referenced issue, hence why no test-case is included since the actual bug still needs to be fixed. As can be seen in the specification, https://tc39.es/ecma262/#sec-string.fromcodepoint, `String.fromCodePoint` will throw a RangeError for invalid code points. In the event that a CMap, in a composite font, contains invalid data and/or we fail to parse it correctly, it's thus possible that the glyph mapping that we build ends up with entries that cause `String.fromCodePoint` to throw and thus `Font.charToGlyph` to break. If that happens, as is the case in issue 11768, significant portions of a page/document may fail to render which seems very unfortunate. While this patch doesn't fix the underlying problem, it's hopefully deemed useful not only for the referenced issue but also to prevent similar bugs in the future.	2020-04-03 09:49:50 +02:00
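One way to guard against the RangeError described in the commit above is sketched below. The helper name and the empty-string fallback are assumptions made for illustration; the commit message does not say how the actual patch handles invalid values.

```javascript
// Hypothetical helper: only calls `String.fromCodePoint` for values that
// are valid Unicode code points (integers in the range 0..0x10FFFF).
function safeFromCodePoint(codePoint) {
  if (
    Number.isInteger(codePoint) &&
    codePoint >= 0 &&
    codePoint <= 0x10ffff
  ) {
    return String.fromCodePoint(codePoint);
  }
  // Assumed fallback: an empty string, so that one bad glyph-map entry
  // cannot abort rendering of the surrounding text.
  return "";
}

console.log(safeFromCodePoint(0x41));     // valid code point
console.log(safeFromCodePoint(0x110000)); // out of range, no exception
```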
Jonas Jenwald	710704508c	Fail early, in modern `GENERIC` builds, if certain required browser functionality is missing (issue 11762) With two kinds of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one. In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features). However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus appear to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead. To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build. (Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.) As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code. Please note: While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that may warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).	2020-04-01 19:42:48 +02:00
Jonas Jenwald	14c999e3ee	Remove the unused `sizes` and `encoding` properties on `Font` instances The `sizes` property doesn't appear to have been used ever since the code was first split into main/worker-threads, which is so many years ago that I wasn't able to easily find exactly in which PR/commit it became unused. The `encoding` property is always assigned the `properties.baseEncoding` value, however the `PartialEvaluator` doesn't actually compute/set that value any more. Again it was difficult to determine when it became unused, but it's been that way for years.	2020-03-27 10:12:01 +01:00
Jonas Jenwald	dcb16af968	Whitelist closure related cases to address the remaining `no-shadow` linting errors Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled. Note that while some of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).	2020-03-25 11:57:12 +01:00
Jonas Jenwald	1d2f787d6a	Enable the ESLint `no-shadow` rule This rule is not currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239 Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis. Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow --- [1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.	2020-03-25 11:56:05 +01:00
Tim van der Meij	475fa1f97f	Merge pull request #11744 from janpe2/cff-glyph-zero The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"	2020-03-24 23:52:21 +01:00
Tim van der Meij	292b77fe7b	Merge pull request #11707 from Snuffleupagus/issue-11694 Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)	2020-03-24 23:51:31 +01:00
Jani Pehkonen	a22c0eab48	The first glyph in CFF CIDFonts must be named 0 instead of ".notdef" Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Because this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.	2020-03-24 15:56:50 +02:00
Jonas Jenwald	216cbca16c	Remove variable shadowing from the JavaScript files in the `src/core/` folder This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible. Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow	2020-03-23 18:28:30 +01:00
Tim van der Meij	6ecc9fae1c	Merge pull request #11720 from Snuffleupagus/eslint-no-unsanitized Update the `eslint-plugin-no-unsanitized` package to the latest version	2020-03-20 21:04:24 +01:00
Jonas Jenwald	62a9c26cda	Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694) When JPEG images are decoded by the browser, on the main-thread, there's a handful of short-lived copies of the image data; see `c3f4690bde/src/display/api.js (L2364-L2408)` That code thus becomes quite problematic for very big JPEG images, since it increases peak memory usage a lot during decoding. In the referenced issue there's a couple of JPEG images whose dimensions are `10006 x 7088` (i.e. ~68 mega-pixels), which causes the peak memory usage to increase by close to `1 GB` (i.e. one giga-byte) in my testing. By letting the PDF.js JPEG decoder, rather than the browser, handle very large images the peak memory usage is considerably reduced and the allocated memory also seems to be reclaimed faster. Please note: This will lead to movement in some existing `eq` tests.	2020-03-20 16:37:19 +01:00
Jonas Jenwald	b02be3b268	Update the `eslint-plugin-no-unsanitized` package to the latest version	2020-03-20 11:25:39 +01:00
Jonas Jenwald	1cd9d5a8fd	Remove the unused `wideChars` property on `Font` instances This property was added in PR 1599 (almost eight years ago), but has been unused ever since PR 3674 (six and a half years ago).	2020-03-20 10:37:32 +01:00
Jonas Jenwald	e011be037e	Enable the `prefer-exponentiation-operator` ESLint rule Please see https://eslint.org/docs/rules/prefer-exponentiation-operator for additional information.	2020-03-19 12:41:25 +01:00
Tim van der Meij	1bc5cef2b5	Merge pull request #11698 from Snuffleupagus/issue-11697 Don't accidentally accept invalid glyphNames which appear to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)	2020-03-15 13:36:09 +01:00
Tim van der Meij	aa3e5a2b8f	Merge pull request #11644 from Snuffleupagus/openAction [api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)	2020-03-15 13:16:37 +01:00
Jonas Jenwald	15e8692eff	Don't accidentally accept invalid glyphNames which appear to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697) The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a possible Cdd{d}/cdd{d} glyphName by the existing heuristics. Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point. To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds additional validation of the charCode found by the heuristics.	2020-03-13 23:35:47 +01:00
Jonas Jenwald	c5f67300e9	Rename the `isSpace` helper function to `isWhiteSpace` Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`. Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.	2020-03-12 11:36:59 +01:00
Jonas Jenwald	e4758beaaa	Move `IsLittleEndianCached` and `IsEvalSupportedCached` to `src/shared/util.js` Rather than duplicating the lookup and caching in multiple files, it seems easier to simply move all of this functionality into `src/shared/util.js` instead. This will also help avoid a bunch of ESLint errors once the `no-shadow` rule is eventually enabled.	2020-03-12 11:36:26 +01:00
Jonas Jenwald	3adbba55b2	Limit the number of warning messages printed by any one `Lexer.getHexString` invocation This patch fixes something that's annoyed me every now and then over the years, when debugging/fixing corrupt PDF documents. For corrupt PDF documents where `Lexer.getHexString` encounters invalid characters, there's very rarely just a handful of them. In practice it's not uncommon for there to be many hundreds, or even many thousands, invalid hex characters found. Not only is the resulting console warning spam utterly useless in these cases, there's often enough of it that performance may even suffer; hence this patch which limits the amount of messages that any one `Lexer.getHexString` invocation may print.	2020-03-09 13:34:53 +01:00
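The idea of capping warnings per invocation, as described in the commit above, can be sketched like this. It is a simplified stand-alone hex decoder, not `Lexer.getHexString` itself; the function name, the cap value, and the message texts are all assumptions.

```javascript
// Hypothetical cap; the value actually used by PDF.js is not stated
// in the commit message.
const MAX_INVALID_HEX_WARNINGS = 10;

// Simplified sketch: decodes the characters of a hex string into bytes,
// warning about invalid characters at most a fixed number of times.
function parseHexStringSketch(input, warn = console.warn) {
  const bytes = [];
  let highNibble = -1; // -1 means "no pending first digit"
  let numInvalid = 0;

  for (const ch of input) {
    if (/\s/.test(ch)) {
      continue; // whitespace inside a hex string is skipped silently
    }
    if (!/^[0-9A-Fa-f]$/.test(ch)) {
      numInvalid++;
      if (numInvalid <= MAX_INVALID_HEX_WARNINGS) {
        warn(`Ignoring invalid hex character: "${ch}"`);
      } else if (numInvalid === MAX_INVALID_HEX_WARNINGS + 1) {
        warn("Too many invalid hex characters, suppressing further warnings.");
      }
      continue;
    }
    const digit = parseInt(ch, 16);
    if (highNibble === -1) {
      highNibble = digit;
    } else {
      bytes.push((highNibble << 4) | digit);
      highNibble = -1;
    }
  }
  if (highNibble !== -1) {
    bytes.push(highNibble << 4); // a trailing odd digit is padded with 0
  }
  return bytes;
}
```

Because the counter is local to the function, every invocation gets its own budget of warnings, which matches the "any one invocation" wording of the commit.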
Jonas Jenwald	65e6ea2cb2	Prevent lookup errors in `PartialEvaluator.hasBlendModes` from breaking all parsing/rendering of a page (issue 11678) The PDF document in question is corrupt, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.	2020-03-09 12:00:12 +01:00
Tim van der Meij	1a97c142b3	Merge pull request #11523 from Snuffleupagus/issue-10880 Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)	2020-03-06 23:03:09 +01:00
Jonas Jenwald	160cfc4084	Slightly simplify the lookup of data in `Dict.{get, getAsync, has}` Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups. In this case, since `Dict.set` is fairly hot, the patch utilizes an inline check and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`). For very large and complex PDF files this will help performance slightly, since `Dict.{get, getAsync, has}` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 250, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 250 \| 2838 \| 2820 \| -18 \| -0.65 \| faster Firefox \| Page Request \| 250 \| 1 \| 2 \| 0 \| 11.92 \| slower Firefox \| Rendering \| 250 \| 2837 \| 2818 \| -19 \| -0.65 \| faster ```	2020-03-06 14:12:14 +01:00
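The trade-off in the commit above, rejecting `undefined` on insertion so that lookups no longer need `in` checks, can be sketched with a much-simplified dictionary. `SimpleDict` is a hypothetical class for illustration; the real `Dict` also resolves references and does more than this.

```javascript
// Hypothetical, heavily simplified dictionary illustrating the idea only.
class SimpleDict {
  constructor() {
    this._map = Object.create(null);
  }

  set(key, value) {
    // Inline check: since `undefined` can never be stored, a lookup that
    // yields `undefined` always means "key not present".
    if (value === undefined) {
      throw new Error('SimpleDict.set: The "value" cannot be undefined.');
    }
    this._map[key] = value;
  }

  get(key) {
    return this._map[key]; // no `key in this._map` check needed
  }

  has(key) {
    return this._map[key] !== undefined;
  }
}

const dict = new SimpleDict();
dict.set("Type", "Page");
console.log(dict.has("Type"), dict.has("Missing"));
```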
Jonas Jenwald	01fb309a2a	[api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642) This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.) By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.	2020-03-06 13:03:00 +01:00
Tim van der Meij	c95b9b1e17	Merge pull request #11653 from Snuffleupagus/ensureStateFont Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)	2020-03-03 23:33:13 +01:00
Jani Pehkonen	71e7686950	Fix Type1 font parsing when .notdef is not at index zero Fixes #11477 The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected. Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.	2020-03-03 21:55:51 +02:00
Jonas Jenwald	65e514e063	Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651) The PDF document in question is corrupt, since it contains multiple instances of incorrect operators. We obviously don't want to slow down parsing of all documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.	2020-03-03 10:05:18 +01:00
Tim van der Meij	e1586016c5	Merge pull request #11577 from Snuffleupagus/Pages-tree-refs Prevent circular references in the /Pages tree	2020-02-27 23:36:11 +01:00
Jonas Jenwald	c55d30a715	Use the same non-embedded Wingdings fallback for fonts named "Wingdings-Regular" too (PR 5463 follow-up, issue 11451) This patch extends the existing heuristics, which are really the best that we can do in general for these kinds of non-embedded and non-standard fonts. Furthermore, this patch also tries to improve the copy-and-paste behaviour for non-embedded Wingdings fonts by also using the `ZapfDingbatsEncoding` in this case. Note: I'm not sure that adding additional tests for Wingdings fonts matters that much, given how limited our "support" for them really is.	2020-02-24 17:40:06 +01:00
Jonas Jenwald	bf09d79eea	Use the ESLint `no-restricted-syntax` rule to prevent direct usage of `new Cmd()`/`new Name()`/`new Ref()` Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects a lot during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base. Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax	2020-02-22 21:15:00 +01:00
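A configuration along the lines of the commit above might look like the following sketch. The selectors use the standard `NewExpression` AST node type with an attribute match on the callee name; the exact selectors and messages in the PDF.js configuration are not given in the commit message, so treat these as assumptions.

```javascript
// Sketch of an `.eslintrc.js` fragment; the selectors and messages are
// illustrative and may differ from what the repository actually uses.
module.exports = {
  rules: {
    "no-restricted-syntax": [
      "error",
      {
        selector: "NewExpression[callee.name='Cmd']",
        message: "Use `Cmd.get()` rather than `new Cmd()`.",
      },
      {
        selector: "NewExpression[callee.name='Name']",
        message: "Use `Name.get()` rather than `new Name()`.",
      },
      {
        selector: "NewExpression[callee.name='Ref']",
        message: "Use `Ref.get()` rather than `new Ref()`.",
      },
    ],
  },
};
```

The places that legitimately construct these objects (inside the caching `get` methods themselves) would then need a local `eslint-disable` comment.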
Jonas Jenwald	c3c3b8cd81	Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880) This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else. To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data and the "actual" `scanLines` value is at least one order of magnitude smaller than expected.	2020-02-22 14:16:07 +01:00
Jonas Jenwald	5494f7d5bc	Add basic validation of the `scanLines` parameter in JPEG images, before delegating decoding to the browser In some cases PDF documents can contain JPEG images that the native browser decoder cannot handle, e.g. images with DNL (Define Number of Lines) markers or images where the SOF (Start of Frame) marker contains a wildly incorrect `scanLines` parameter. Currently, for "simple" JPEG images, we're relying on native image decoding to fail before falling back to the implementation in `src/core/jpg.js`. In some cases, note e.g. issue 10880, the native image decoder doesn't outright fail and thus some images may not render. In an attempt to improve the current situation, this patch adds additional validation of the JPEG image SOF data to force the use of `src/core/jpg.js` directly in cases where the native JPEG decoder cannot be trusted to do the right thing. The only way to implement this is unfortunately to parse the beginning of the JPEG image data, looking for a SOF marker. To limit the impact of this extra parsing, the result is cached on the `JpegStream` instance and this code is only run for images which passed all of the pre-existing "can the JPEG image be natively rendered and/or decoded" checks. --- Slightly off-topic: Working on this really makes me start questioning if native rendering/decoding of JPEG images is actually a good idea. There's certain kinds of JPEG images not supported natively, and all of the validation which is now necessary isn't "free". At this point, in the `NativeImageDecoder`, we're having to check for certain properties in the image dictionary, parse the `ColorSpace`, and finally read the actual image data to find the SOF marker. Furthermore, we cannot just send the image to the main-thread and be done in the "JpegStream" case, but we also need to wait for rendering to complete (or fail) before continuing with other parsing. 
In the "JpegDecode" case we're even having to parse part of the image on the main-thread, which seems completely at odds with the principle of doing all heavy parsing in the Worker, and there's also a couple of potentially large (temporary) allocations/copies of TypedArray data involved as well.	2020-02-22 14:16:07 +01:00
Jonas Jenwald	6b44ae2170	Remove the unused `thisArg` from `RefSetCache.forEach` Given that this is completely unused, and that a "normal" function call may be a tiny bit more efficient, there's no good reason as far as I can tell to keep it.	2020-02-21 14:23:05 +01:00
Jonas Jenwald	3c7b7be100	Prevent circular references in the /Pages tree	2020-02-19 01:49:39 +01:00
Jonas Jenwald	ae5a34c520	[api-minor] Ensure that the `Array.prototype` doesn't contain any enumerable properties Over the years there's been a fair number of issues/PRs opened, where people have wanted to add `hasOwnProperty` checks in (hot) loops in the font parsing code. This has always been rejected, since we don't want to risk reducing performance in the Firefox PDF viewer simply because some users of the general PDF.js library are incorrectly extending the `Array.prototype` with enumerable properties. With this patch the general PDF.js library will now fail immediately with a hopefully useful Error message, rather than having (some) fonts fail to render, when the `Array.prototype` is incorrectly extended. Note that I did consider making this a warning, but ultimately decided against it since it's first of all possible to disable those (with the `verbosity` parameter). Secondly, even when printed, warnings can be easy to overlook and finally a warning may also seem OK to ignore (as opposed to an actual Error).	2020-02-10 14:17:27 +01:00
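A check of the kind described in the commit above can be sketched in a few lines, relying on the fact that `for...in` over an empty array only visits inherited enumerable properties. The function name and error text are hypothetical.

```javascript
// Hypothetical check: an empty array has no own enumerable properties, so
// anything visited by `for...in` here must come from the prototype chain.
function assertArrayPrototypeIsClean() {
  for (const key in []) {
    throw new Error(
      "The `Array.prototype` contains an unexpected enumerable property: " +
        key +
        "; this breaks `for...in` iteration of Arrays."
    );
  }
}

assertArrayPrototypeIsClean(); // passes in an unmodified environment
```

Extending the prototype with `Object.defineProperty` and `enumerable: false` would not trip this check, since only enumerable properties show up in `for...in`.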
Tim van der Meij	dced0a3821	Merge pull request #11579 from Snuffleupagus/issue-11578 Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)	2020-02-09 17:33:09 +01:00
Tim van der Meij	61056a9238	Merge pull request #11551 from Snuffleupagus/issue-11549 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)	2020-02-09 17:32:35 +01:00
Tim van der Meij	2fb4076e05	Merge pull request #11568 from Snuffleupagus/PDF-header-validation Ensure that the PDF header contains an actual number (PR 11463 follow-up)	2020-02-09 17:16:25 +01:00


1646 Commits