pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	ae5a34c520	[api-minor] Ensure that the `Array.prototype` doesn't contain any enumerable properties Over the years there's been a fair number of issues/PRs opened, where people have wanted to add `hasOwnProperty` checks in (hot) loops in the font parsing code. This has always been rejected, since we don't want to risk reducing performance in the Firefox PDF viewer simply because some users of the general PDF.js library are incorrectly extending the `Array.prototype` with enumerable properties. With this patch the general PDF.js library will now fail immediately with a hopefully useful Error message, rather than having (some) fonts fail to render, when the `Array.prototype` is incorrectly extended. Note that I did consider making this a warning, but ultimately decided against it since it's first of all possible to disable those (with the `verbosity` parameter). Secondly, even when printed, warnings can be easy to overlook and finally a warning may also seem OK to ignore (as opposed to an actual Error).	2020-02-10 14:17:27 +01:00
Tim van der Meij	dced0a3821	Merge pull request #11579 from Snuffleupagus/issue-11578 Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)	2020-02-09 17:33:09 +01:00
Tim van der Meij	61056a9238	Merge pull request #11551 from Snuffleupagus/issue-11549 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)	2020-02-09 17:32:35 +01:00
Tim van der Meij	2fb4076e05	Merge pull request #11568 from Snuffleupagus/PDF-header-validation Ensure that the PDF header contains an actual number (PR 11463 follow-up)	2020-02-09 17:16:25 +01:00
Tim van der Meij	102af0f915	Merge pull request #11547 from Snuffleupagus/convertCmykToRgb-scale Use fewer multiplications in `JpegImage._convertCmykToRgb`	2020-02-09 17:06:23 +01:00
Tim van der Meij	f178805412	Merge pull request #11557 from Snuffleupagus/_getLinearizedBlockData-xScaleBlockOffset Avoid re-calculating the `xScaleBlockOffset` when not necessary in `JpegImage._getLinearizedBlockData`	2020-02-09 16:54:28 +01:00
Jonas Jenwald	7937165537	Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)	2020-02-08 19:59:04 +01:00
Jonas Jenwald	88c35d872f	Ensure that the PDF header contains an actual number (PR 11463 follow-up) While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change. However, it does seem like a good idea to at least validate the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.	2020-02-07 12:25:07 +01:00
Brendan Dahl	09a6e17d22	Merge pull request #11528 from janpe2/type1-nonemb-notdef Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths	2020-02-06 13:30:07 -08:00
Jonas Jenwald	a4440a1c6b	Avoid re-calculating the `xScaleBlockOffset` when not necessary in `JpegImage._getLinearizedBlockData` As can be seen in the code, the `xScaleBlockOffset` typed array doesn't depend on the actual image data but only on the width and x-scale. The width is obviously consistent for an image, and it turns out that in practice the `componentScaleX` is quite often identical between two (or more) adjacent image components. All-in-all it's thus not necessary to unconditionally re-compute the `xScaleBlockOffset` when getting the JPEG image data. While avoiding, in many cases, one or more loops can never be a bad thing, these changes are unfortunately completely dominated by the rest of the JpegImage code and consequently don't really show up in benchmark results. Hence I'd understand if this patch is ultimately deemed not necessary.	2020-02-01 11:58:50 +01:00
Jonas Jenwald	4c54395ff6	Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549) This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.	2020-01-30 13:19:05 +01:00
Jonas Jenwald	ce4f41d06a	Use fewer multiplications in `JpegImage._convertCmykToRgb` Note: This is inspired by PR 5473, which made similar changes for another kind of JPEG data. Since the implementation in `src/core/jpg.js` only supports 8-bit data, as opposed to similar code in `src/core/colorspace.js`, the computations can be further simplified since the `scale` is always constant. By updating the coefficients, effectively inlining the `scale`, we'll thus avoid four multiplications for each loop iteration. Unfortunately I wasn't able, based on a quick look through the test-files, to find a sufficiently large CMYK JPEG image in order for these changes to really show up in benchmark results. However, when testing the `cmykjpeg.pdf` manually there's a total of `120 000` fewer multiplications with this patch.	2020-01-29 18:34:58 +01:00
Jonas Jenwald	f5a617a334	Make the `decodeHuffman` function, in `src/core/jpg.js`, slightly more efficient Rather than repeating the `typeof node` check twice, we can use a `switch` statement instead. This patch was tested using the PDF file from issue 3809, i.e. https://web.archive.org/web/20140801150504/http://vs.twonky.dk/invitation.pdf, with the following manifest file: ``` [ { "id": "issue3809", "file": "../web/pdfs/issue3809.pdf", "md5": "", "rounds": 50, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 50 \| 12537 \| 12451 \| -86 \| -0.69 \| faster Firefox \| Page Request \| 50 \| 5 \| 5 \| 0 \| 0.77 \| Firefox \| Rendering \| 50 \| 12532 \| 12446 \| -86 \| -0.69 \| faster ```	2020-01-28 14:23:58 +01:00
Jonas Jenwald	13930e5202	Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js` - Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument). - Tweak a condition, to make it easier to see that the end of the data has been reached. - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.	2020-01-25 22:52:24 +01:00
Tim van der Meij	3775b711ed	Merge pull request #11482 from Snuffleupagus/more-core-utils Convert `src/core/jpg.js` to use the `readUint16` helper function in `src/core/core_utils.js`, rather than re-implementing it twice	2020-01-25 21:38:34 +01:00
Tim van der Meij	cbbda9d883	Merge pull request #11515 from Snuffleupagus/cache-fallback-font Cache the fallback font dictionary on the `PartialEvaluator` (PR 11218 follow-up)	2020-01-25 21:32:28 +01:00
Jonas Jenwald	188b320e18	Convert `src/core/jpg.js` to use the `readUint16` helper function in `src/core/core_utils.js`, rather than re-implementing it twice The other image decoders, i.e. the JBIG2 and JPEG 2000 ones, are using the common helper function `readUint16`. Most likely, the only reason that the JPEG decoder is doing it this way is because it originated outside of the PDF.js library. Hence we can simply re-factor `src/core/jpg.js` to use the common `readUint16` helper function, which is especially nice given that the functionality was essentially duplicated in the code.	2020-01-25 00:35:10 +01:00
Jonas Jenwald	3f031f69c2	Move additional worker-thread only functions from `src/shared/util.js` and into a `src/core/core_utils.js` instead This moves the `log2`, `readInt8`, `readUint16`, `readUint32`, and `isSpace` functions since they are only used in the worker-thread.	2020-01-25 00:33:52 +01:00
Jonas Jenwald	83bdb525a4	Fix remaining linting errors, from enabling the `prefer-const` ESLint rule globally This covers cases that the `--fix` command couldn't deal with, and in a few cases (notably `src/core/jbig2.js`) the code was changed to use block-scoped variables instead.	2020-01-25 00:20:23 +01:00
Jonas Jenwald	9e262ae7fa	Enable the ESLint `prefer-const` rule globally (PR 11450 follow-up) Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths. Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).	2020-01-25 00:20:22 +01:00
Tim van der Meij	d2d9441373	Merge pull request #11489 from Snuffleupagus/rm-FIREFOX-define Remove the `FIREFOX` build flag, since it's completely unused and simplify a couple of `PDFJSDev` checks	2020-01-24 23:59:13 +01:00
Tim van der Meij	a88dec197f	Merge pull request #11511 from Snuffleupagus/eslint-no-nested-ternary Enable the `no-nested-ternary` ESLint rule (PR 11488 follow-up)	2020-01-22 22:52:59 +01:00
Jonas Jenwald	3b78f4e8f8	Fix a couple of cases where Prettier broke existing formatting (PR 11446 follow-up) These two cases should either have been whitelisted prior to re-formatting or had their comments fixed afterwards; however, I unfortunately missed them because of the massive size of the diff.	2020-01-22 09:12:12 +01:00
Jani Pehkonen	809b96b40c	Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths Fixes #11403 The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this. In addition to testing `glyphName === ".notdef"`, we must also test `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`. The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores the font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.notdef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.	2020-01-21 21:35:25 +02:00
Jonas Jenwald	a39943554a	Simplify, and tweak, a couple of `PDFJSDev` checks This removes a couple of, thanks to preceding code, unnecessary `typeof PDFJSDev` checks, and also fixes a couple of incorrectly implemented (my fault) checks intended for `TESTING` builds.	2020-01-21 00:06:15 +01:00
Jonas Jenwald	9ab7c280aa	Cache the fallback font dictionary on the `PartialEvaluator` (PR 11218 follow-up) This way we'll benefit from the existing font caching, and can thus avoid re-creating a fallback font over and over again during parsing. (These changes necessitated the previous patch, since otherwise breakage could occur e.g. with fake workers.)	2020-01-16 15:12:05 +01:00
Jonas Jenwald	090ff116d4	Ensure that full clean-up is always run when handling the "Terminate" message in `src/core/worker.js` This is beneficial in situations where the Worker is being re-used, for example with fake workers, since it ensures that things like font resources are actually released.	2020-01-16 15:11:56 +01:00
Jonas Jenwald	c591826f3b	Enable the `no-nested-ternary` ESLint rule (PR 11488 follow-up) This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210 With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed. This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories: - Cases where a value is only clamped to a certain range (the easiest ones to update). - Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the default value and only update it when necessary by using `if`/`else if` statements. - Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary	2020-01-14 17:49:39 +01:00
Jonas Jenwald	6590cc32f2	Extract the subroutine bias computation into a helper function in `src/core/font_renderer.js`	2020-01-14 15:29:53 +01:00
Jonas Jenwald	36881e3770	Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.	2020-01-04 13:01:43 +01:00
Jonas Jenwald	f8ab8c4d3a	Move the SegoeUISymbol font to the `getNonStdFontMap` (PR 8698 follow-up) For reasons that I now cannot even begin to understand, the non-standard SegoeUISymbol font was placed in the `getStdFontMap`. That honestly makes no sense, hence this patch which does what I should have done from the start.	2019-12-28 11:02:49 +01:00
Jonas Jenwald	a63f7ad486	Fix the linting errors, from the Prettier auto-formatting, that ESLint `--fix` couldn't handle This patch makes the following changes: - Remove no longer necessary inline `// eslint-disable-...` comments. - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors. - Concatenate strings which now fit on just one line. - Fix comments that are now too long. - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions are often confusing or outright wrong.	2019-12-26 12:35:12 +01:00
Jonas Jenwald	de36b2aaba	Enable auto-formatting of the entire code-base using Prettier (issue 11444) Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change). Prettier is being used for a couple of reasons: - To be consistent with `mozilla-central`, where Prettier is already in use across the tree. - To ensure a consistent coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters. Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some). Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that comments won't become too long. Please note: This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a separate commit. (On a more personal note, I'll readily admit that some of the changes Prettier makes are extremely ugly. However, in the name of consistency we'll probably have to live with that.)	2019-12-26 12:34:24 +01:00
Jonas Jenwald	8ec1dfde49	Add `// prettier-ignore` comments to prevent re-formatting of certain data structures There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want to disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments. Please note: It may be a good idea to look through these cases individually, and possibly re-write some of them (especially the `String` ones) to reduce the need for all of these ignore commands.	2019-12-26 00:14:03 +01:00
Jonas Jenwald	70e3345cb4	Support OpenAction dictionaries without `Type` entries when parsing `Print` actions (issue 11442) The PDF generator didn't bother including the `Type` entry in the OpenAction dictionary, hence we skipped parsing the `Print` action.	2019-12-24 10:41:33 +01:00
Jonas Jenwald	dbb82f05fc	Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization. This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function. The main benefits here are: - No longer necessary to allocate temporary `1 kB` strings during initial parsing, thus saving some memory. - In practice, for well-formed PDF documents, the number of iterations required to find the commands is usually very low. (For the `tracemonkey.pdf` file, there's a total of only 30 loop iterations.)	2019-12-14 13:43:26 +01:00
Jonas Jenwald	b00835f589	Attempt to improve the `PDFDocument` error message for empty files (issue 5887) Given that the error in question is surfaced on the API-side, this patch makes the following changes: - Updates the wording such that it'll hopefully be slightly easier for users to understand. - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling. - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).	2019-12-09 15:45:50 +01:00
Tim van der Meij	a6db045789	Merge pull request #11387 from Snuffleupagus/issue-11385 Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)	2019-12-08 20:27:46 +01:00
Jonas Jenwald	a02122e984	Ensure that `PDFDocument.checkFirstPage` waits for cleanup to complete (PR 10392 follow-up) Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here. (The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)	2019-12-07 12:31:41 +01:00
Jonas Jenwald	5c0336872e	Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385) In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.	2019-12-05 15:53:18 +01:00
Jonas Jenwald	c3b1c8f857	Slightly simplify the XRef cache lookup in `XRef.fetch` Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups. Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called a lot in larger documents and this patch thus cannot hurt.	2019-11-30 22:41:53 +01:00
Jonas Jenwald	168c6aecae	Stop caching Streams in `XRef.fetchCompressed` I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1] Note that in the `XRef.fetchUncompressed` method we're not caching Streams, and that for very good reasons too. - Streams, especially the `DecodeStream` ones, can become very large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so. - Attempting to read from the same Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position. - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Streams isn't going to work in the general case. All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method. --- [1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.	2019-11-30 10:21:08 +01:00
Jonas Jenwald	06412a557b	Slightly re-factor `XRef.fetchCompressed` - Change all occurrences of `var` to `let`/`const`. - Initialize the (temporary) Arrays with the correct sizes upfront. - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.	2019-11-30 09:49:51 +01:00
Jonas Jenwald	725566cfea	Remove the `Number.isInteger` checks from `XRef.fetchUncompressed` (PR 8857 follow-up) Having run the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857. Hence everything points to this being completely unnecessary now, and by removing this code there are thus fewer function calls being made in `XRef.fetchUncompressed`.	2019-11-28 23:25:39 +01:00
Jonas Jenwald	cc76132c24	Remove outdated, and misleading, JSDoc comment from the `PDFDocument` class The contents of this comment haven't been correct for years, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.	2019-11-25 11:36:29 +01:00
Jonas Jenwald	a965662184	Enable the `getter-return`, `no-dupe-else-if`, and `no-setter-return` ESLint rules All of these rules can help catch errors during development. Please note that only `getter-return` required a few changes, which was limited to disabling the rule in a couple of spots; please find additional details about these rules at: - https://eslint.org/docs/rules/getter-return - https://eslint.org/docs/rules/no-dupe-else-if - https://eslint.org/docs/rules/no-setter-return	2019-11-23 11:40:30 +01:00
Tim van der Meij	be02e67972	Merge pull request #11335 from Snuffleupagus/issue-11330 Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330)	2019-11-16 13:56:01 +01:00
Jonas Jenwald	9199b02a42	Subtract `stream.start` when getting the `startXRef` property for documents with a Linearization dictionary (issue 11330) For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`). Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.	2019-11-16 09:29:10 +01:00
Jonas Jenwald	688d15526e	Use `getBytes`, rather than looping over `getByte`, in `FlateStream.prototype.readBlock` Please note: A similar change was attempted in PR 5005, but it was subsequently backed out (in PR 5069) since other parts of the patch caused issues. With these changes, it's possible to replace repeated function calls within a loop with just a single function call and subsequent assignment instead.	2019-11-15 15:45:31 +01:00
Jonas Jenwald	74e00ed93c	Change `isNodeJS` from a function to a constant Given that this shouldn't change after the `pdf.js`/`pdf.worker.js` files have been loaded, it doesn't seem necessary to keep this as a function.	2019-11-10 16:44:29 +01:00
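
The `Array.prototype` check described in commit ae5a34c520 can be sketched roughly as follows. This is only an illustration of the idea, not the code from that patch: the function name `assertCleanArrayPrototype` and the wording of the error message are assumptions made for this example.

```javascript
// Hypothetical helper (name and message are assumptions, not PDF.js code).
// A `for...in` loop over an empty array visits only enumerable properties
// inherited from the prototype chain, so any key found here was added to
// `Array.prototype` (or `Object.prototype`) as an enumerable property.
function assertCleanArrayPrototype() {
  const unexpected = [];
  for (const key in []) {
    unexpected.push(key);
  }
  if (unexpected.length > 0) {
    throw new Error(
      "The `Array.prototype` contains unexpected enumerable properties: " +
        unexpected.join(", ")
    );
  }
}

// In an unmodified environment this returns without throwing.
assertCleanArrayPrototype();
```

Failing once at start-up, rather than adding `hasOwnProperty` checks to hot loops, keeps the per-iteration cost in the font parsing code unchanged, which is the trade-off the commit message argues for.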


1600 Commits