Sakurai/pdf.js - pdf.js - Gitea on kemo


Author	SHA1	Message	Date
Jonas Jenwald	fb0775525e	Stop special-casing the `dict` parameter in the `Jbig2Stream`/`JpegStream`/`JpxStream` constructors For all of the other `DecodeStream`s we're not passing in a `Dict`-instance manually, but instead get it from the `stream`-parameter. Hence there's no particularly good reason, as far as I can tell, to not do the same thing in `Jbig2Stream`/`JpegStream`/`JpxStream` as well.	2021-04-28 13:44:47 +02:00
Jonas Jenwald	1e5bf352a5	Move the `FlateStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	66d9d83dcb	Move the `PredictorStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	3294d4d5a3	Move the `Ascii85Stream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	d63df04854	Move the `AsciiHexStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	342b0c1bbc	Move the `RunLengthStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	6c1a321500	Move the `LZWStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	9416b14e8b	Re-factor how the ESLint `no-var` rule is enabled in the `src/` folder This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1] Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use. --- [1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.	2020-10-03 20:15:29 +02:00
Jonas Jenwald	28d2ada59c	Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124) This should reduce the possibility of accidentally truncating some inline images, while not causing the "EI" detection to become significantly slower.[1] There's obviously a possibility that these added checks are not sufficient to catch every single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here. Please note: The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere. --- [1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause unacceptable performance regressions in e.g. issue 2618.	2020-06-26 13:15:06 +02:00
Jonas Jenwald	e1f340a0c2	Use the ESLint `no-restricted-syntax` rule to ensure that `assert` is always called with two arguments Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites. In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax	2020-05-05 13:40:05 +02:00
Jonas Jenwald	e011be037e	Enable the `prefer-exponentiation-operator` ESLint rule Please see https://eslint.org/docs/rules/prefer-exponentiation-operator for additional information.	2020-03-19 12:41:25 +01:00
Jonas Jenwald	c5f67300e9	Rename the `isSpace` helper function to `isWhiteSpace` Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`. Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.	2020-03-12 11:36:59 +01:00
Jonas Jenwald	3adbba55b2	Limit the number of warning messages printed by any one `Lexer.getHexString` invocation This patch fixes something that's annoyed me every now and then over the years, when debugging/fixing corrupt PDF documents. For corrupt PDF documents where `Lexer.getHexString` encounters invalid characters, there are very rarely just a handful of them. In practice it's not uncommon for there to be many hundreds, or even many thousands, of invalid hex characters found. Not only is the resulting console warning spam utterly useless in these cases, there's often enough of it that performance may even suffer; hence this patch, which limits the number of messages that any one `Lexer.getHexString` invocation may print.	2020-03-09 13:34:53 +01:00
Jonas Jenwald	3f031f69c2	Move additional worker-thread only functions from `src/shared/util.js` and into a `src/core/core_utils.js` instead This moves the `log2`, `readInt8`, `readUint16`, `readUint32`, and `isSpace` functions since they are only used in the worker-thread.	2020-01-25 00:33:52 +01:00
Jonas Jenwald	83bdb525a4	Fix remaining linting errors, from enabling the `prefer-const` ESLint rule globally This covers cases that the `--fix` command couldn't deal with, and in a few cases (notably `src/core/jbig2.js`) the code was changed to use block-scoped variables instead.	2020-01-25 00:20:23 +01:00
Jonas Jenwald	9e262ae7fa	Enable the ESLint `prefer-const` rule globally (PR 11450 follow-up) Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths. Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).	2020-01-25 00:20:22 +01:00
Jonas Jenwald	36881e3770	Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.	2020-01-04 13:01:43 +01:00
Jonas Jenwald	a63f7ad486	Fix the linting errors, from the Prettier auto-formatting, that ESLint `--fix` couldn't handle This patch makes the following changes: - Remove no longer necessary inline `// eslint-disable-...` comments. - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors. - Concatenate strings which now fit on just one line. - Fix comments that are now too long. - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions are often confusing or outright wrong.	2019-12-26 12:35:12 +01:00
Jonas Jenwald	de36b2aaba	Enable auto-formatting of the entire code-base using Prettier (issue 11444) Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change). Prettier is being used for a couple of reasons: - To be consistent with `mozilla-central`, where Prettier is already in use across the tree. - To ensure a consistent coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters. Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some). Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that comments won't become too long. Please note: This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a separate commit. (On a more personal note, I'll readily admit that some of the changes Prettier makes are extremely ugly. However, in the name of consistency we'll probably have to live with that.)	2019-12-26 12:34:24 +01:00
Jonas Jenwald	8ec1dfde49	Add `// prettier-ignore` comments to prevent re-formatting of certain data structures There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want to disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments. Please note: It may be a good idea to look through these cases individually, and possibly re-write some of them (especially the `String` ones) to reduce the need for all of these ignore commands.	2019-12-26 00:14:03 +01:00
Jonas Jenwald	5c0336872e	Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385) In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of the EOD (end-of-data) marker is missing; hence the PDF document is corrupt.	2019-12-05 15:53:18 +01:00
Jonas Jenwald	40d3916f31	Reduce the number of temporary variables in the `Parser.getObj` method This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.	2019-08-16 13:51:41 +02:00
Jonas Jenwald	7728a6630c	Inline the `isString` check in the `Parser.getObj` method For very large and complex PDF files this will help performance slightly, since `Parser.getObj` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 2847 \| 2830 \| -17 \| -0.60 \| faster Firefox \| Page Request \| 200 \| 2 \| 2 \| 0 \| -7.14 \| Firefox \| Rendering \| 200 \| 2844 \| 2827 \| -17 \| -0.60 \| faster ```	2019-08-16 10:34:24 +02:00
Tim van der Meij	e0b38bed3c	Merge pull request #11029 from brendandahl/pdfjs-telemetry-update [api-minor] Update telemetry to use 'categorical' histograms.	2019-08-02 00:11:02 +02:00
Brendan Dahl	31d71808e7	[api-minor] Update telemetry to use 'categorical' histograms. Firefox telemetry supports using string labels now. Convert our integers that we used for categories to just use strings. The upstream work will happen in: https://bugzilla.mozilla.org/show_bug.cgi?id=1566882	2019-08-01 09:51:02 -07:00
Jonas Jenwald	ff90aa4323	Inline the `isCmd` check in the `Parser.shift` method For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called a lot during parsing. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over four million `Parser.shift` calls for just the one page), using the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 100, "type": "eq" } ] ``` This gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 100 \| 3386 \| 3322 \| -65 \| -1.92 \| faster Firefox \| Page Request \| 100 \| 1 \| 1 \| 0 \| -8.08 \| Firefox \| Rendering \| 100 \| 3385 \| 3321 \| -65 \| -1.92 \| faster ```	2019-07-22 12:07:36 +02:00
Jonas Jenwald	f710eb56e4	Change the signature of the `Parser` constructor to take a parameter object A lot of the `new Parser()` call-sites look quite unwieldy/ugly as-is, with a bunch of somewhat randomly ordered arguments, which we can avoid by changing the constructor to accept an object instead. As an added bonus, this provides better documentation without having to add inline argument comments in the code.	2019-06-23 16:01:45 +02:00
Jonas Jenwald	2fe9f3ff8f	Add caching to reduce the number of `Ref` objects This is similar to the existing caching used to reduce the number of `Cmd` and `Name` objects. With the `tracemonkey.pdf` file, this patch changes the number of `Ref` objects as follows (in the default viewer): \| \| Loading the first page \| Loading all the pages \| \|----------\|------------------------\|-------------------------\| \| `master` \| 332 \| 3265 \| \| `patch` \| 163 \| 996 \|	2019-05-26 12:23:37 +02:00
Tim van der Meij	7d3cb19571	Convert the `Linearization` class in `src/core/parser.js` to ES6 syntax Moreover, disable `var` usage for this file.	2019-03-17 13:27:45 +01:00
Tim van der Meij	8d4d7dbf58	Convert the `Lexer` class in `src/core/parser.js` to ES6 syntax	2019-03-10 19:04:36 +01:00
Tim van der Meij	7d0ecee771	Convert the `Parser` class in `src/core/parser.js` to ES6 syntax	2019-03-10 19:04:35 +01:00
Jonas Jenwald	3ce8fe7927	Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614) There are a number of things wrong with the PDF document: first of all, its inline images are a lot larger than the 4 KB limit (as mandated by the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1852045). Furthermore, the actual ASCII85Decode data is interspersed with a lot of needless whitespace, in particular also "inside" of the EOD (end-of-data) marker, which thus completely breaks the detection. Note that according to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130, this patch should be safe since it explicitly mentions that all whitespace should be ignored.	2019-03-04 23:41:36 +01:00
Jonas Jenwald	db5dc14158	Move worker-thread only functions from `src/shared/util.js` and into a new `src/core/core_utils.js` file The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated. Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used only on the worker-thread. Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus cannot possibly ever be used on the main-thread.	2019-02-24 00:35:39 +01:00
Jonas Jenwald	b531fc4106	Avoid truncating inline images, where the data and the "EI" marker are glued together (issue 10388) (#10436) Thanks to the excellent debugging done by @janpe2, this was easy to fix!	2019-01-12 20:31:23 +01:00
Jonas Jenwald	95e5bad4c4	Attempt to find truncated endstream commands, in the fallback code-path, in `Parser.makeStream` (issue 10004) Apparently there are some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated. Please note: The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contain a mixture of correct and truncated endstream commands. However, considering that this particular mode of corruption fortunately doesn't seem very common[2], a slightly less complex solution ought to suffice for now. Fixes 10004. --- [1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect. [2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.	2018-08-26 11:51:11 +02:00
Jonas Jenwald	c81cbe113c	Extract the "scanning for endstream command" part of `Parser.makeStream` into a helper method With this code now living in a separate method, it can be simplified slightly (e.g. by using early returns).	2018-08-26 11:51:09 +02:00
Jonas Jenwald	6bbcafcd26	Let `Lexer.getNumber` treat a single decimal point as zero (issue 9252) This is consistent with the behaviour in Adobe Reader.	2018-06-20 13:41:21 +02:00
Jonas Jenwald	df4799a12a	Ensure that line-breaks are only skipped after operators in `Lexer.getNumber` (PR 8359 follow-up) With the current code, line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches. Hence this code is adjusted to actually do what the original commit message says, and nothing more.	2018-06-20 13:41:15 +02:00
Jonas Jenwald	4b69bb7fe9	Add a TESTING build option, to enable using non-production/test-only code-paths Since the tests (currently) run with the `pdf.worker.js` file built, i.e. with `PRODUCTION = true` set, there's no simple way to add e.g. `assert` calls for both non-production and test-only builds without also affecting PRODUCTION builds.	2018-06-12 11:01:32 +02:00
Jonas Jenwald	f05e5c5460	Take the dictionary, and not just the image data, into account when caching inline images (issue 9398) The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces. There are obviously a couple of different ways that we could compute a hash/checksum of the dictionary. Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be way too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regress quite badly with this solution. The solution that is instead implemented in this patch is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution, there's one drawback associated with it: If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach, which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of not caching given that too aggressive caching can easily lead to rendering bugs. One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the exact stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1] With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618. Fixes 9398; also improves/fixes the `issue8823` reference test. --- [1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.	2018-02-12 16:43:47 +01:00
Jonas Jenwald	36593d6bbc	Move `JpegStream` and `JpxStream` to their own files	2017-11-11 11:22:16 +01:00
Max Schaefer	bc8f673522	Remove spurious arguments to `NullStream` constructor.	2017-11-03 10:14:32 +00:00
Jonas Jenwald	bb35095083	Move `CCITTFaxStream` and `Jbig2Stream`, from `src/core/stream.js`, to separate files	2017-10-24 12:00:40 +02:00
Jonas Jenwald	eece66fa3e	For /Filter entries containing `Name`s, ignore the /DecodeParms entry if it contains an Array (issue 8895)	2017-09-15 23:02:16 +02:00
Jonas Jenwald	cfb4955a92	Replace the `isArray` helper function with the native `Array.isArray` function Follow-up to PR 8813.	2017-09-01 20:27:13 +02:00
Jonas Jenwald	11408da340	Replace the `isInt` helper function with the native `Number.isInteger` function Follow-up to PR 8643.	2017-09-01 16:52:50 +02:00
Jonas Jenwald	49b8cd5a6a	Attempt to improve the `EI` detection heuristics, for inline images, in streams containing `NUL` bytes (issue 8823) Since this patch will now treat (some) `NUL` bytes as "ASCII", the number of `followingBytes` checked are thus increased to (hopefully) reduce the risk of introducing new false positives. Fixes 8823.	2017-08-27 12:48:28 +02:00
Jonas Jenwald	cb55506b95	Try to recover if we reach the end of the stream when searching for the `EI` marker of an inline image (issue 8798)	2017-08-22 09:33:13 +02:00
Jonas Jenwald	2112999db7	Fix caching of small inline images in `Parser.makeInlineImage` (issue 8790) Follow-up to PR 5445. Using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ```json [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 50, "type": "eq" } ] ``` I get the following results when comparing `master` against this patch: ``` browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| ---- \| ------ \| ------------- firefox \| Overall \| 50 \| 4694 \| 3974 \| -721 \| -15.35 \| faster firefox \| Page Request \| 50 \| 2 \| 1 \| 0 \| -22.83 \| firefox \| Rendering \| 50 \| 4692 \| 3972 \| -720 \| -15.35 \| faster ``` So, based on these results, it seems like a fairly clear win to fix this broken caching :-)	2017-08-18 23:08:55 +02:00
Yury Delendik	d028c26210	Removes error()	2017-07-07 09:40:24 -05:00
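Commit 3adbba55b2 caps the number of warnings that a single `Lexer.getHexString` call may print. A minimal sketch of that idea, assuming the hex-string body is available as a plain byte array; `MAX_WARNINGS`, `decodeHexString` and the `warn` callback are hypothetical names, and the actual limit used by pdf.js is not stated in the commit message:

```javascript
// Hypothetical per-invocation cap on warnings (not taken from pdf.js).
const MAX_WARNINGS = 5;

// Value of an ASCII hex digit, or -1 if `ch` is not a hex digit.
function toHexDigit(ch) {
  if (ch >= 0x30 && ch <= 0x39) return ch - 0x30; // '0'-'9'
  const lower = ch | 0x20; // fold 'A'-'F' onto 'a'-'f'
  if (lower >= 0x61 && lower <= 0x66) return lower - 0x57; // 'a'-'f' -> 10-15
  return -1;
}

function isWhiteSpace(ch) {
  return ch === 0x20 || ch === 0x09 || ch === 0x0d || ch === 0x0a;
}

// Decode the body of a hex string such as <48656C6C6F> (angle brackets excluded).
// Whitespace is skipped; invalid characters are ignored, and only the first
// MAX_WARNINGS of them are reported for this call.
function decodeHexString(bytes, warn) {
  let warnings = 0;
  let high = -1; // pending high nibble, or -1 if none
  const out = [];
  for (const ch of bytes) {
    if (isWhiteSpace(ch)) continue;
    const digit = toHexDigit(ch);
    if (digit === -1) {
      if (warnings++ < MAX_WARNINGS) {
        warn(`Ignoring invalid character "${String.fromCharCode(ch)}" in hex string`);
      }
      continue;
    }
    if (high === -1) {
      high = digit;
    } else {
      out.push((high << 4) | digit);
      high = -1;
    }
  }
  // The PDF specification treats a missing final digit as 0.
  if (high !== -1) out.push(high << 4);
  return String.fromCharCode(...out);
}
```

Keeping the counter local to each call, rather than global, matches the commit's wording of limiting "any one" invocation.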
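Commit 3f031f69c2 lists `log2`, `readInt8`, `readUint16`, `readUint32` and `isSpace` (later renamed `isWhiteSpace`, see c5f67300e9) as worker-thread only helpers. The bodies below are a sketch of what such helpers typically look like, assuming big-endian reads from a byte array; they are not copied from the pdf.js source:

```javascript
// For a positive integer x, the smallest n such that 2 ** n >= x; 0 for x <= 0.
function log2(x) {
  if (x <= 0) return 0;
  return Math.ceil(Math.log2(x));
}

// Signed 8-bit read: shift the byte into the sign position, then back.
function readInt8(data, offset) {
  return (data[offset] << 24) >> 24;
}

// Big-endian unsigned 16-bit read.
function readUint16(data, offset) {
  return (data[offset] << 8) | data[offset + 1];
}

// Big-endian unsigned 32-bit read; `>>> 0` keeps the result non-negative.
function readUint32(data, offset) {
  return (
    ((data[offset] << 24) |
      (data[offset + 1] << 16) |
      (data[offset + 2] << 8) |
      data[offset + 3]) >>>
    0
  );
}

// Space, tab, CR and LF only (the PDF specification also lists NUL and FF).
function isWhiteSpace(ch) {
  return ch === 0x20 || ch === 0x09 || ch === 0x0d || ch === 0x0a;
}
```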
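Commit f710eb56e4 changes the `Parser` constructor to accept a single parameter object. The sketch below illustrates only that constructor shape; the property names and default values are assumptions made for illustration, and the class is a hypothetical stand-in rather than the real `Parser`:

```javascript
// Hypothetical stand-in: only the constructor signature is illustrated.
class Parser {
  // Positional style:   new Parser(lexer, xref, allowStreams, recoveryMode)
  // Parameter object:   new Parser({ lexer, xref, allowStreams, recoveryMode })
  constructor({ lexer, xref, allowStreams = false, recoveryMode = false }) {
    this.lexer = lexer;
    this.xref = xref;
    this.allowStreams = allowStreams;
    this.recoveryMode = recoveryMode;
  }
}

// Call-sites now name each argument, and omitted ones fall back to defaults.
const parser = new Parser({ lexer: null, xref: null, recoveryMode: true });
```

Destructuring with defaults gives the self-documenting call-sites the commit message mentions, without inline argument comments.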
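Commit 2fe9f3ff8f caches `Ref` objects in the same way that `Cmd` and `Name` objects are already cached. A minimal sketch of such an interning cache, assuming `Ref` instances are never mutated after creation; the `refCache` map, the key format and the static `get` method are assumptions for illustration:

```javascript
// Hypothetical module-level cache keyed on object number and generation.
const refCache = new Map();

class Ref {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }

  // Return a shared instance for (num, gen), allocating only on first use.
  static get(num, gen) {
    const key = `${num}R${gen}`;
    let ref = refCache.get(key);
    if (!ref) {
      ref = new Ref(num, gen);
      refCache.set(key, ref);
    }
    return ref;
  }
}
```

Because every lookup of the same reference returns the same object, repeated references no longer allocate, and two equal references also compare equal with `===`.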
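Commit 3ce8fe7927 relies on the specification's rule that whitespace in ASCII85Decode data is ignored, so the `~>` end-of-data marker may be split by whitespace. Since valid ASCII85 data characters are `!` through `u` plus `z`, a `~` can only begin the marker. A sketch of a scanner under that rule; `findAscii85Eod` and its return convention are hypothetical, and it does not cover the missing-`>` case from commit 5c0336872e:

```javascript
function isWhiteSpace(ch) {
  return ch === 0x20 || ch === 0x09 || ch === 0x0d || ch === 0x0a;
}

const TILDE = 0x7e; // '~'
const GT = 0x3e; // '>'

// Return the index just past the EOD marker "~>", tolerating whitespace
// between '~' and '>', or -1 if no complete marker is found.
function findAscii85Eod(bytes, start = 0) {
  for (let i = start; i < bytes.length; i++) {
    if (bytes[i] !== TILDE) continue;
    let j = i + 1;
    while (j < bytes.length && isWhiteSpace(bytes[j])) j++;
    if (j < bytes.length && bytes[j] === GT) return j + 1;
  }
  return -1;
}
```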
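Commit db5dc14158 notes that `getInheritableProperty` expects a `Dict`, which is why it belongs on the worker-thread. A sketch of the underlying idea, walking up the `/Parent` chain of a page-tree node until the key is found. Here dictionaries are modelled as plain `Map`s, and the `maxDepth` guard against cyclic `/Parent` chains is a hypothetical addition:

```javascript
// Look up `key` on `dict`, then on its /Parent, its /Parent's /Parent, and so on.
function getInheritableProperty(dict, key, maxDepth = 100) {
  let node = dict;
  let depth = 0;
  while (node && depth++ < maxDepth) {
    if (node.has(key)) return node.get(key);
    node = node.get("Parent"); // undefined at the root ends the loop
  }
  return undefined;
}

// Usage: a page inheriting /MediaBox from its parent Pages node.
const pagesNode = new Map([["MediaBox", [0, 0, 612, 792]]]);
const pageNode = new Map([["Parent", pagesNode]]);
const mediaBox = getInheritableProperty(pageNode, "MediaBox");
```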
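Commits c81cbe113c and 95e5bad4c4 describe scanning the raw data for an `endstream` command when the stream dictionary's `Length` is missing or incorrect, with a fallback for truncated commands. A simplified sketch of that idea, assuming a byte-array input. The fallback here (trying progressively shorter prefixes of `endstream`, down to a hypothetical `minPrefix`) is an illustration only, since the commit message does not describe how much truncation pdf.js accepts:

```javascript
const ENDSTREAM = Array.from("endstream", (c) => c.charCodeAt(0));

// First index >= start where `needle` occurs in `bytes`, or -1.
function indexOfBytes(bytes, needle, start = 0) {
  outer: for (let i = start; i + needle.length <= bytes.length; i++) {
    for (let k = 0; k < needle.length; k++) {
      if (bytes[i + k] !== needle[k]) continue outer;
    }
    return i;
  }
  return -1;
}

// Look for the full "endstream" command first; only if it is absent, fall
// back to shorter prefixes such as "endstrea" (a truncated command).
function findStreamEnd(bytes, start, minPrefix = 6) {
  for (let len = ENDSTREAM.length; len >= minPrefix; len--) {
    const pos = indexOfBytes(bytes, ENDSTREAM.slice(0, len), start);
    if (pos !== -1) return pos;
  }
  return -1;
}
```

This shares the limitation the commit message points out: if the current stream's command is truncated but a later stream's is intact, the full-length search finds the later one first.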
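Commit 6bbcafcd26 makes `Lexer.getNumber` treat a single decimal point as zero. A sketch of that rule applied to an already-extracted token string. The real lexer works character by character, so `parsePdfNumber` is a hypothetical helper. PDF numbers have no exponent notation, so forms like `1e5` are rejected:

```javascript
// Parse a PDF numeric token such as "12", "-.5" or "3.". A lone decimal
// point, optionally signed ("." or "-."), is treated as 0.
function parsePdfNumber(token) {
  if (!/^[+-]?(\d+\.?\d*|\.\d*)$/.test(token)) {
    throw new SyntaxError(`Invalid number: ${token}`);
  }
  if (/^[+-]?\.$/.test(token)) return 0;
  return Number(token);
}
```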
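Commit f05e5c5460 includes a checksum of the inline image dictionary in the cache key, not just a checksum of the image data. A sketch of such a key. It uses Adler-32 (as defined in RFC 1950) as an example checksum, which is an assumption here, and `inlineImageCacheKey` is a hypothetical name:

```javascript
// Adler-32 checksum of a byte array (RFC 1950).
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (let i = 0; i < bytes.length; i++) {
    a = (a + (bytes[i] & 0xff)) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

// Combine checksums of the raw dictionary bytes and the image data, so that
// identical image data with a different dictionary (e.g. another
// ColorSpace) yields a different cache entry.
function inlineImageCacheKey(dictBytes, imageBytes) {
  return `${adler32(dictBytes)}_${adler32(imageBytes)}`;
}
```

Because the dictionary is checksummed as raw bytes, dictionaries with the same entries in a different order get different keys. This is the drawback the commit message accepts in exchange for speed.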
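Several commits (49b8cd5a6a, cb55506b95, b531fc4106, 28d2ada59c) refine how the `EI` marker ending an inline image is detected. A simplified sketch of one such heuristic, assuming a byte-array input. It accepts an `EI` only if it is followed by whitespace (or the end of the data) and the next few bytes look like ASCII content-stream text, with NUL counted as ASCII as in 49b8cd5a6a. `FOLLOWING_BYTES` and `findInlineImageEnd` are hypothetical, and the checks in pdf.js differ in detail. No whitespace is required before `EI`, since issue 10388 involved data glued to the marker:

```javascript
function isWhiteSpace(ch) {
  return ch === 0x20 || ch === 0x09 || ch === 0x0d || ch === 0x0a;
}

// Hypothetical number of bytes after "EI" that must look like text.
const FOLLOWING_BYTES = 10;

// Printable ASCII, whitespace, or NUL.
function looksLikeAscii(ch) {
  return ch === 0x00 || isWhiteSpace(ch) || (ch >= 0x20 && ch <= 0x7e);
}

// Index of the "E" in the first plausible end marker at or after `start`, or -1.
function findInlineImageEnd(bytes, start = 0) {
  for (let i = start; i + 1 < bytes.length; i++) {
    if (bytes[i] !== 0x45 || bytes[i + 1] !== 0x49) continue; // "EI"
    const after = i + 2;
    if (after < bytes.length && !isWhiteSpace(bytes[after])) continue;
    // Reject matches followed by binary-looking bytes (likely image data).
    const end = Math.min(bytes.length, after + FOLLOWING_BYTES);
    let ok = true;
    for (let k = after; k < end; k++) {
      if (!looksLikeAscii(bytes[k])) {
        ok = false;
        break;
      }
    }
    if (ok) return i;
  }
  return -1;
}
```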
