Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	804aa896a7	Stop using the `PRODUCTION` build-target in the JavaScript code This special build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code. When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing. This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases. Please note: There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.	2023-04-17 12:04:34 +02:00
Calixte Denizet	fd03cd5493	[api-minor] Generate images in the worker instead of the main thread. We introduced the use of OffscreenCanvas in #14754 and this patch aims to use them for all kind of images. It'll slightly improve performances (and maybe slightly decrease memory use). Since an image can be rendered in using some transfer maps but because of OffscreenCanvas we don't have the underlying pixels array the transfer maps stuff is re-implemented in using the SVG filter feComponentTransfer.	2023-03-01 17:40:12 +01:00
Jonas Jenwald	60f6272ed9	Use more `for...of` loops in the code-base Most, if not all, of this code is old enough to predate the general availability of `for...of` iteration.	2022-10-03 13:08:38 +02:00
Jonas Jenwald	f1b0dc6f04	Tweak the heuristic that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 15492)	2022-09-22 14:09:04 +02:00
Jonas Jenwald	a54bed4963	Enable the ESLint `no-loss-of-precision` rule Please refer to https://eslint.org/docs/rules/no-loss-of-precision	2021-11-14 10:48:50 +01:00
Jonas Jenwald	6167566f1b	Re-factor the `BaseException.name` handling, and clean-up some code Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead. At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been multiple issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended. While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.	2021-08-10 11:27:47 +02:00
Jonas Jenwald	1a8d05fdcf	Remove some, with Prettier `2.3.0`, unnecessary `// prettier-ignore` comments To get the maximum benefit from something like Prettier, you obviously don't want to disable the automatic formatting unless absolutely necessary. When we added Prettier there were a number of cases, mostly involving larger Arrays, which required disabling of the automatic formatting for overall readability and/or to not break inline comments. With changes in Prettier version `2.3.0`, see [the release notes](https://prettier.io/blog/2021/05/09/2.3.0.html#concise-formatting-of-number-only-arrays-10106httpsgithubcomprettierprettierpull10106-10160httpsgithubcomprettierprettierpull10160-by-thorn0httpsgithubcomthorn0), there's now better formatting support for Arrays containing only numbers. Hence we can now remove a number of `// prettier-ignore` comments, and thus get the benefit of automatic formatting in (slightly) more of the code-base.	2021-05-19 11:36:03 +02:00
Jonas Jenwald	69dea39a42	Convert `src/core/jpg.js` to use standard classes Please note: Ignoring whitespace-only changes is probably necessary in order to review this.	2021-05-05 14:02:21 +02:00
Jonas Jenwald	d0a299713c	Fix the remaining `no-var` failures, which couldn't be handled automatically, in the `src/core/jpg.js` file	2021-05-05 14:02:21 +02:00
Jonas Jenwald	1e5a179600	Enable the `no-var` rule in the `src/core/jpg.js` file These changes were made automatically, using `gulp lint --fix`.	2021-05-05 14:02:21 +02:00
Jonas Jenwald	cd9422a075	Improve handling of JPEG images without an EOI marker (issue 12841) Given that the PDF document in the issue contains the same very large JPEG image three times, this patch includes a test-case where only the first page has been extracted from it.	2021-01-09 20:19:39 +01:00
Jonas Jenwald	9416b14e8b	Re-factor how the ESLint `no-var` rule is enabled in the `src/` folder This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1] Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use. --- [1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.	2020-10-03 20:15:29 +02:00
Jonas Jenwald	1d66fce781	Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)	2020-07-06 13:06:49 +02:00
Carlos Rodríguez	802aa14a99	Jpeg encoded with RGB -instead of YCbCr- write the components index as "RGB" in ASCII to say it so On ISO/IEC 10918-6:2013 (E), section 6.1: (http://www.itu.int/rec/T-REC-T.872-201206-I/en) "Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment" But common jpeg libraries consider RGB too if components index are ASCII R (0x52), G (0x47) and B (0x42): https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048 Issue #11931	2020-06-04 15:08:47 +02:00
Jonas Jenwald	518d26dfb4	[src/core/jpg.js] Remove redundant marker validation at the end of the `decodeScan` function (PR 11805 follow-up) With the MCU parsing changes made in PR 11805, the final marker validation is no longer necessary before the `decodeScan` function returns.	2020-04-17 15:40:02 +02:00
Jonas Jenwald	06f6f8719f	Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)	2020-04-14 23:27:08 +02:00
Jonas Jenwald	26cffd03b0	[src/core/jpg.js] Remove some redundant marker validation during the MCU parsing in the `decodeScan` function Some of the code in `src/core/jpg.js` is fairly old, and has with time become unnecessary when the surrounding code has been updated to handle various types of JPEG corruption. In particular the `if (!marker \|\| marker <= 0xff00) { ... }` branch is now dead code, since: - The `!marker` case can no longer happen, since we would already have broken out of the loop thanks to the `!fileMarker` branch a handful of lines above. - The `marker <= 0xff00` case can also no longer happen, since the `findNextFileMarker` function validate markers much more thoroughly (by checking `marker >= 0xffc0 && marker <= 0xfffe`). Hence we'd again have broken out of the loop via the `!fileMarker` branch above when no valid marker was found.	2020-04-14 23:27:08 +02:00
Jonas Jenwald	dcb16af968	Whitelist closure related cases to address the remaining `no-shadow` linting errors Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled. Note that while some of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).	2020-03-25 11:57:12 +01:00
Jonas Jenwald	216cbca16c	Remove variable shadowing from the JavaScript files in the `src/core/` folder This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible. Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow	2020-03-23 18:28:30 +01:00
Jonas Jenwald	c3c3b8cd81	Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880) This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else. To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data and the "actual" `scanLines` value is at least one order of magnitude smaller than expected.	2020-02-22 14:16:07 +01:00
Tim van der Meij	102af0f915	Merge pull request #11547 from Snuffleupagus/convertCmykToRgb-scale Use fewer multiplications in `JpegImage._convertCmykToRgb`	2020-02-09 17:06:23 +01:00
Jonas Jenwald	a4440a1c6b	Avoid re-calculating the `xScaleBlockOffset` when not necessary in `JpegImage._getLinearizedBlockData` As can be seen in the code, the `xScaleBlockOffset` typed array doesn't depend on the actual image data but only on the width and x-scale. The width is obviously consistent for an image, and it turns out that in practice the `componentScaleX` is quite often identical between two (or more) adjacent image components. All-in-all it's thus not necessary to unconditionally re-compute the `xScaleBlockOffset` when getting the JPEG image data. While avoiding, in many cases, one or more loops can never be a bad thing these changes are unfortunately completely dominated by the rest of the JpegImage code and consequently doesn't really show up in benchmark results. Hence I'd understand if this patch is ultimately deemed not necessary.	2020-02-01 11:58:50 +01:00
Jonas Jenwald	ce4f41d06a	Use fewer multiplications in `JpegImage._convertCmykToRgb` Note: This is inspired by PR 5473, which made similar changes for another kind of JPEG data. Since the implementation in `src/core/jpg.js` only supports 8-bit data, as opposed to similar code in `src/core/colorspace.js`, the computations can be further simplified since the `scale` is always constant. By updating the coefficients, effectively inlining the `scale`, we'll thus avoid four multiplications for each loop iteration. Unfortunately I wasn't able, based on a quick look through the test-files, to find a sufficiently large CMYK JPEG image in order for these changes to really show up in benchmark results. However, when testing the `cmykjpeg.pdf` manually there's a total of `120 000` fewer multiplication with this patch.	2020-01-29 18:34:58 +01:00
Jonas Jenwald	f5a617a334	Make the `decodeHuffman` function, in `src/core/jpg.js`, slightly more efficient Rather than repeating the `typeof node` check twice, we can use a `switch` statement instead. This patch was tested using the PDF file from issue 3809, i.e. https://web.archive.org/web/20140801150504/http://vs.twonky.dk/invitation.pdf, with the following manifest file: ``` [ { "id": "issue3809", "file": "../web/pdfs/issue3809.pdf", "md5": "", "rounds": 50, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 50 \| 12537 \| 12451 \| -86 \| -0.69 \| faster Firefox \| Page Request \| 50 \| 5 \| 5 \| 0 \| 0.77 \| Firefox \| Rendering \| 50 \| 12532 \| 12446 \| -86 \| -0.69 \| faster ```	2020-01-28 14:23:58 +01:00
Jonas Jenwald	13930e5202	Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js` - Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument). - Tweak a condition, to make it easier to see that the end of the data has been reached. - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.	2020-01-25 22:52:24 +01:00
Jonas Jenwald	188b320e18	Convert `src/core/jpg.js` to use the `readUint16` helper function in `src/core/core_utils.js`, rather than re-implementing it twice The other image decoders, i.e. the JBIG2 and JPEG 2000 ones, are using the common helper function `readUint16`. Most likely, the only reason that the JPEG decoder is doing it this way is because it originated outside of the PDF.js library. Hence we can simply re-factor `src/core/jpg.js` to use the common `readUint16` helper function, which is especially nice given that the functionality was essentially duplicated in the code.	2020-01-25 00:35:10 +01:00
Jonas Jenwald	9e262ae7fa	Enable the ESLint `prefer-const` rule globally (PR 11450 follow-up) Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths. Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).	2020-01-25 00:20:22 +01:00
Tim van der Meij	d2d9441373	Merge pull request #11489 from Snuffleupagus/rm-FIREFOX-define Remove the `FIREFOX` build flag, since it's completely unused and simplify a couple of `PDFJSDev` checks	2020-01-24 23:59:13 +01:00
Jonas Jenwald	a39943554a	Simplify, and tweak, a couple of `PDFJSDev` checks This removes a couple of, thanks to preceeding code, unnecessary `typeof PDFJSDev` checks, and also fixes a couple of incorrectly implemented (my fault) checks intended for `TESTING` builds.	2020-01-21 00:06:15 +01:00
Jonas Jenwald	c591826f3b	Enable the `no-nested-ternary` ESLint rule (PR 11488 follow-up) This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210 With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed. This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories: - Cases where a value is only clamped to a certain range (the easiest ones to update). - Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the default value and only update it when necessary by using `if`/`else if` statements. - Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary	2020-01-14 17:49:39 +01:00
Jonas Jenwald	36881e3770	Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.	2020-01-04 13:01:43 +01:00
Jonas Jenwald	a63f7ad486	Fix the linting errors, from the Prettier auto-formatting, that ESLint `--fix` couldn't handle This patch makes the follow changes: - Remove no longer necessary inline `// eslint-disable-...` comments. - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors. - Concatenate strings which now fit on just one line. - Fix comments that are now too long. - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.	2019-12-26 12:35:12 +01:00
Jonas Jenwald	de36b2aaba	Enable auto-formatting of the entire code-base using Prettier (issue 11444) Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes). Prettier is being used for a couple of reasons: - To be consistent with `mozilla-central`, where Prettier is already in use across the tree. - To ensure a consistent coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters. Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some). Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that comments won't become too long. Please note: This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a separate commit. (On a more personal note, I'll readily admit that some of the changes Prettier makes are extremely ugly. However, in the name of consistency we'll probably have to live with that.)	2019-12-26 12:34:24 +01:00
Jonas Jenwald	8ec1dfde49	Add `// prettier-ignore` comments to prevent re-formatting of certain data structures There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments. Please note: It may be a good idea to look through these cases individually, and possibly re-write some of the them (especially the `String` ones) to reduce the need for all of these ignore commands.	2019-12-26 00:14:03 +01:00
Jonas Jenwald	572abdcb4a	Convert the various image decoder `...Error`s to classes extending `BaseException` (PR 11185 follow-up) Somehow I missed these in PR 11185, but there's no good reason not to convert them as well.	2019-10-01 13:10:14 +02:00
Jonas Jenwald	0f78fdb229	Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052) Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.	2019-08-08 10:37:41 +02:00
Jonas Jenwald	173fbef05b	Enable the `consistent-return` ESLint rule This rule is already enabled in mozilla-central, and helps ensure more consistent functions/methods, see https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#119-120 Please see https://eslint.org/docs/rules/consistent-return for additional information.	2019-05-11 14:27:21 +02:00
Jonas Jenwald	5181172498	Slightly improve the `isSourcePDF` parameter handling in `JpegImage` (PR 10031 follow-up) Currently there's only a single spot in the code-base where `JpegImage.getData` is called, however it nonetheless seem like a good idea to ensure during tests that the `isSourcePDF` parameter is correctly set. (Especially considering that the PDF use-cases will break without it.) Additionally, in `JpegImage._getLinearizedBlockData`, the code can be made a tiny bit more efficient by checking the value of `isSourcePDF` first to avoid useless checks (for the default PDF use-cases).	2018-09-12 11:30:59 +02:00
Jonas Jenwald	663922f93f	Add a new parameter to `JpegImage.getData` to indicate the source of the image data (issue 9513) The purpose of this patch is to provide a better default behaviour when `JpegImage` is used to parse standalone JPEG images with CMYK colour spaces. Since the issue that the patch concerns is somewhat of a special-case, the implementation utilizes the already existing decode support in an attempt to minimize the impact w.r.t. code size. Please note: It's always possible for the user of `JpegImage` to control image inversion, and thus override the new behaviour, by simply passing a custom `decodeTransform` array upon initialization.	2018-09-02 14:15:22 +02:00
Jonas Jenwald	47bf12cbac	Change `JpegImage._isColorConversionNeeded` into a getter, rather than a regular function Given how `_isColorConversionNeeded` is used, and that it always returns a boolean value, having it be a getter seems more appropriate.	2018-09-02 13:06:28 +02:00
Jonas Jenwald	682672db8e	Change the signature of the `JpegImage` constructor, to allow passing in various options directly	2018-06-16 17:56:54 +02:00
Jonas Jenwald	83ff7d9de9	Simplify the DNL (Define Number of Lines) marker warning in `JpegImage.parse`	2018-05-30 22:40:11 +02:00
Jonas Jenwald	620f65488b	Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679)	2018-05-30 22:40:11 +02:00
Jonas Jenwald	11ab3b5c00	Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888) With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now. Fixes 4888.	2018-02-13 20:44:21 +01:00
Jonas Jenwald	bf4166e6c9	Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614) Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49 Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) before parsing of the SOS (Start of Scan) data begins. Hence the best solution I could come up with here, is to re-parse the image in the hopefully rare case of JPEG images that include a DNL (Define Number of Lines) marker. Fixes 8614.	2018-02-05 21:05:32 +01:00
Jonas Jenwald	f4a95de694	Attempt to find the next valid marker when encountering invalid image data in `JpegImage.parse` (issue 9425) In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter. Fixes 9425.	2018-02-03 16:01:19 +01:00
Jonas Jenwald	c5700211d6	Adjust `decodeACSuccessive` in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images. As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question. Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).	2017-12-30 15:24:09 +01:00
Jonas Jenwald	d6eed132e5	Correct the indentation in the `switch` statement in `decodeACSuccessive` in src/core/jpg.js	2017-12-30 15:22:30 +01:00
Jonas Jenwald	d6cd5355f0	Use `Uint8ClampedArray`, when returning data, and remove manual clamping in `src/core/jpg.js` (issue 4901) This patch removes the `clamp0to255` helper function, as well as manual clamping code in `src/core/jpg.js`. The adjusted constants in `_convertCmykToRgb` were taken from CMYK to RGB conversion code found in `src/core/colorspace.js`. Please note: There will be some very slight movement in a number of existing test-cases, since `Uint8ClampedArray` appears to use `Math.round` (or equivalent) and the old code used (basically) `Math.floor`.	2017-08-14 16:19:57 +02:00
Jonas Jenwald	e2ea9b693c	In `src/core/jpg.js`, ensure that the Adobe JPEG marker always takes precedence, even when the color transform code is zero According to the PDF specification, please see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2394361, if an Adobe JPEG marker is present it should always take precedence. This even seem to be consistent with the existing comment that is present in the code. Hence it seems reasonable to interpret `transformCode === 0` as no color conversion being necessary. Fixes the rendering of page 1 in `issue-4926` (from the test-suite), when the built-in `src/core/jpg.js` image decoder is used.	2017-07-11 17:08:30 +02:00

1 2