/* Copyright 2012 Mozilla Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import { DecodeStream } from "./decode_stream.js";
import { isDict } from "./primitives.js";
import { JpegImage } from "./jpg.js";
import { shadow } from "../shared/util.js";

/**
 * For JPEGs we use a library to decode these images and the stream behaves
 * like all the other DecodeStreams.
 */
class JpegStream extends DecodeStream {
  constructor(stream, maybeLength, params) {
    // Some images may contain 'junk' before the SOI (start-of-image) marker.
    // Note: this seems to mainly affect inline images.
    let ch;
    while ((ch = stream.getByte()) !== -1) {
      // Find the first byte of the SOI marker (0xFFD8).
      if (ch === 0xff) {
        stream.skip(-1); // Reset the stream position to the SOI.
        break;
      }
    }
    super(maybeLength);

    this.stream = stream;
    this.dict = stream.dict;
    this.maybeLength = maybeLength;
    this.params = params;
  }
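
/*
The constructor above skips any leading junk by scanning for the first 0xFF
byte and then stepping back one byte. Note that it only looks for the first
byte of the SOI marker, not the full 0xFFD8 pair. The following is a minimal,
stand-alone sketch of that scan. `ByteStream` and `seekToFirstFF` are
hypothetical names invented for this illustration; only the `getByte()` and
`skip()` calls mirror what the constructor uses, and the real stream classes
may behave differently in other respects.

```javascript
// Hypothetical minimal stream: `getByte` returns -1 at the end of the data,
// `skip` moves the read position by a (possibly negative) offset.
class ByteStream {
  constructor(bytes) {
    this.bytes = Uint8Array.from(bytes);
    this.pos = 0;
  }

  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }

  skip(n) {
    this.pos += n;
  }
}

// Advance `stream` until the next byte to be read is the first 0xFF found.
// Returns false (with the stream exhausted) when no 0xFF byte exists.
function seekToFirstFF(stream) {
  let ch;
  while ((ch = stream.getByte()) !== -1) {
    if (ch === 0xff) {
      stream.skip(-1); // Step back so that 0xFF is read again next.
      return true;
    }
  }
  return false;
}

// Two junk bytes precede the SOI marker (0xFF 0xD8).
const stream = new ByteStream([0x00, 0x0a, 0xff, 0xd8, 0xff, 0xe0]);
seekToFirstFF(stream);
console.log(stream.pos); // 2
```

Unlike this sketch, the constructor does not report whether a 0xFF byte was
found; it simply proceeds with whatever position the stream ends up at.
*/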

  get bytes() {
    // If `this.maybeLength` is null, we'll get the entire stream.
    return shadow(this, "bytes", this.stream.getBytes(this.maybeLength));
  }

  ensureBuffer(requested) {
    // No-op, since `this.readBlock` will always parse the entire image and
    // directly insert all of its data into `this.buffer`.
  }

  readBlock() {
    if (this.eof) {
      return;
    }
    const jpegOptions = {
      decodeTransform: undefined,
      colorTransform: undefined,
    };

    // Checking if values need to be transformed before conversion.
    const decodeArr = this.dict.getArray("Decode", "D");
    if (this.forceRGB && Array.isArray(decodeArr)) {
      const bitsPerComponent = this.dict.get("BitsPerComponent") || 8;
      const decodeArrLength = decodeArr.length;
      const transform = new Int32Array(decodeArrLength);
      let transformNeeded = false;
      const maxValue = (1 << bitsPerComponent) - 1;
      for (let i = 0; i < decodeArrLength; i += 2) {
        transform[i] = ((decodeArr[i + 1] - decodeArr[i]) * 256) | 0;
        transform[i + 1] = (decodeArr[i] * maxValue) | 0;
        if (transform[i] !== 256 || transform[i + 1] !== 0) {
          transformNeeded = true;
        }
      }
      if (transformNeeded) {
        jpegOptions.decodeTransform = transform;
      }
    }
    // Fetching the 'ColorTransform' entry, if it exists.
    if (isDict(this.params)) {
      const colorTransform = this.params.get("ColorTransform");
      if (Number.isInteger(colorTransform)) {
        jpegOptions.colorTransform = colorTransform;
      }
    }
    const jpegImage = new JpegImage(jpegOptions);

    jpegImage.parse(this.bytes);
    const data = jpegImage.getData({
      width: this.drawWidth,
      height: this.drawHeight,
      forceRGB: this.forceRGB,
      isSourcePDF: true,
    });
    this.buffer = data;
    this.bufferLength = data.length;
    this.eof = true;
  }
}
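
/*
In `readBlock` above, each pair of entries in the `Decode` array becomes a
(scale, offset) pair: the scale is the range width multiplied by 256, and the
offset is the lower bound multiplied by the maximum sample value. A pair equal
to (256, 0) is the identity, so no transform is passed on in that case. The
stand-alone sketch below reproduces only that arithmetic.
`computeDecodeTransform` is a hypothetical helper name used for illustration;
how `JpegImage` consumes the resulting array is not shown in this file and is
not assumed here.

```javascript
// Build the (scale, offset) pairs the same way `readBlock` does, returning
// `undefined` when every pair is the identity (scale 256, offset 0).
function computeDecodeTransform(decodeArr, bitsPerComponent = 8) {
  const transform = new Int32Array(decodeArr.length);
  const maxValue = (1 << bitsPerComponent) - 1;
  let transformNeeded = false;

  for (let i = 0; i < decodeArr.length; i += 2) {
    // Scale: width of the decode range, in 1/256 units.
    transform[i] = ((decodeArr[i + 1] - decodeArr[i]) * 256) | 0;
    // Offset: lower bound of the decode range, in sample units.
    transform[i + 1] = (decodeArr[i] * maxValue) | 0;
    if (transform[i] !== 256 || transform[i + 1] !== 0) {
      transformNeeded = true;
    }
  }
  return transformNeeded ? transform : undefined;
}

// Default range for one component: nothing to transform.
console.log(computeDecodeTransform([0, 1])); // undefined

// Inverted range for one 8-bit component.
console.log(Array.from(computeDecodeTransform([1, 0]))); // [ -256, 255 ]
```

The `| 0` truncates each value to a 32-bit integer, which matches the
`Int32Array` storage used for the transform.
*/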

export { JpegStream };