/* Copyright 2012 Mozilla Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/* eslint-disable no-var */

import {
  AbortException,
  assert,
  CMapCompressionType,
  createPromiseCapability,
  FONT_IDENTITY_MATRIX,
  FormatError,
  IDENTITY_MATRIX,
  info,
  isArrayEqual,
  OPS,
  shadow,
  stringToPDFString,
  TextRenderingMode,
  UNSUPPORTED_FEATURES,
  Util,
  warn,
} from "../shared/util.js";
import { CMapFactory, IdentityCMap } from "./cmap.js";
import { Cmd, Dict, EOF, isName, Name, Ref, RefSet } from "./primitives.js";
import { ErrorFont, Font } from "./fonts.js";
import { FontFlags, getFontType } from "./fonts_utils.js";
import {
  getEncoding,
  MacRomanEncoding,
  StandardEncoding,
  SymbolSetEncoding,
  WinAnsiEncoding,
  ZapfDingbatsEncoding,
} from "./encodings.js";
import {
  getFontNameToFileMap,
  getSerifFonts,
  getStandardFontName,
  getStdFontMap,
  getSymbolsFonts,
} from "./standard_fonts.js";
import {
  getNormalizedUnicodes,
  getUnicodeForGlyph,
  reverseIfRtl,
} from "./unicode.js";
import { getTilingPatternIR, Pattern } from "./pattern.js";
import { getXfaFontDict, getXfaFontName } from "./xfa_fonts.js";
import { IdentityToUnicodeMap, ToUnicodeMap } from "./to_unicode_map.js";
import { isPDFFunction, PDFFunctionFactory } from "./function.js";
import { Lexer, Parser } from "./parser.js";
import {
  LocalColorSpaceCache,
  LocalGStateCache,
  LocalImageCache,
  LocalTilingPatternCache,
} from "./image_utils.js";
import { NullStream, Stream } from "./stream.js";
import { BaseStream } from "./base_stream.js";
import { bidi } from "./bidi.js";
import { ColorSpace } from "./colorspace.js";
import { DecodeStream } from "./decode_stream.js";
import { getGlyphsUnicode } from "./glyphlist.js";
import { getLookupTableFactory } from "./core_utils.js";
import { getMetrics } from "./metrics.js";
import { MurmurHash3_64 } from "./murmurhash3.js";
import { OperatorList } from "./operator_list.js";
import { PDFImage } from "./image.js";
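
// Default values for the evaluator options; note that the active options can
// be overridden, e.g. via the API or with `PartialEvaluator.clone` below.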
const DefaultPartialEvaluatorOptions = Object.freeze({
  maxImageSize: -1,
  disableFontFace: false,
  ignoreErrors: false,
  isEvalSupported: true,
  fontExtraProperties: false,
  useSystemFonts: true,
  cMapUrl: null,
  standardFontDataUrl: null,
});

const PatternType = {
  TILING: 1,
  SHADING: 2,
};

// Optionally avoid sending individual, or very few, text chunks to reduce
// `postMessage` overhead with ReadableStream (see issue 13962).
//
// PLEASE NOTE: This value should *not* be too large (it's used as a lower limit
// in `enqueueChunk`), since that would cause streaming of textContent to become
// essentially useless in practice by sending all (or most) chunks at once.
// Also, a too large value would (indirectly) affect the main-thread `textLayer`
// building negatively by forcing all textContent to be handled at once, which
// could easily end up hurting *overall* performance (e.g. rendering as well).
const TEXT_CHUNK_BATCH_SIZE = 10;
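
// A minimal sketch of the batching idea (the actual `enqueueChunk` helper in
// `getTextContent` also handles forced flushing and stream cancellation):
//
//   function enqueueChunk() {
//     if (textContent.items.length >= TEXT_CHUNK_BATCH_SIZE) {
//       sink.enqueue(textContent, textContent.items.length);
//       textContent.items = [];
//       textContent.styles = Object.create(null);
//     }
//   }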
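
// A resolved Promise, used via `deferred.then(...)` to yield back to the
// event loop, e.g. once a `TimeSlotManager` time slot has been exhausted.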
const deferred = Promise.resolve();

// Convert PDF blend mode names to HTML5 blend mode names.
function normalizeBlendMode(value, parsingArray = false) {
  if (Array.isArray(value)) {
    // Use the first *supported* BM value in the Array (fixes issue11279.pdf).
    for (let i = 0, ii = value.length; i < ii; i++) {
      const maybeBM = normalizeBlendMode(value[i], /* parsingArray = */ true);
      if (maybeBM) {
        return maybeBM;
      }
    }
    warn(`Unsupported blend mode Array: ${value}`);
    return "source-over";
  }

  if (!(value instanceof Name)) {
    if (parsingArray) {
      return null;
    }
    return "source-over";
  }
  switch (value.name) {
    case "Normal":
    case "Compatible":
      return "source-over";
    case "Multiply":
      return "multiply";
    case "Screen":
      return "screen";
    case "Overlay":
      return "overlay";
    case "Darken":
      return "darken";
    case "Lighten":
      return "lighten";
    case "ColorDodge":
      return "color-dodge";
    case "ColorBurn":
      return "color-burn";
    case "HardLight":
      return "hard-light";
    case "SoftLight":
      return "soft-light";
    case "Difference":
      return "difference";
    case "Exclusion":
      return "exclusion";
    case "Hue":
      return "hue";
    case "Saturation":
      return "saturation";
    case "Color":
      return "color";
    case "Luminosity":
      return "luminosity";
  }
  if (parsingArray) {
    return null;
  }
  warn(`Unsupported blend mode: ${value.name}`);
  return "source-over";
}
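
// For example (a sketch, with `Name` objects as produced by the parser):
//
//   normalizeBlendMode(Name.get("Multiply"));   // "multiply"
//   normalizeBlendMode(Name.get("Bogus"));      // "source-over" (+ warning)
//   normalizeBlendMode([Name.get("Bogus"), Name.get("Screen")]);  // "screen"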

// Trying to minimize Date.now() usage and only checking the time every
// hundred invocations, to reduce its overhead.
class TimeSlotManager {
  static get TIME_SLOT_DURATION_MS() {
    return shadow(this, "TIME_SLOT_DURATION_MS", 20);
  }

  static get CHECK_TIME_EVERY() {
    return shadow(this, "CHECK_TIME_EVERY", 100);
  }

  constructor() {
    this.reset();
  }

  check() {
    if (++this.checked < TimeSlotManager.CHECK_TIME_EVERY) {
      return false;
    }
    this.checked = 0;
    return this.endTime <= Date.now();
  }

  reset() {
    this.endTime = Date.now() + TimeSlotManager.TIME_SLOT_DURATION_MS;
    this.checked = 0;
  }
}
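
// Typical usage (a sketch; `hasMoreOperators`/`parseNextOperator` stand in
// for the real operator parsing): reset the manager before a long-running
// loop, and yield once the current time slot has been exhausted.
//
//   const timeSlotManager = new TimeSlotManager();
//   while (hasMoreOperators) {
//     if (timeSlotManager.check()) {
//       break; // Yield to the event loop; parsing resumes asynchronously.
//     }
//     parseNextOperator();
//   }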
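
/**
 * The `PartialEvaluator` translates the content streams of a page into an
 * `OperatorList`, resolving the resources (fonts, images, color spaces,
 * patterns, etc.) that they reference; it's also used when extracting the
 * textContent of a page.
 */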
class PartialEvaluator {
  constructor({
    xref,
    handler,
    pageIndex,
    idFactory,
    fontCache,
    builtInCMapCache,
    standardFontDataCache,
    globalImageCache,
    options = null,
  }) {
    this.xref = xref;
    this.handler = handler;
    this.pageIndex = pageIndex;
    this.idFactory = idFactory;
    this.fontCache = fontCache;
    this.builtInCMapCache = builtInCMapCache;
    this.standardFontDataCache = standardFontDataCache;
    this.globalImageCache = globalImageCache;
    this.options = options || DefaultPartialEvaluatorOptions;
    this.parsingType3Font = false;

    this._fetchBuiltInCMapBound = this.fetchBuiltInCMap.bind(this);
  }

  /**
   * Since Functions are only cached (locally) by reference, we can share one
   * `PDFFunctionFactory` instance within this `PartialEvaluator` instance.
   */
  get _pdfFunctionFactory() {
    const pdfFunctionFactory = new PDFFunctionFactory({
      xref: this.xref,
      isEvalSupported: this.options.isEvalSupported,
    });
    return shadow(this, "_pdfFunctionFactory", pdfFunctionFactory);
  }
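
  // Creates a shallow copy of this evaluator, with (optionally) some of its
  // options overridden; used e.g. when Type3 fonts must be parsed with
  // different error-handling options.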
|
2019-10-28 19:28:13 +09:00
|
|
|
|
|
2021-05-31 19:13:20 +09:00
|
|
|
|
clone(newOptions = null) {
|
2021-05-06 16:39:21 +09:00
|
|
|
|
const newEvaluator = Object.create(this);
|
2021-05-31 19:13:20 +09:00
|
|
|
|
newEvaluator.options = Object.assign(
|
|
|
|
|
Object.create(null),
|
|
|
|
|
this.options,
|
|
|
|
|
newOptions
|
|
|
|
|
);
|
2020-07-05 19:20:10 +09:00
|
|
|
|
return newEvaluator;
|
|
|
|
|
}
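
  /**
   * Determines whether the page uses any blend modes, which (unfortunately)
   * requires *synchronous* parsing of the /Resources entries before rendering
   * of the page can start (see the "StartRenderPage" message). Resources that
   * are known to contain no blend modes are recorded in `nonBlendModesSet`,
   * so that subsequent pages can skip re-fetching/re-parsing them.
   */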
  hasBlendModes(resources, nonBlendModesSet) {
    if (!(resources instanceof Dict)) {
      return false;
    }
    if (resources.objId && nonBlendModesSet.has(resources.objId)) {
      return false;
    }

    const processed = new RefSet(nonBlendModesSet);
    if (resources.objId) {
      processed.put(resources.objId);
    }

    const nodes = [resources],
      xref = this.xref;
    while (nodes.length) {
      const node = nodes.shift();
      // First check the current resources for blend modes.
      const graphicStates = node.get("ExtGState");
      if (graphicStates instanceof Dict) {
        for (let graphicState of graphicStates.getRawValues()) {
          if (graphicState instanceof Ref) {
            if (processed.has(graphicState)) {
              continue; // The ExtGState has already been processed.
            }
            try {
              graphicState = xref.fetch(graphicState);
            } catch (ex) {
              // Avoid parsing a corrupt ExtGState more than once.
              processed.put(graphicState);

              info(`hasBlendModes - ignoring ExtGState: "${ex}".`);
              continue;
            }
          }
          if (!(graphicState instanceof Dict)) {
            continue;
          }
          if (graphicState.objId) {
            processed.put(graphicState.objId);
          }

          const bm = graphicState.get("BM");
          if (bm instanceof Name) {
            if (bm.name !== "Normal") {
              return true;
            }
            continue;
          }
          if (bm !== undefined && Array.isArray(bm)) {
            for (const element of bm) {
              if (element instanceof Name && element.name !== "Normal") {
                return true;
              }
            }
          }
        }
      }
      // Descend into the XObjects to look for more resources and blend modes.
      const xObjects = node.get("XObject");
      if (!(xObjects instanceof Dict)) {
        continue;
      }
      for (let xObject of xObjects.getRawValues()) {
        if (xObject instanceof Ref) {
          if (processed.has(xObject)) {
            // The XObject has already been processed, and by avoiding a
            // redundant `xref.fetch` we can *significantly* reduce the load
            // time for badly generated PDF files (fixes issue6961.pdf).
            continue;
          }
          try {
            xObject = xref.fetch(xObject);
          } catch (ex) {
            // Avoid parsing a corrupt XObject more than once.
            processed.put(xObject);

            info(`hasBlendModes - ignoring XObject: "${ex}".`);
            continue;
          }
        }
        if (!(xObject instanceof BaseStream)) {
          continue;
        }
        if (xObject.dict.objId) {
          processed.put(xObject.dict.objId);
        }
        const xResources = xObject.dict.get("Resources");
        if (!(xResources instanceof Dict)) {
          continue;
        }
        // Checking objId to detect an infinite loop.
        if (xResources.objId && processed.has(xResources.objId)) {
          continue;
        }

        nodes.push(xResources);
        if (xResources.objId) {
          processed.put(xResources.objId);
        }
      }
    }

    // When no blend modes exist, there's no need to re-fetch/re-parse any of
    // the processed `Ref`s again for subsequent pages. This helps reduce
    // redundant `XRef.fetch` calls for some documents (e.g. issue6961.pdf).
    for (const ref of processed) {
      nonBlendModesSet.put(ref);
    }
    return false;
  }
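
  /**
   * Fetches a built-in (binary) CMap, either directly on the worker-thread
   * when `cMapUrl` is provided or, as a fallback, via the main-thread. Given
   * their size, only compressed CMaps are cached.
   */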
  async fetchBuiltInCMap(name) {
    const cachedData = this.builtInCMapCache.get(name);
    if (cachedData) {
      return cachedData;
    }
    let data;

    if (this.options.cMapUrl !== null) {
      // Only compressed CMaps are (currently) supported here.
      const url = `${this.options.cMapUrl}${name}.bcmap`;
      const response = await fetch(url);
      if (!response.ok) {
        throw new Error(
          `fetchBuiltInCMap: failed to fetch file "${url}" with "${response.statusText}".`
        );
      }
      data = {
        cMapData: new Uint8Array(await response.arrayBuffer()),
        compressionType: CMapCompressionType.BINARY,
      };
    } else {
      // Get the data on the main-thread instead.
      data = await this.handler.sendWithPromise("FetchBuiltInCMap", { name });
    }

    if (data.compressionType !== CMapCompressionType.NONE) {
      // Given the size of uncompressed CMaps, only cache compressed ones.
      this.builtInCMapCache.set(name, data);
    }
    return data;
  }
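
  // The fetched data is consumed by `CMapFactory`, roughly as follows
  // (a sketch, mirroring how composite-font encodings are loaded):
  //
  //   const cMap = await CMapFactory.create({
  //     encoding: Name.get("Identity-H"),
  //     fetchBuiltInCMap: this._fetchBuiltInCMapBound,
  //     useCMap: null,
  //   });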
firefox | issue2504 | 8 | Overall | 20 | 2875 | 979 | -1896 | -65.95 | faster
firefox | issue2504 | 8 | Page Request | 20 | 1 | 2 | 0 | 11.11 |
firefox | issue2504 | 8 | Rendering | 20 | 2874 | 978 | -1896 | -65.99 | faster
firefox | issue2504 | 9 | Overall | 20 | 700 | 332 | -368 | -52.60 | faster
firefox | issue2504 | 9 | Page Request | 20 | 3 | 2 | 0 | -4.00 |
firefox | issue2504 | 9 | Rendering | 20 | 698 | 329 | -368 | -52.78 | faster
firefox | issue2504 | 10 | Overall | 20 | 3296 | 926 | -2370 | -71.91 | faster
firefox | issue2504 | 10 | Page Request | 20 | 2 | 2 | 0 | -18.75 |
firefox | issue2504 | 10 | Rendering | 20 | 3293 | 924 | -2370 | -71.96 | faster
firefox | issue2504 | 11 | Overall | 20 | 524 | 197 | -327 | -62.34 | faster
firefox | issue2504 | 11 | Page Request | 20 | 2 | 3 | 1 | 58.54 |
firefox | issue2504 | 11 | Rendering | 20 | 522 | 194 | -328 | -62.81 | faster
firefox | issue2504 | 12 | Overall | 20 | 752 | 369 | -384 | -50.98 | faster
firefox | issue2504 | 12 | Page Request | 20 | 3 | 2 | -1 | -36.51 | faster
firefox | issue2504 | 12 | Rendering | 20 | 749 | 367 | -382 | -51.05 | faster
firefox | issue2504 | 13 | Overall | 20 | 679 | 487 | -193 | -28.38 | faster
firefox | issue2504 | 13 | Page Request | 20 | 4 | 2 | -2 | -48.68 | faster
firefox | issue2504 | 13 | Rendering | 20 | 676 | 485 | -191 | -28.28 | faster
firefox | issue2504 | 14 | Overall | 20 | 474 | 283 | -191 | -40.26 | faster
firefox | issue2504 | 14 | Page Request | 20 | 2 | 4 | 2 | 78.57 |
firefox | issue2504 | 14 | Rendering | 20 | 471 | 279 | -192 | -40.79 | faster
firefox | issue2504 | 15 | Overall | 20 | 860 | 618 | -241 | -28.05 | faster
firefox | issue2504 | 15 | Page Request | 20 | 2 | 3 | 0 | 10.87 |
firefox | issue2504 | 15 | Rendering | 20 | 857 | 616 | -241 | -28.15 | faster
firefox | issue2504 | 16 | Overall | 20 | 389 | 243 | -147 | -37.71 | faster
firefox | issue2504 | 16 | Page Request | 20 | 2 | 2 | 0 | 2.33 |
firefox | issue2504 | 16 | Rendering | 20 | 387 | 240 | -147 | -37.94 | faster
firefox | issue2504 | 17 | Overall | 20 | 1484 | 672 | -812 | -54.70 | faster
firefox | issue2504 | 17 | Page Request | 20 | 2 | 3 | 1 | 37.21 |
firefox | issue2504 | 17 | Rendering | 20 | 1482 | 669 | -812 | -54.84 | faster
firefox | issue2504 | 18 | Overall | 20 | 575 | 252 | -323 | -56.12 | faster
firefox | issue2504 | 18 | Page Request | 20 | 2 | 2 | 0 | -16.22 |
firefox | issue2504 | 18 | Rendering | 20 | 573 | 251 | -322 | -56.24 | faster
firefox | issue2504 | 19 | Overall | 20 | 517 | 227 | -290 | -56.08 | faster
firefox | issue2504 | 19 | Page Request | 20 | 2 | 2 | 0 | 21.62 |
firefox | issue2504 | 19 | Rendering | 20 | 515 | 225 | -290 | -56.37 | faster
firefox | issue2504 | 20 | Overall | 20 | 668 | 670 | 2 | 0.31 |
firefox | issue2504 | 20 | Page Request | 20 | 4 | 2 | -1 | -34.29 |
firefox | issue2504 | 20 | Rendering | 20 | 664 | 667 | 3 | 0.49 |
firefox | issue2504 | 21 | Overall | 20 | 486 | 309 | -177 | -36.44 | faster
firefox | issue2504 | 21 | Page Request | 20 | 2 | 2 | 0 | 16.13 |
firefox | issue2504 | 21 | Rendering | 20 | 484 | 307 | -177 | -36.60 | faster
firefox | issue2504 | 22 | Overall | 20 | 543 | 267 | -276 | -50.85 | faster
firefox | issue2504 | 22 | Page Request | 20 | 2 | 2 | 0 | 10.26 |
firefox | issue2504 | 22 | Rendering | 20 | 541 | 265 | -276 | -51.07 | faster
firefox | issue2504 | 23 | Overall | 20 | 3246 | 871 | -2375 | -73.17 | faster
firefox | issue2504 | 23 | Page Request | 20 | 2 | 3 | 1 | 37.21 |
firefox | issue2504 | 23 | Rendering | 20 | 3243 | 868 | -2376 | -73.25 | faster
firefox | issue2504 | 24 | Overall | 20 | 379 | 156 | -223 | -58.83 | faster
firefox | issue2504 | 24 | Page Request | 20 | 2 | 2 | 0 | -2.86 |
firefox | issue2504 | 24 | Rendering | 20 | 378 | 154 | -223 | -59.10 | faster
firefox | issue2504 | 25 | Overall | 20 | 176 | 127 | -50 | -28.19 | faster
firefox | issue2504 | 25 | Page Request | 20 | 2 | 1 | 0 | -15.63 |
firefox | issue2504 | 25 | Rendering | 20 | 175 | 125 | -49 | -28.31 | faster
firefox | issue2504 | 26 | Overall | 20 | 181 | 108 | -74 | -40.67 | faster
firefox | issue2504 | 26 | Page Request | 20 | 3 | 2 | -1 | -39.13 | faster
firefox | issue2504 | 26 | Rendering | 20 | 178 | 105 | -72 | -40.69 | faster
firefox | issue2504 | 27 | Overall | 20 | 208 | 104 | -104 | -49.92 | faster
firefox | issue2504 | 27 | Page Request | 20 | 2 | 2 | 1 | 48.39 |
firefox | issue2504 | 27 | Rendering | 20 | 206 | 102 | -104 | -50.64 | faster
firefox | issue2504 | 28 | Overall | 20 | 241 | 111 | -131 | -54.16 | faster
firefox | issue2504 | 28 | Page Request | 20 | 2 | 2 | -1 | -33.33 |
firefox | issue2504 | 28 | Rendering | 20 | 239 | 109 | -130 | -54.39 | faster
firefox | issue2504 | 29 | Overall | 20 | 321 | 196 | -125 | -39.05 | faster
firefox | issue2504 | 29 | Page Request | 20 | 1 | 2 | 0 | 17.86 |
firefox | issue2504 | 29 | Rendering | 20 | 319 | 194 | -126 | -39.35 | faster
firefox | issue2504 | 30 | Overall | 20 | 651 | 271 | -380 | -58.41 | faster
firefox | issue2504 | 30 | Page Request | 20 | 1 | 2 | 1 | 50.00 |
firefox | issue2504 | 30 | Rendering | 20 | 649 | 269 | -381 | -58.60 | faster
firefox | issue2504 | 31 | Overall | 20 | 1635 | 647 | -988 | -60.42 | faster
firefox | issue2504 | 31 | Page Request | 20 | 1 | 2 | 0 | 30.43 |
firefox | issue2504 | 31 | Rendering | 20 | 1634 | 645 | -988 | -60.49 | faster
firefox | tracemonkey | 0 | Overall | 100 | 51 | 51 | 0 | 0.02 |
firefox | tracemonkey | 0 | Page Request | 100 | 1 | 1 | 0 | -4.76 |
firefox | tracemonkey | 0 | Rendering | 100 | 50 | 50 | 0 | 0.12 |
firefox | tracemonkey | 1 | Overall | 100 | 97 | 91 | -5 | -5.52 | faster
firefox | tracemonkey | 1 | Page Request | 100 | 3 | 3 | 0 | -1.32 |
firefox | tracemonkey | 1 | Rendering | 100 | 94 | 88 | -5 | -5.73 | faster
firefox | tracemonkey | 2 | Overall | 100 | 40 | 40 | 0 | 0.50 |
firefox | tracemonkey | 2 | Page Request | 100 | 1 | 1 | 0 | 3.16 |
firefox | tracemonkey | 2 | Rendering | 100 | 39 | 39 | 0 | 0.54 |
firefox | tracemonkey | 3 | Overall | 100 | 62 | 62 | -1 | -0.94 |
firefox | tracemonkey | 3 | Page Request | 100 | 1 | 1 | 0 | 17.05 |
firefox | tracemonkey | 3 | Rendering | 100 | 61 | 61 | -1 | -1.11 |
firefox | tracemonkey | 4 | Overall | 100 | 56 | 58 | 2 | 3.41 |
firefox | tracemonkey | 4 | Page Request | 100 | 1 | 1 | 0 | 15.31 |
firefox | tracemonkey | 4 | Rendering | 100 | 55 | 57 | 2 | 3.23 |
firefox | tracemonkey | 5 | Overall | 100 | 73 | 71 | -2 | -2.28 |
firefox | tracemonkey | 5 | Page Request | 100 | 2 | 2 | 0 | 12.20 |
firefox | tracemonkey | 5 | Rendering | 100 | 71 | 69 | -2 | -2.69 |
firefox | tracemonkey | 6 | Overall | 100 | 85 | 69 | -16 | -18.73 | faster
firefox | tracemonkey | 6 | Page Request | 100 | 2 | 2 | 0 | -9.90 |
firefox | tracemonkey | 6 | Rendering | 100 | 83 | 67 | -16 | -18.97 | faster
firefox | tracemonkey | 7 | Overall | 100 | 65 | 64 | 0 | -0.37 |
firefox | tracemonkey | 7 | Page Request | 100 | 1 | 1 | 0 | -11.94 |
firefox | tracemonkey | 7 | Rendering | 100 | 63 | 63 | 0 | -0.05 |
firefox | tracemonkey | 8 | Overall | 100 | 53 | 54 | 1 | 2.04 |
firefox | tracemonkey | 8 | Page Request | 100 | 1 | 1 | 0 | 17.02 |
firefox | tracemonkey | 8 | Rendering | 100 | 52 | 53 | 1 | 1.82 |
firefox | tracemonkey | 9 | Overall | 100 | 79 | 73 | -6 | -7.86 | faster
firefox | tracemonkey | 9 | Page Request | 100 | 2 | 2 | 0 | -15.14 |
firefox | tracemonkey | 9 | Rendering | 100 | 77 | 71 | -6 | -7.86 | faster
firefox | tracemonkey | 10 | Overall | 100 | 545 | 519 | -27 | -4.86 | faster
firefox | tracemonkey | 10 | Page Request | 100 | 14 | 13 | 0 | -3.56 |
firefox | tracemonkey | 10 | Rendering | 100 | 532 | 506 | -26 | -4.90 | faster
firefox | tracemonkey | 11 | Overall | 100 | 42 | 41 | -1 | -2.50 |
firefox | tracemonkey | 11 | Page Request | 100 | 1 | 1 | 0 | -27.42 | faster
firefox | tracemonkey | 11 | Rendering | 100 | 41 | 40 | -1 | -1.75 |
firefox | tracemonkey | 12 | Overall | 100 | 350 | 332 | -18 | -5.16 | faster
firefox | tracemonkey | 12 | Page Request | 100 | 3 | 3 | 0 | -5.17 |
firefox | tracemonkey | 12 | Rendering | 100 | 347 | 329 | -18 | -5.15 | faster
firefox | tracemonkey | 13 | Overall | 100 | 31 | 31 | 0 | 0.52 |
firefox | tracemonkey | 13 | Page Request | 100 | 1 | 1 | 0 | 4.95 |
firefox | tracemonkey | 13 | Rendering | 100 | 30 | 30 | 0 | 0.20 |
```
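The name-keyed caching described above can be sketched roughly as follows; this is a simplified stand-in for the actual `LocalColorSpaceCache`, whose interface may differ:
```js
// Minimal per-getOperatorList cache: parsed ColorSpaces are stored by
// name, so later lookups during OperatorList building are synchronous.
class SimpleLocalColorSpaceCache {
  constructor() {
    this._byName = new Map();
  }
  getByName(name) {
    return this._byName.get(name) || null;
  }
  set(name, colorSpace) {
    this._byName.set(name, colorSpace);
  }
}
```
Since the cache lives only for the duration of one `getOperatorList` invocation, it's discarded together with the rest of the parsing state.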
  async fetchStandardFontData(name) {
    const cachedData = this.standardFontDataCache.get(name);
    if (cachedData) {
      return new Stream(cachedData);
    }

    // The symbol fonts are not consistent across platforms, so always load
    // the standard font data for them.
    if (
      this.options.useSystemFonts &&
      name !== "Symbol" &&
      name !== "ZapfDingbats"
    ) {
      return null;
    }

    const standardFontNameToFileName = getFontNameToFileMap(),
      filename = standardFontNameToFileName[name];
    let data;

    if (this.options.standardFontDataUrl !== null) {
      const url = `${this.options.standardFontDataUrl}${filename}`;
      const response = await fetch(url);
      if (!response.ok) {
        warn(
          `fetchStandardFontData: failed to fetch file "${url}" with "${response.statusText}".`
        );
      } else {
        data = await response.arrayBuffer();
      }
    } else {
      // Get the data on the main-thread instead.
      try {
        data = await this.handler.sendWithPromise("FetchStandardFontData", {
          filename,
        });
      } catch (e) {
        warn(
          `fetchStandardFontData: failed to fetch file "${filename}" with "${e}".`
        );
      }
    }

    if (!data) {
      return null;
    }
    // Cache the "raw" standard font data, to avoid fetching it repeatedly
    // (see e.g. issue 11399).
    this.standardFontDataCache.set(name, data);

    return new Stream(data);
  }

  async buildFormXObject(
    resources,
    xobj,
    smask,
    operatorList,
    task,
    initialState,
    localColorSpaceCache
  ) {
    const dict = xobj.dict;
    const matrix = dict.getArray("Matrix");
    let bbox = dict.getArray("BBox");
    if (Array.isArray(bbox) && bbox.length === 4) {
      bbox = Util.normalizeRect(bbox);
    } else {
      bbox = null;
    }

    let optionalContent, groupOptions;
    if (dict.has("OC")) {
      optionalContent = await this.parseMarkedContentProps(
        dict.get("OC"),
        resources
      );
    }
    if (optionalContent !== undefined) {
      operatorList.addOp(OPS.beginMarkedContentProps, ["OC", optionalContent]);
    }
    const group = dict.get("Group");
    if (group) {
      groupOptions = {
        matrix,
        bbox,
        smask,
        isolated: false,
        knockout: false,
      };

      const groupSubtype = group.get("S");
      let colorSpace = null;
      if (isName(groupSubtype, "Transparency")) {
        groupOptions.isolated = group.get("I") || false;
        groupOptions.knockout = group.get("K") || false;
        if (group.has("CS")) {
          const cs = group.getRaw("CS");

          const cachedColorSpace = ColorSpace.getCached(
            cs,
            this.xref,
            localColorSpaceCache
          );
          if (cachedColorSpace) {
            colorSpace = cachedColorSpace;
          } else {
            colorSpace = await this.parseColorSpace({
              cs,
              resources,
              localColorSpaceCache,
            });
          }
        }
      }

      if (smask && smask.backdrop) {
        colorSpace = colorSpace || ColorSpace.singletons.rgb;
        smask.backdrop = colorSpace.getRgb(smask.backdrop, 0);
      }

      operatorList.addOp(OPS.beginGroup, [groupOptions]);
    }
Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects
Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors.
Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones.
Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience).
Please note that I do *not* think that we need/should convert *every* single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameters (or even more), passing them in individually becomes quite cumbersome.
With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, apart from the new method signatures[1], is that it's now re-using *one* `PartialEvaluator` instance, since I couldn't see any compelling reason for creating a new one in every single test.
*Note:* If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat.
---
[1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.
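As an illustration of the difference at a call-site (argument names here are only indicative, not the exact signatures):
```js
// Before: positional parameters; the order must be remembered, and
// optional ones still have to be supplied explicitly.
evaluator.getOperatorList(stream, task, resources, operatorList, null);

// After: a parameter object; order-independent, self-documenting, and
// optional entries can simply be omitted.
evaluator.getOperatorList({ stream, task, resources, operatorList });
```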
    // If it's a group, a new canvas will be created that is the size of the
    // bounding box and translated to the correct position so we don't need to
    // apply the bounding box to it.
    const args = group ? [matrix, null] : [matrix, bbox];
    operatorList.addOp(OPS.paintFormXObjectBegin, args);

    return this.getOperatorList({
      stream: xobj,
      task,
      resources: dict.get("Resources") || resources,
      operatorList,
      initialState,
    }).then(function () {
      operatorList.addOp(OPS.paintFormXObjectEnd, []);

      if (group) {
        operatorList.addOp(OPS.endGroup, [groupOptions]);
      }

      if (optionalContent !== undefined) {
        operatorList.addOp(OPS.endMarkedContent, []);
      }
    });
  }

  _sendImgData(objId, imgData, cacheGlobally = false) {
    const transfers = imgData ? [imgData.data.buffer] : null;

    if (this.parsingType3Font || cacheGlobally) {
      return this.handler.send(
        "commonobj",
        [objId, "Image", imgData],
        transfers
      );
    }
    return this.handler.send(
      "obj",
      [objId, this.pageIndex, "Image", imgData],
      transfers
    );
  }

  async buildPaintImageXObject({
    resources,
    image,
    isInline = false,
    operatorList,
    cacheKey,
    localImageCache,
    localColorSpaceCache,
  }) {
    const dict = image.dict;
    const imageRef = dict.objId;
    const w = dict.get("W", "Width");
    const h = dict.get("H", "Height");

    if (!(w && typeof w === "number") || !(h && typeof h === "number")) {
      warn("Image dimensions are missing, or not numbers.");
      return;
    }
    const maxImageSize = this.options.maxImageSize;
    if (maxImageSize !== -1 && w * h > maxImageSize) {
      const msg = "Image exceeded maximum allowed size and was removed.";

      if (this.options.ignoreErrors) {
        warn(msg);
        return;
      }
      throw new Error(msg);
    }

    let optionalContent;
    if (dict.has("OC")) {
      optionalContent = await this.parseMarkedContentProps(
        dict.get("OC"),
        resources
      );
    }
    if (optionalContent !== undefined) {
      operatorList.addOp(OPS.beginMarkedContentProps, ["OC", optionalContent]);
    }

    const imageMask = dict.get("IM", "ImageMask") || false;
    const interpolate = dict.get("I", "Interpolate");
    let imgData, args;
    if (imageMask) {
      // This depends on a tmpCanvas being filled with the
      // current fillStyle, such that processing the pixel
      // data can't be done here. Instead of creating a
      // complete PDFImage, only read the information needed
      // for later.
      const bitStrideLength = (w + 7) >> 3;
      const imgArray = image.getBytes(
        bitStrideLength * h,
        /* forceClamped = */ true
      );
      const decode = dict.getArray("D", "Decode");

      imgData = PDFImage.createMask({
        imgArray,
        width: w,
        height: h,
        imageIsFromDecodeStream: image instanceof DecodeStream,
        inverseDecode: !!decode && decode[0] > 0,
        interpolate,
      });
      imgData.cached = !!cacheKey;
      args = [imgData];

      operatorList.addOp(OPS.paintImageMaskXObject, args);
      if (cacheKey) {
        localImageCache.set(cacheKey, imageRef, {
          fn: OPS.paintImageMaskXObject,
          args,
        });
      }

      if (optionalContent !== undefined) {
        operatorList.addOp(OPS.endMarkedContent, []);
      }
      return;
    }

    const softMask = dict.get("SM", "SMask") || false;
    const mask = dict.get("Mask") || false;

    const SMALL_IMAGE_DIMENSIONS = 200;
    // Inlining small images into the queue as RGB data
    if (isInline && !softMask && !mask && w + h < SMALL_IMAGE_DIMENSIONS) {
      const imageObj = new PDFImage({
        xref: this.xref,
        res: resources,
        image,
        isInline,
        pdfFunctionFactory: this._pdfFunctionFactory,
        localColorSpaceCache,
      });
      // We force the use of RGBA_32BPP images here, because we can't handle
      // any other kind.
      imgData = imageObj.createImageData(/* forceRGBA = */ true);
      operatorList.addOp(OPS.paintInlineImageXObject, [imgData]);

      if (optionalContent !== undefined) {
        operatorList.addOp(OPS.endMarkedContent, []);
      }
      return;
    }

    // If there is no imageMask, create the PDFImage and a lot
    // of image processing can be done here.
    let objId = `img_${this.idFactory.createObjId()}`,
      cacheGlobally = false;

    if (this.parsingType3Font) {
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
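A minimal sketch of the per-document factory idea (illustrative only; the real `idFactory` is defined elsewhere in the `/core` code and its id formats may differ):
```js
// Counters are tied to the document (and page), so repeatedly opening,
// parsing and closing the *same* document yields consistent ids.
function createIdFactory(docId, pageIndex) {
  let objIdCounter = 0;
  let fontIdCounter = 0;
  return {
    getDocId: () => docId,
    createObjId: () => `p${pageIndex}_${++objIdCounter}`,
    createFontId: () => `f${++fontIdCounter}`,
  };
}
```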
      objId = `${this.idFactory.getDocId()}_type3_${objId}`;
    } else if (imageRef) {
      cacheGlobally = this.globalImageCache.shouldCache(
        imageRef,
        this.pageIndex
      );

      if (cacheGlobally) {
        objId = `${this.idFactory.getDocId()}_${objId}`;
      }
    }

    // Ensure that the dependency is added before the image is decoded.
    operatorList.addDependency(objId);
    args = [objId, w, h];
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other. However, there are PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page, which is obviously slow and wastes both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there are now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefits both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus, to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fall back to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
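The promotion threshold described above can be sketched roughly as follows (a hedged simplification; the real `GlobalImageCache` also enforces the caching limits discussed further below):
```js
// Illustrative sketch: an image Ref seen on at least NUM_PAGES_THRESHOLD
// pages is promoted from page-specific to document-specific caching.
const NUM_PAGES_THRESHOLD = 2;

class SimpleGlobalImageCache {
  constructor() {
    this._refPages = new Map(); // Ref string -> Set of page indexes.
  }
  addPageIndex(ref, pageIndex) {
    let pages = this._refPages.get(ref);
    if (!pages) {
      pages = new Set();
      this._refPages.set(ref, pages);
    }
    pages.add(pageIndex);
  }
  shouldCache(ref, pageIndex) {
    this.addPageIndex(ref, pageIndex);
    return this._refPages.get(ref).size >= NUM_PAGES_THRESHOLD;
  }
}
```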
Improve how Type3-fonts with dependencies are handled
While the `CharProcs` streams of Type3-fonts *usually* don't rely on dependencies, such as e.g. images, it does happen in some cases.
Currently any dependencies are simply appended to the parent operatorList, which in practice means *only* the operatorList of the *first* page where the Type3-font is being used.
However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of *all* pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously.
Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added *once* to any operatorList.)
Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.
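The deduplication relied upon here is, in essence, the following (a simplified sketch of the `OperatorList` methods mentioned above):
```js
// A dependency is recorded at most once per operator list, so appending
// the same Type3 dependencies on every page that uses the font is safe.
class MiniOperatorList {
  constructor() {
    this.dependencies = new Set();
    this.ops = [];
  }
  addOp(fn, args) {
    this.ops.push({ fn, args });
  }
  addDependency(dependency) {
    if (this.dependencies.has(dependency)) {
      return;
    }
    this.dependencies.add(dependency);
    this.addOp("dependency", [dependency]);
  }
  addDependencies(dependencies) {
    for (const dependency of dependencies) {
      this.addDependency(dependency);
    }
  }
}
```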

    PDFImage.buildImage({
      xref: this.xref,
      res: resources,
      image,
      isInline,
      pdfFunctionFactory: this._pdfFunctionFactory,
      localColorSpaceCache,
    })
      .then(imageObj => {
        imgData = imageObj.createImageData(/* forceRGBA = */ false);
Improve global image caching for small images (PR 11912 follow-up, issue 12098)
When implementing the `GlobalImageCache` functionality I was mostly worried about the effect of *very large* images, hence the maximum number of cached images was purposely kept quite low[1].
However, there's one fairly obvious problem with that approach: In documents with hundreds, or even thousands, of *small* images the `GlobalImageCache` as implemented becomes essentially pointless.
Hence this patch, where the `GlobalImageCache`-implementation is changed in the following ways:
- We're still guaranteed to be able to cache a *minimum* number of images, set to `10` (similar as before).
- If the *total* size of all the cached image data is below a threshold[2], we're allowed to cache additional images.
This patch thus *improves*, but doesn't completely fix, issue 12098. Note that that document is created by a *very poor* PDF generator, since every single page contains the *entire* document (with all of its /Resources) and clipping is used to create the individual pages.[3]
---
[1] Currently set to `10` images; imagine what would happen to overall memory usage if we encountered e.g. 50 images each 10 MB in size.
[2] This value was chosen, somewhat randomly, to be `40` megabytes; basically five times the [maximum individual image size per page](https://github.com/mozilla/pdf.js/blob/6249ef517d3aaacc9aa6c9e1f5377acfaa4bc2a7/src/display/api.js#L2483-L2484).
[3] This surely has to be some kind of record w.r.t. how badly PDF generators can mess things up...
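The two-tier budget described above amounts to roughly the following check (a hedged sketch; the constants mirror the footnotes, while the names are illustrative):
```js
const MIN_IMAGES_TO_CACHE = 10; // Always allowed, as before this patch.
const MAX_BYTE_SIZE = 40e6; // ~40 MB total budget for additional images.

// May yet another image be added to the document-level cache?
function canCacheMoreImages(numCachedImages, totalCachedByteSize) {
  if (numCachedImages < MIN_IMAGES_TO_CACHE) {
    return true;
  }
  return totalCachedByteSize < MAX_BYTE_SIZE;
}
```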
        if (cacheKey && imageRef && cacheGlobally) {
          this.globalImageCache.addByteSize(imageRef, imgData.data.length);
        }
        return this._sendImgData(objId, imgData, cacheGlobally);
      })
      .catch(reason => {
        warn(`Unable to decode image "${objId}": "${reason}".`);

        return this._sendImgData(objId, /* imgData = */ null, cacheGlobally);
      });

    operatorList.addOp(OPS.paintImageXObject, args);
    if (cacheKey) {
      localImageCache.set(cacheKey, imageRef, {
        fn: OPS.paintImageXObject,
        args,
      });

      if (imageRef) {
        assert(!isInline, "Cannot cache an inline image globally.");
        this.globalImageCache.addPageIndex(imageRef, this.pageIndex);

        if (cacheGlobally) {
          this.globalImageCache.setData(imageRef, {
            objId,
            fn: OPS.paintImageXObject,
            args,
            byteSize: 0, // Temporary entry, note `addByteSize` above.
          });
        }
      }
    }

    if (optionalContent !== undefined) {
      operatorList.addOp(OPS.endMarkedContent, []);
    }
  }

  handleSMask(
    smask,
    resources,
    operatorList,
    task,
    stateManager,
    localColorSpaceCache
  ) {
    const smaskContent = smask.get("G");
    const smaskOptions = {
      subtype: smask.get("S").name,
      backdrop: smask.get("BC"),
    };

    // The SMask might have an alpha/luminosity value transfer function --
    // we will build a map of integer values in range 0..255 to be fast.
    const transferObj = smask.get("TR");
    if (isPDFFunction(transferObj)) {
      const transferFn = this._pdfFunctionFactory.create(transferObj);
      const transferMap = new Uint8Array(256);
      const tmp = new Float32Array(1);
      for (let i = 0; i < 256; i++) {
        tmp[0] = i / 255;
        transferFn(tmp, 0, tmp, 0);
        transferMap[i] = (tmp[0] * 255) | 0;
      }
      smaskOptions.transferMap = transferMap;
    }

    return this.buildFormXObject(
      resources,
      smaskContent,
      smaskOptions,
      operatorList,
      task,
      stateManager.state.clone(),
      localColorSpaceCache
    );
  }

  handleTransferFunction(tr) {
    let transferArray;
    if (Array.isArray(tr)) {
      transferArray = tr;
    } else if (isPDFFunction(tr)) {
      transferArray = [tr];
    } else {
      return null; // Not a valid transfer function entry.
    }

    const transferMaps = [];
    let numFns = 0,
      numEffectfulFns = 0;
    for (const entry of transferArray) {
      const transferObj = this.xref.fetchIfRef(entry);
      numFns++;

      if (isName(transferObj, "Identity")) {
        transferMaps.push(null);
        continue;
      } else if (!isPDFFunction(transferObj)) {
        return null; // Not a valid transfer function object.
      }

      const transferFn = this._pdfFunctionFactory.create(transferObj);
      const transferMap = new Uint8Array(256),
        tmp = new Float32Array(1);
      for (let j = 0; j < 256; j++) {
        tmp[0] = j / 255;
        transferFn(tmp, 0, tmp, 0);
        transferMap[j] = (tmp[0] * 255) | 0;
      }
      transferMaps.push(transferMap);
      numEffectfulFns++;
    }

    if (!(numFns === 1 || numFns === 4)) {
      return null; // Only 1 or 4 functions are supported, by the specification.
    }
    if (numEffectfulFns === 0) {
      return null; // Only /Identity transfer functions found, which are no-ops.
    }
    return transferMaps;
  }

  handleTilingType(
    fn,
    color,
    resources,
    pattern,
    patternDict,
    operatorList,
    task,
    localTilingPatternCache
  ) {
    // Create an IR of the pattern code.
    const tilingOpList = new OperatorList();
    // Merge the available resources, to prevent issues when the patternDict
    // is missing some /Resources entries (fixes issue6541.pdf).
    const patternResources = Dict.merge({
      xref: this.xref,
      dictArray: [patternDict.get("Resources"), resources],
    });

    return this.getOperatorList({
      stream: pattern,
      task,
      resources: patternResources,
      operatorList: tilingOpList,
    })
      .then(function () {
        const operatorListIR = tilingOpList.getIR();
        const tilingPatternIR = getTilingPatternIR(
          operatorListIR,
          patternDict,
          color
        );
        // Add the dependencies to the parent operator list so they are
        // resolved before the sub operator list is executed synchronously.
        operatorList.addDependencies(tilingOpList.dependencies);
        operatorList.addOp(fn, tilingPatternIR);

        if (patternDict.objId) {
          localTilingPatternCache.set(/* name = */ null, patternDict.objId, {
            operatorListIR,
            dict: patternDict,
          });
        }
      })
      .catch(reason => {
        if (reason instanceof AbortException) {
          return;
        }
        if (this.options.ignoreErrors) {
          // Error(s) in the TilingPattern -- sending unsupported feature
          // notification and allow rendering to continue.
          this.handler.send("UnsupportedFeature", {
            featureId: UNSUPPORTED_FEATURES.errorTilingPattern,
          });
          warn(`handleTilingType - ignoring pattern: "${reason}".`);
          return;
        }
        throw reason;
      });
  }

  handleSetFont(
    resources,
    fontArgs,
    fontRef,
    operatorList,
    task,
    state,
    fallbackFontDict = null,
    cssFontInfo = null
  ) {
    const fontName =
      fontArgs && fontArgs[0] instanceof Name ? fontArgs[0].name : null;

    return this.loadFont(
      fontName,
      fontRef,
      resources,
      fallbackFontDict,
      cssFontInfo
    )
      .then(translated => {
        if (!translated.font.isType3Font) {
          return translated;
        }
        return translated
          .loadType3Data(this, resources, task)
          .then(function () {
            // Add the dependencies to the parent operatorList so they are
            // resolved before Type3 operatorLists are executed synchronously.
            operatorList.addDependencies(translated.type3Dependencies);

            return translated;
          })
          .catch(reason => {
            // Error in the font data -- sending unsupported feature
            // notification.
            this.handler.send("UnsupportedFeature", {
              featureId: UNSUPPORTED_FEATURES.errorFontLoadType3,
            });
            return new TranslatedFont({
              loadedName: "g_font_error",
              font: new ErrorFont(`Type3 font load error: ${reason}`),
              dict: translated.font,
              evaluatorOptions: this.options,
            });
          });
      })
      .then(translated => {
        state.font = translated.font;
        translated.send(this.handler);
        return translated.loadedName;
      });
  }
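
  // Example (illustrative): the `Tf` operator, e.g. `/F1 12 Tf`, ends up
  // here roughly as
  //
  //   this.handleSetFont(resources, [Name.get("F1"), 12],
  //                      /* fontRef = */ null, operatorList, task,
  //                      stateManager.state);
  //
  // which resolves /F1 through the /Font resource dictionary, loads and
  // translates the font, and reports its `loadedName` back to the caller
  // so it can be registered as an operator-list dependency.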

  handleText(chars, state) {
    const font = state.font;
    const glyphs = font.charsToGlyphs(chars);

    if (font.data) {
      const isAddToPathSet = !!(
        state.textRenderingMode & TextRenderingMode.ADD_TO_PATH_FLAG
      );
      if (
        isAddToPathSet ||
        state.fillColorSpace.name === "Pattern" ||
        font.disableFontFace ||
        this.options.disableFontFace
      ) {
        PartialEvaluator.buildFontPaths(
          font,
          glyphs,
          this.handler,
          this.options
        );
      }
    }
    return glyphs;
  }
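
  // Note: `TextRenderingMode.ADD_TO_PATH_FLAG` is a bit-flag, hence the
  // bitwise test above; e.g. render mode 7 (`7 Tr`, add text to the
  // clipping path) has the flag set, while plain fill mode 0 does not.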

  ensureStateFont(state) {
    if (state.font) {
      return;
    }
    const reason = new FormatError(
      "Missing setFont (Tf) operator before text rendering operator."
    );

    if (this.options.ignoreErrors) {
      // Missing setFont operator before text rendering operator -- sending
      // unsupported feature notification and allow rendering to continue.
      this.handler.send("UnsupportedFeature", {
        featureId: UNSUPPORTED_FEATURES.errorFontState,
      });
      warn(`ensureStateFont: "${reason}".`);
      return;
    }
    throw reason;
  }
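
  // Example (illustrative): a corrupt content stream such as
  //
  //   BT (Hello) Tj ET
  //
  // reaches a text-showing operator without any preceding `Tf`. With
  // `ignoreErrors` set this method only warns and returns, allowing
  // rendering to continue; otherwise the FormatError above is thrown.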

  async setGState({
    resources,
    gState,
    operatorList,
    cacheKey,
    task,
    stateManager,
    localGStateCache,
    localColorSpaceCache,
  }) {
    const gStateRef = gState.objId;
    let isSimpleGState = true;
    // This array holds the converted/processed state data.
    const gStateObj = [];
    const gStateKeys = gState.getKeys();
    let promise = Promise.resolve();
    for (let i = 0, ii = gStateKeys.length; i < ii; i++) {
      const key = gStateKeys[i];
      const value = gState.get(key);
      switch (key) {
        case "Type":
          break;
        case "LW":
        case "LC":
        case "LJ":
        case "ML":
        case "D":
        case "RI":
        case "FL":
        case "CA":
        case "ca":
          gStateObj.push([key, value]);
          break;
        case "Font":
          isSimpleGState = false;

          promise = promise.then(() => {
            return this.handleSetFont(
              resources,
              null,
              value[0],
              operatorList,
              task,
              stateManager.state
            ).then(function (loadedName) {
              operatorList.addDependency(loadedName);
              gStateObj.push([key, [loadedName, value[1]]]);
            });
          });
          break;
        case "BM":
          gStateObj.push([key, normalizeBlendMode(value)]);
          break;
        case "SMask":
          if (isName(value, "None")) {
            gStateObj.push([key, false]);
            break;
          }
          if (value instanceof Dict) {
            isSimpleGState = false;

            promise = promise.then(() => {
              return this.handleSMask(
                value,
                resources,
                operatorList,
                task,
                stateManager,
                localColorSpaceCache
              );
            });
            gStateObj.push([key, true]);
          } else {
            warn("Unsupported SMask type");
          }
          break;
        case "TR":
          const transferMaps = this.handleTransferFunction(value);
          gStateObj.push([key, transferMaps]);
          break;
        // Only generate info log messages for the following since
        // they are unlikely to have a big impact on the rendering.
        case "OP":
        case "op":
        case "OPM":
        case "BG":
        case "BG2":
        case "UCR":
        case "UCR2":
        case "TR2":
        case "HT":
        case "SM":
        case "SA":
        case "AIS":
        case "TK":
          // TODO implement these operators.
          info("graphic state operator " + key);
          break;
        default:
          info("Unknown graphic state operator " + key);
          break;
      }
    }
    return promise.then(function () {
      if (gStateObj.length > 0) {
        operatorList.addOp(OPS.setGState, [gStateObj]);
      }

      if (isSimpleGState) {
        localGStateCache.set(cacheKey, gStateRef, gStateObj);
      }
    });
  }
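
  // Example (illustrative): an ExtGState dictionary such as
  //
  //   << /Type /ExtGState /LW 2 /ca 0.5 >>
  //
  // is converted above into `[["LW", 2], ["ca", 0.5]]` and emitted as a
  // single OPS.setGState operation; only such "simple" states (no /Font,
  // no /SMask dictionary) are eligible for the local gState cache.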

  loadFont(
    fontName,
    font,
    resources,
    fallbackFontDict = null,
    cssFontInfo = null
  ) {
    const errorFont = async () => {
      return new TranslatedFont({
        loadedName: "g_font_error",
        font: new ErrorFont(`Font "${fontName}" is not available.`),
        dict: font,
        evaluatorOptions: this.options,
      });
    };

    const xref = this.xref;
    let fontRef;
    if (font) {
      // Loading by ref.
      if (!(font instanceof Ref)) {
        throw new FormatError('The "font" object should be a reference.');
      }
      fontRef = font;
    } else {
      // Loading by name.
      const fontRes = resources.get("Font");
      if (fontRes) {
        fontRef = fontRes.getRaw(fontName);
      }
    }
    if (!fontRef) {
      const partialMsg = `Font "${
        fontName || (font && font.toString())
      }" is not available`;

      if (!this.options.ignoreErrors && !this.parsingType3Font) {
        warn(`${partialMsg}.`);
        return errorFont();
      }
      // Font not found -- sending unsupported feature notification.
      this.handler.send("UnsupportedFeature", {
        featureId: UNSUPPORTED_FEATURES.errorFontMissing,
      });
      warn(`${partialMsg} -- attempting to fallback to a default font.`);

      // Falling back to a default font to avoid completely broken rendering,
      // but note that there're no guarantees that things will look "correct".
      if (fallbackFontDict) {
        fontRef = fallbackFontDict;
      } else {
        fontRef = PartialEvaluator.fallbackFontDict;
      }
    }

    if (this.parsingType3Font && this.type3FontRefs.has(fontRef)) {
      return errorFont();
    }

    if (this.fontCache.has(fontRef)) {
      return this.fontCache.get(fontRef);
    }

    font = xref.fetchIfRef(fontRef);
    if (!(font instanceof Dict)) {
      return errorFont();
    }

    // We are holding `font.cacheKey` references only for `fontRef`s that
    // are not actually `Ref`s, but rather `Dict`s. See explanation below.
    if (font.cacheKey && this.fontCache.has(font.cacheKey)) {
      return this.fontCache.get(font.cacheKey);
    }

    const fontCapability = createPromiseCapability();

    let preEvaluatedFont;
    try {
      preEvaluatedFont = this.preEvaluateFont(font);
      preEvaluatedFont.cssFontInfo = cssFontInfo;
    } catch (reason) {
      warn(`loadFont - preEvaluateFont failed: "${reason}".`);
      return errorFont();
    }
    const { descriptor, hash } = preEvaluatedFont;

    const fontRefIsRef = fontRef instanceof Ref;
    let fontID;
    if (fontRefIsRef) {
      fontID = `f${fontRef.toString()}`;
    }

    if (hash && descriptor instanceof Dict) {
      if (!descriptor.fontAliases) {
        descriptor.fontAliases = Object.create(null);
      }
      const fontAliases = descriptor.fontAliases;

      if (fontAliases[hash]) {
        const aliasFontRef = fontAliases[hash].aliasRef;
        if (fontRefIsRef && aliasFontRef && this.fontCache.has(aliasFontRef)) {
          this.fontCache.putAlias(fontRef, aliasFontRef);
          return this.fontCache.get(fontRef);
        }
      } else {
        fontAliases[hash] = {
          fontID: this.idFactory.createFontId(),
        };
      }

      if (fontRefIsRef) {
        fontAliases[hash].aliasRef = fontRef;
      }
      fontID = fontAliases[hash].fontID;
    }

    // Workaround for bad PDF generators that reference fonts incorrectly,
    // where `fontRef` is a `Dict` rather than a `Ref` (fixes bug946506.pdf).
    // In this case we cannot put the font into `this.fontCache` (which is
    // a `RefSetCache`), since it's not possible to use a `Dict` as a key.
    //
    // However, if we don't cache the font it's not possible to remove it
    // when `cleanup` is triggered from the API, which causes issues on
    // subsequent rendering operations (see issue7403.pdf) and would force us
    // to unnecessarily load the same fonts over and over.
    //
    // Instead, we cheat a bit by using a modified `fontID` as a key in
    // `this.fontCache`, to allow the font to be cached.
    // NOTE: This works because `RefSetCache` calls `toString()` on provided
    //       keys. Also, since `fontRef` is used when getting cached fonts,
    //       we'll not accidentally match fonts cached with the `fontID`.
    if (fontRefIsRef) {
      this.fontCache.put(fontRef, fontCapability.promise);
    } else {
      if (!fontID) {
        fontID = this.idFactory.createFontId();
      }
      font.cacheKey = `cacheKey_${fontID}`;
      this.fontCache.put(font.cacheKey, fontCapability.promise);
    }
    assert(
      fontID && fontID.startsWith("f"),
      'The "fontID" must be (correctly) defined.'
    );

    // Keep track of each font we translated so the caller can
    // load them asynchronously before calling display on a page.
    font.loadedName = `${this.idFactory.getDocId()}_${fontID}`;

    this.translateFont(preEvaluatedFont)
      .then(translatedFont => {
        if (translatedFont.fontType !== undefined) {
          xref.stats.addFontType(translatedFont.fontType);
        }

        fontCapability.resolve(
          new TranslatedFont({
            loadedName: font.loadedName,
            font: translatedFont,
            dict: font,
            evaluatorOptions: this.options,
          })
        );
      })
      .catch(reason => {
        // TODO fontCapability.reject?
        // Error in the font data -- sending unsupported feature notification.
        this.handler.send("UnsupportedFeature", {
          featureId: UNSUPPORTED_FEATURES.errorFontTranslate,
        });
        warn(`loadFont - translateFont failed: "${reason}".`);

        try {
          // error, but it's still nice to have font type reported
          const fontFile3 = descriptor && descriptor.get("FontFile3");
          const subtype = fontFile3 && fontFile3.get("Subtype");
          const fontType = getFontType(
            preEvaluatedFont.type,
            subtype && subtype.name
          );
          if (fontType !== undefined) {
            xref.stats.addFontType(fontType);
          }
        } catch (ex) {}

        fontCapability.resolve(
          new TranslatedFont({
            loadedName: font.loadedName,
            font: new ErrorFont(
              reason instanceof Error ? reason.message : reason
            ),
            dict: font,
            evaluatorOptions: this.options,
          })
        );
      });
    return fontCapability.promise;
  }
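
  // Example (illustrative): for a well-formed `fontRef` of Ref(42, 0) the
  // font is cached under the Ref itself and `fontID` becomes "f42R". For
  // the corrupt Dict-as-reference case described above, the promise is
  // instead stored under a synthetic string key, e.g. "cacheKey_f1"
  // (assuming the idFactory hands out ids of the form "f1", "f2", ...).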

  buildPath(operatorList, fn, args, parsingText = false) {
    const lastIndex = operatorList.length - 1;
    if (!args) {
      args = [];
    }
    if (
      lastIndex < 0 ||
      operatorList.fnArray[lastIndex] !== OPS.constructPath
    ) {
      // Handle corrupt PDF documents that contain path operators inside of
      // text objects, which may shift subsequent text, by enclosing the path
      // operator in save/restore operators (fixes issue10542_reduced.pdf).
      //
      // Note that this will effectively disable the optimization in the
      // `else` branch below, but given that this type of corruption is
      // *extremely* rare that shouldn't really matter much in practice.
      if (parsingText) {
        warn(`Encountered path operator "${fn}" inside of a text object.`);
        operatorList.addOp(OPS.save, null);
      }

      operatorList.addOp(OPS.constructPath, [[fn], args]);

      if (parsingText) {
        operatorList.addOp(OPS.restore, null);
      }
    } else {
      const opArgs = operatorList.argsArray[lastIndex];
      opArgs[0].push(fn);
      Array.prototype.push.apply(opArgs[1], args);
    }
  }
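
  // Example (illustrative): consecutive path operators such as
  //
  //   0 0 m  10 0 l  10 10 l
  //
  // collapse into one OPS.constructPath operation with arguments
  // `[[OPS.moveTo, OPS.lineTo, OPS.lineTo], [0, 0, 10, 0, 10, 10]]`,
  // thanks to the `else` branch above appending to the previous op.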

  parseColorSpace({ cs, resources, localColorSpaceCache }) {
    return ColorSpace.parseAsync({
      cs,
      xref: this.xref,
      resources,
      pdfFunctionFactory: this._pdfFunctionFactory,
      localColorSpaceCache,
    }).catch(reason => {
      if (reason instanceof AbortException) {
        return null;
      }
      if (this.options.ignoreErrors) {
        // Error(s) in the ColorSpace -- sending unsupported feature
        // notification and allow rendering to continue.
        this.handler.send("UnsupportedFeature", {
          featureId: UNSUPPORTED_FEATURES.errorColorSpace,
        });
        warn(`parseColorSpace - ignoring ColorSpace: "${reason}".`);
        return null;
      }
      throw reason;
    });
  }
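
  // Note: the rejection handler above resolves to `null` both when parsing
  // is aborted and when `ignoreErrors` is set, so callers have to treat a
  // `null` color space as "skip this operation" rather than as a failure.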

  parseShading({
    shading,
    resources,
    localColorSpaceCache,
    localShadingPatternCache,
  }) {
    // Shadings and patterns may be referenced by the same name but the
    // resource dictionary could be different so we can't use the name for
    // the cache key.
    let id = localShadingPatternCache.get(shading);
    if (!id) {
      const shadingFill = Pattern.parseShading(
        shading,
        this.xref,
        resources,
        this.handler,
        this._pdfFunctionFactory,
        localColorSpaceCache
      );
      const patternIR = shadingFill.getIR();
      id = `pattern_${this.idFactory.createObjId()}`;
      localShadingPatternCache.set(shading, id);
      this.handler.send("obj", [id, this.pageIndex, "Pattern", patternIR]);
    }
    return id;
  }
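
  // Note: the cache above is a plain `Map` keyed on the shading object
  // itself (by identity), not on its name -- two resource entries that
  // resolve to the same /Shading dictionary share one "pattern_*" objId,
  // while identically named shadings from different resource dictionaries
  // do not collide.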

  handleColorN(
    operatorList,
    fn,
    args,
    cs,
    patterns,
    resources,
    task,
    localColorSpaceCache,
    localTilingPatternCache,
    localShadingPatternCache
  ) {
    // compile tiling patterns
    const patternName = args.pop();
    // SCN/scn applies patterns along with normal colors
    if (patternName instanceof Name) {
      const rawPattern = patterns.getRaw(patternName.name);

      const localTilingPattern =
        rawPattern instanceof Ref &&
        localTilingPatternCache.getByRef(rawPattern);
      if (localTilingPattern) {
        try {
          const color = cs.base ? cs.base.getRgb(args, 0) : null;
          const tilingPatternIR = getTilingPatternIR(
            localTilingPattern.operatorListIR,
            localTilingPattern.dict,
            color
          );
          operatorList.addOp(fn, tilingPatternIR);
          return undefined;
        } catch (ex) {
          // Handle any errors during normal TilingPattern parsing.
        }
      }

      const pattern = this.xref.fetchIfRef(rawPattern);
      if (pattern) {
        const dict = pattern instanceof BaseStream ? pattern.dict : pattern;
        const typeNum = dict.get("PatternType");

        if (typeNum === PatternType.TILING) {
          const color = cs.base ? cs.base.getRgb(args, 0) : null;
          return this.handleTilingType(
            fn,
            color,
            resources,
            pattern,
            dict,
            operatorList,
            task,
            localTilingPatternCache
          );
        } else if (typeNum === PatternType.SHADING) {
          const shading = dict.get("Shading");
          const matrix = dict.getArray("Matrix");
          const objId = this.parseShading({
            shading,
            resources,
            localColorSpaceCache,
            localShadingPatternCache,
          });
          operatorList.addOp(fn, ["Shading", objId, matrix]);
          return undefined;
        }
        throw new FormatError(`Unknown PatternType: ${typeNum}`);
      }
    }
    throw new FormatError(`Unknown PatternName: ${patternName}`);
  }
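
  // Example (illustrative): for `0.2 0.4 0.6 /P1 scn` with an uncolored
  // tiling pattern (PaintType 2), `args.pop()` above removes the /P1 Name
  // so that only the numeric components `[0.2, 0.4, 0.6]` are passed to
  // `cs.base.getRgb(args, 0)`.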

  _parseVisibilityExpression(array, nestingCounter, currentResult) {
    const MAX_NESTING = 10;
    if (++nestingCounter > MAX_NESTING) {
      warn("Visibility expression is too deeply nested");
      return;
    }
    const length = array.length;
    const operator = this.xref.fetchIfRef(array[0]);
    if (length < 2 || !(operator instanceof Name)) {
      warn("Invalid visibility expression");
      return;
    }
    switch (operator.name) {
      case "And":
      case "Or":
      case "Not":
        currentResult.push(operator.name);
        break;
      default:
        warn(`Invalid operator ${operator.name} in visibility expression`);
        return;
    }
    for (let i = 1; i < length; i++) {
      const raw = array[i];
      const object = this.xref.fetchIfRef(raw);
      if (Array.isArray(object)) {
        const nestedResult = [];
        currentResult.push(nestedResult);
        // Recursively parse a subarray.
        this._parseVisibilityExpression(object, nestingCounter, nestedResult);
      } else if (raw instanceof Ref) {
        // Reference to an OCG dictionary.
        currentResult.push(raw.toString());
      }
    }
  }
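
  // Example (illustrative): a /VE entry such as
  //
  //   [/Or 10 0 R [/Not 11 0 R]]
  //
  // parses into `["Or", "10R", ["Not", "11R"]]`, with OCG Refs flattened
  // to strings and nesting capped at MAX_NESTING levels.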

  async parseMarkedContentProps(contentProperties, resources) {
    let optionalContent;
    if (contentProperties instanceof Name) {
      const properties = resources.get("Properties");
      optionalContent = properties.get(contentProperties.name);
    } else if (contentProperties instanceof Dict) {
      optionalContent = contentProperties;
    } else {
      throw new FormatError("Optional content properties malformed.");
    }

    const optionalContentType = optionalContent.get("Type").name;
    if (optionalContentType === "OCG") {
      return {
        type: optionalContentType,
        id: optionalContent.objId,
      };
    } else if (optionalContentType === "OCMD") {
      const expression = optionalContent.get("VE");
      if (Array.isArray(expression)) {
        const result = [];
        this._parseVisibilityExpression(expression, 0, result);
        if (result.length > 0) {
          return {
            type: "OCMD",
            expression: result,
          };
        }
      }

      const optionalContentGroups = optionalContent.get("OCGs");
      if (
        Array.isArray(optionalContentGroups) ||
        optionalContentGroups instanceof Dict
      ) {
        const groupIds = [];
        if (Array.isArray(optionalContentGroups)) {
          for (const ocg of optionalContentGroups) {
            groupIds.push(ocg.toString());
          }
        } else {
          // Dictionary, just use the obj id.
          groupIds.push(optionalContentGroups.objId);
        }

        return {
          type: optionalContentType,
          ids: groupIds,
          policy:
            optionalContent.get("P") instanceof Name
              ? optionalContent.get("P").name
              : null,
          expression: null,
        };
      } else if (optionalContentGroups instanceof Ref) {
        return {
          type: optionalContentType,
          id: optionalContentGroups.toString(),
        };
      }
    }
    return null;
  }
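
  // Example (illustrative): a marked-content sequence
  //
  //   /OC /MC0 BDC ... EMC
  //
  // resolves /MC0 through the /Properties resource dictionary; an OCG
  // yields `{ type: "OCG", id }`, while an OCMD is reduced to either a
  // visibility expression or a list of group ids plus the /P policy.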
|
|
|
|
|
|
2020-07-05 19:20:10 +09:00
|
|
|
|
getOperatorList({
|
|
|
|
|
stream,
|
|
|
|
|
task,
|
|
|
|
|
resources,
|
|
|
|
|
operatorList,
|
|
|
|
|
initialState = null,
|
2020-10-22 00:21:33 +09:00
|
|
|
|
fallbackFontDict = null,
|
2020-07-05 19:20:10 +09:00
|
|
|
|
}) {
|
|
|
|
|
// Ensure that `resources`/`initialState` is correctly initialized,
|
|
|
|
|
// even if the provided parameter is e.g. `null`.
|
|
|
|
|
resources = resources || Dict.empty;
|
|
|
|
|
initialState = initialState || new EvalState();
|
2014-03-23 03:15:51 +09:00
|
|
|
|
|
2020-07-05 19:20:10 +09:00
|
|
|
|
if (!operatorList) {
|
|
|
|
|
throw new Error('getOperatorList: missing "operatorList" parameter');
|
|
|
|
|
}
|
2017-09-17 20:35:18 +09:00
|
|
|
|
|
2021-05-06 16:39:21 +09:00
|
|
|
|
const self = this;
|
|
|
|
|
const xref = this.xref;
|
2020-07-05 19:20:10 +09:00
|
|
|
|
let parsingText = false;
|
|
|
|
|
const localImageCache = new LocalImageCache();
|
|
|
|
|
const localColorSpaceCache = new LocalColorSpaceCache();
|
2020-07-11 20:52:11 +09:00
|
|
|
|
const localGStateCache = new LocalGStateCache();
|
2020-10-09 00:33:23 +09:00
|
|
|
|
const localTilingPatternCache = new LocalTilingPatternCache();
|
2021-07-22 04:27:39 +09:00
|
|
|
|
const localShadingPatternCache = new Map();

    const xobjs = resources.get("XObject") || Dict.empty;
    const patterns = resources.get("Pattern") || Dict.empty;
    const stateManager = new StateManager(initialState);
    const preprocessor = new EvaluatorPreprocessor(stream, xref, stateManager);
    const timeSlotManager = new TimeSlotManager();

    function closePendingRestoreOPS(argument) {
      for (let i = 0, ii = preprocessor.savedStatesDepth; i < ii; i++) {
        operatorList.addOp(OPS.restore, []);
      }
    }
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-18 21:17:56 +09:00
|
|
|
|
|
2020-07-05 19:20:10 +09:00
|
|
|
|
    return new Promise(function promiseBody(resolve, reject) {
      const next = function (promise) {
        Promise.all([promise, operatorList.ready]).then(function () {
          try {
            promiseBody(resolve, reject);
          } catch (ex) {
            reject(ex);
          }
        }, reject);
      };
      task.ensureNotTerminated();
      timeSlotManager.reset();
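
      // `next()` implements a cooperative trampoline: once the pending
      // sub-task *and* `operatorList.ready` have both resolved, `promiseBody`
      // is re-entered to continue reading operators. Together with
      // `timeSlotManager`, which bounds each synchronous slice of the loop
      // below, this keeps long content streams from monopolizing the worker.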

      const operation = {};
      let stop, i, ii, cs, name, isValidName;
      while (!(stop = timeSlotManager.check())) {
        // The arguments parsed by read() are used beyond this loop, so we
        // cannot reuse the same array on each iteration. Therefore we pass
        // in |null| as the initial value (see the comment on
        // EvaluatorPreprocessor_read() for why).
        operation.args = null;
        if (!preprocessor.read(operation)) {
          break;
        }
        let args = operation.args;
        let fn = operation.fn;

        switch (fn | 0) {
|
2021-05-13 17:40:08 +09:00
|
|
|
|
case OPS.paintXObject:
|
2020-07-05 19:20:10 +09:00
|
|
|
|
// eagerly compile XForm objects
|
2021-07-15 04:38:19 +09:00
|
|
|
|
isValidName = args[0] instanceof Name;
|
2021-05-13 17:40:08 +09:00
|
|
|
|
name = args[0].name;
|
2021-07-15 04:38:19 +09:00
|
|
|
|
|
|
|
|
|
if (isValidName) {
|
2020-07-05 19:20:10 +09:00
|
|
|
|
const localImage = localImageCache.getByName(name);
|
|
|
|
|
if (localImage) {
|
|
|
|
|
operatorList.addOp(localImage.fn, localImage.args);
|
|
|
|
|
args = null;
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
}
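
            // The local image cache is keyed both by /XObject name and, where
            // possible, by `Ref`, since some documents alias hundreds of
            // distinct names ("Im0", "Im1", ...) to a handful of actual image
            // objects in the XRef table. The by-name probe above is the fast
            // path; the by-`Ref` probe below catches such aliases.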

            next(
              new Promise(function (resolveXObject, rejectXObject) {
                if (!isValidName) {
                  throw new FormatError("XObject must be referred to by name.");
                }

                let xobj = xobjs.getRaw(name);
                if (xobj instanceof Ref) {
                  const localImage = localImageCache.getByRef(xobj);
                  if (localImage) {
                    operatorList.addOp(localImage.fn, localImage.args);
                    resolveXObject();
                    return;
                  }

                  const globalImage = self.globalImageCache.getData(
                    xobj,
                    self.pageIndex
                  );
                  if (globalImage) {
                    operatorList.addDependency(globalImage.objId);
                    operatorList.addOp(globalImage.fn, globalImage.args);

                    resolveXObject();
                    return;
                  }

                  xobj = xref.fetch(xobj);
                }
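
                // Images that have been seen on multiple pages may have been
                // promoted to the document-level `globalImageCache`; in that
                // case a dependency on the shared decoded data is recorded,
                // so the operator list waits for it instead of re-decoding.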

                if (!(xobj instanceof BaseStream)) {
                  throw new FormatError("XObject should be a stream");
                }

                const type = xobj.dict.get("Subtype");
                if (!(type instanceof Name)) {
                  throw new FormatError("XObject should have a Name subtype");
                }

                if (type.name === "Form") {
                  stateManager.save();
                  self
                    .buildFormXObject(
                      resources,
                      xobj,
                      null,
                      operatorList,
                      task,
                      stateManager.state.clone(),
                      localColorSpaceCache
                    )
                    .then(function () {
                      stateManager.restore();
                      resolveXObject();
                    }, rejectXObject);
                  return;
                } else if (type.name === "Image") {
                  self
                    .buildPaintImageXObject({
                      resources,
                      image: xobj,
                      operatorList,
                      cacheKey: name,
                      localImageCache,
                      localColorSpaceCache,
                    })
                    .then(resolveXObject, rejectXObject);
                  return;
                } else if (type.name === "PS") {
                  // PostScript XObjects are unused when viewing documents.
                  // See section 4.7.1 of Adobe's PDF reference.
                  info("Ignored XObject subtype PS");
                } else {
                  throw new FormatError(
                    `Unhandled XObject subtype ${type.name}`
                  );
                }
                resolveXObject();
              }).catch(function (reason) {
                if (reason instanceof AbortException) {
                  return;
                }
                if (self.options.ignoreErrors) {
                  // Error(s) in the XObject -- sending unsupported feature
                  // notification and allowing rendering to continue.
                  self.handler.send("UnsupportedFeature", {
                    featureId: UNSUPPORTED_FEATURES.errorXObject,
                  });
                  warn(`getOperatorList - ignoring XObject: "${reason}".`);
                  return;
                }
                throw reason;
              })
            );
            return;
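
          // Note: the entire XObject lookup above runs inside a dedicated
          // Promise so that synchronous `throw`s and asynchronous failures
          // funnel into the same `.catch` handler; an `AbortException` ends
          // the task quietly, and with `options.ignoreErrors` set a broken
          // XObject is reported and skipped rather than aborting rendering.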
          case OPS.setFont:
            var fontSize = args[1];
            // eagerly collect all fonts
            next(
              self
                .handleSetFont(
                  resources,
                  args,
                  null,
                  operatorList,
                  task,
                  stateManager.state,
                  fallbackFontDict
                )
                .then(function (loadedName) {
                  operatorList.addDependency(loadedName);
                  operatorList.addOp(OPS.setFont, [loadedName, fontSize]);
                })
            );
            return;
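
          // `beginText`/`endText` merely toggle `parsingText`, tracking
          // whether the preprocessor is currently inside a BT/ET text object.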
          case OPS.beginText:
            parsingText = true;
            break;
          case OPS.endText:
            parsingText = false;
            break;
          case OPS.endInlineImage:
            var cacheKey = args[0].cacheKey;
            if (cacheKey) {
              const localImage = localImageCache.getByName(cacheKey);
              if (localImage) {
                operatorList.addOp(localImage.fn, localImage.args);
                args = null;
                continue;
              }
            }
            next(
              self.buildPaintImageXObject({
                resources,
                image: args[0],
                isInline: true,
                operatorList,
                cacheKey,
                localImageCache,
                localColorSpaceCache,
              })
            );
            return;
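
          // Note: inline images (BI ... EI) above are cached under the
          // `cacheKey` that the parser derives from the raw image bytes, so
          // byte-identical repeats replay the cached paint operation instead
          // of being decoded again.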
          case OPS.showText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }
            args[0] = self.handleText(args[0], stateManager.state);
            break;
          case OPS.showSpacedText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }
            var arr = args[0];
            var combinedGlyphs = [];
            var arrLength = arr.length;
            var state = stateManager.state;
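
            // Flatten the TJ array: string entries become glyph arrays via
            // `handleText`, numeric entries are kept as-is (they encode
            // positioning adjustments), and the operation is rewritten to a
            // plain `showText`.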
            for (i = 0; i < arrLength; ++i) {
              const arrItem = arr[i];
              if (typeof arrItem === "string") {
                Array.prototype.push.apply(
                  combinedGlyphs,
                  self.handleText(arrItem, state)
                );
              } else if (typeof arrItem === "number") {
                combinedGlyphs.push(arrItem);
              }
            }
            args[0] = combinedGlyphs;
            fn = OPS.showText;
            break;
          case OPS.nextLineShowText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }
            operatorList.addOp(OPS.nextLine);
            args[0] = self.handleText(args[0], stateManager.state);
            fn = OPS.showText;
            break;
          case OPS.nextLineSetSpacingShowText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }
            operatorList.addOp(OPS.nextLine);
            operatorList.addOp(OPS.setWordSpacing, [args.shift()]);
            operatorList.addOp(OPS.setCharSpacing, [args.shift()]);
            args[0] = self.handleText(args[0], stateManager.state);
            fn = OPS.showText;
            break;
          case OPS.setTextRenderingMode:
            stateManager.state.textRenderingMode = args[0];
            break;
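
          // The two color-space cases below first probe the name-keyed
          // `localColorSpaceCache` synchronously (a hit lets the loop
          // `continue` immediately); only on a miss do they fall back to the
          // asynchronous `parseColorSpace`, which must be routed through
          // `next()` and therefore ends the current slice with `return`.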
          case OPS.setFillColorSpace: {
            const cachedColorSpace = ColorSpace.getCached(
              args[0],
              xref,
              localColorSpaceCache
            );
            if (cachedColorSpace) {
              stateManager.state.fillColorSpace = cachedColorSpace;
              continue;
            }

            next(
              self
                .parseColorSpace({
                  cs: args[0],
                  resources,
                  localColorSpaceCache,
                })
                .then(function (colorSpace) {
                  if (colorSpace) {
                    stateManager.state.fillColorSpace = colorSpace;
                  }
                })
            );
            return;
          }
|
|
|
|
|
case OPS.setStrokeColorSpace: {
|
|
|
|
|
const cachedColorSpace = ColorSpace.getCached(
|
|
|
|
|
args[0],
|
|
|
|
|
xref,
|
|
|
|
|
localColorSpaceCache
|
|
|
|
|
);
|
|
|
|
|
if (cachedColorSpace) {
|
|
|
|
|
stateManager.state.strokeColorSpace = cachedColorSpace;
|
|
|
|
|
continue;
|
            }

            next(
              self
                .parseColorSpace({
                  cs: args[0],
                  resources,
                  localColorSpaceCache,
                })
                .then(function (colorSpace) {
                  if (colorSpace) {
                    stateManager.state.strokeColorSpace = colorSpace;
                  }
                })
            );
            return;
          }
|
|
|
|
|
case OPS.setFillColor:
|
|
|
|
|
cs = stateManager.state.fillColorSpace;
|
|
|
|
|
args = cs.getRgb(args, 0);
|
|
|
|
|
fn = OPS.setFillRGBColor;
|
|
|
|
|
break;
|
|
|
|
|
case OPS.setStrokeColor:
|
|
|
|
|
cs = stateManager.state.strokeColorSpace;
|
|
|
|
|
args = cs.getRgb(args, 0);
|
|
|
|
|
fn = OPS.setStrokeRGBColor;
|
|
|
|
|
break;
|
|
|
|
|
case OPS.setFillGray:
|
|
|
|
|
stateManager.state.fillColorSpace = ColorSpace.singletons.gray;
|
|
|
|
|
args = ColorSpace.singletons.gray.getRgb(args, 0);
|
|
|
|
|
fn = OPS.setFillRGBColor;
|
|
|
|
|
break;
|
|
|
|
|
case OPS.setStrokeGray:
|
|
|
|
|
stateManager.state.strokeColorSpace = ColorSpace.singletons.gray;
|
|
|
|
|
args = ColorSpace.singletons.gray.getRgb(args, 0);
|
|
|
|
|
fn = OPS.setStrokeRGBColor;
|
|
|
|
|
break;
|
|
|
|
|
case OPS.setFillCMYKColor:
|
|
|
|
|
stateManager.state.fillColorSpace = ColorSpace.singletons.cmyk;
|
|
|
|
|
args = ColorSpace.singletons.cmyk.getRgb(args, 0);
|
|
|
|
|
fn = OPS.setFillRGBColor;
|
|
|
|
|
break;
|
|
|
|
|
case OPS.setStrokeCMYKColor:
|
|
|
|
|
stateManager.state.strokeColorSpace = ColorSpace.singletons.cmyk;
|
|
|
|
|
args = ColorSpace.singletons.cmyk.getRgb(args, 0);
|
|
|
|
|
fn = OPS.setStrokeRGBColor;
|
|
|
|
|
break;
|
|
|
|
|
case OPS.setFillRGBColor:
|
|
|
|
|
stateManager.state.fillColorSpace = ColorSpace.singletons.rgb;
|
|
|
|
|
args = ColorSpace.singletons.rgb.getRgb(args, 0);
|
|
|
|
|
break;
|
|
|
|
|
case OPS.setStrokeRGBColor:
|
|
|
|
|
stateManager.state.strokeColorSpace = ColorSpace.singletons.rgb;
|
|
|
|
|
args = ColorSpace.singletons.rgb.getRgb(args, 0);
|
|
|
|
|
break;
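
          // Every explicit color operator above is normalized to an RGB
          // triple via `getRgb`, so the rendering side only ever has to
          // handle the `setFillRGBColor`/`setStrokeRGBColor` operations.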
          case OPS.setFillColorN:
            cs = stateManager.state.fillColorSpace;
            if (cs.name === "Pattern") {
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
              next(
                self.handleColorN(
                  operatorList,
                  OPS.setFillColorN,
                  args,
                  cs,
                  patterns,
                  resources,
                  task,
                  localColorSpaceCache,
                  localTilingPatternCache,
                  localShadingPatternCache
                )
              );
              return;
            }
            args = cs.getRgb(args, 0);
            fn = OPS.setFillRGBColor;
            break;
          case OPS.setStrokeColorN:
            cs = stateManager.state.strokeColorSpace;
            if (cs.name === "Pattern") {
              next(
                self.handleColorN(
                  operatorList,
                  OPS.setStrokeColorN,
                  args,
                  cs,
                  patterns,
                  resources,
                  task,
                  localColorSpaceCache,
                  localTilingPatternCache,
                  localShadingPatternCache
                )
              );
              return;
            }
            args = cs.getRgb(args, 0);
            fn = OPS.setStrokeRGBColor;
            break;
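
          // `shadingFill` is rewritten so that only a pattern id travels to
          // the main thread; the (potentially large) shading dictionary is
          // parsed here by `parseShading`, with `localShadingPatternCache`
          // available so repeated shadings can be re-used.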
          case OPS.shadingFill:
            var shadingRes = resources.get("Shading");
            if (!shadingRes) {
              throw new FormatError("No shading resource found");
            }

            var shading = shadingRes.get(args[0].name);
            if (!shading) {
              throw new FormatError("No shading object found");
            }
            const patternId = self.parseShading({
              shading,
              resources,
              localColorSpaceCache,
              localShadingPatternCache,
            });
            args = [patternId];
            fn = OPS.shadingFill;
            break;
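
          // Graphics states are cached by name: a previously parsed
          // /ExtGState entry is replayed synchronously from
          // `localGStateCache`, and only cache misses fall through to the
          // asynchronous parsing below.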
          case OPS.setGState:
            isValidName = args[0] instanceof Name;
            name = args[0].name;

            if (isValidName) {
              const localGStateObj = localGStateCache.getByName(name);
              if (localGStateObj) {
                if (localGStateObj.length > 0) {
                  operatorList.addOp(OPS.setGState, [localGStateObj]);
                }
                args = null;
                continue;
              }
            }

            next(
              new Promise(function (resolveGState, rejectGState) {
                if (!isValidName) {
                  throw new FormatError("GState must be referred to by name.");
                }

                const extGState = resources.get("ExtGState");
                if (!(extGState instanceof Dict)) {
                  throw new FormatError("ExtGState should be a dictionary.");
                }

                const gState = extGState.get(name);
                // TODO: Attempt to lookup cached GStates by reference as well,
                //       if and only if there are PDF documents where doing so
                //       would significantly improve performance.
                if (!(gState instanceof Dict)) {
                  throw new FormatError("GState should be a dictionary.");
                }

                self
                  .setGState({
                    resources,
                    gState,
                    operatorList,
                    cacheKey: name,
                    task,
                    stateManager,
                    localGStateCache,
                    localColorSpaceCache,
                  })
                  .then(resolveGState, rejectGState);
              }).catch(function (reason) {
                if (reason instanceof AbortException) {
                  return;
                }
                if (self.options.ignoreErrors) {
                  // Error(s) in the ExtGState -- sending unsupported feature
                  // notification and allow parsing/rendering to continue.
                  self.handler.send("UnsupportedFeature", {
                    featureId: UNSUPPORTED_FEATURES.errorExtGState,
                  });
                  warn(`getOperatorList - ignoring ExtGState: "${reason}".`);
                  return;
                }
                throw reason;
              })
            );
            return;
          case OPS.moveTo:
          case OPS.lineTo:
          case OPS.curveTo:
          case OPS.curveTo2:
          case OPS.curveTo3:
          case OPS.closePath:
          case OPS.rectangle:
            self.buildPath(operatorList, fn, args, parsingText);
            continue;
          case OPS.markPoint:
          case OPS.markPointProps:
          case OPS.beginCompat:
          case OPS.endCompat:
            // Ignore operators where the corresponding handlers are known to
            // be no-op in CanvasGraphics (display/canvas.js). This prevents
            // serialization errors and is also a bit more efficient.
            // We could also try to serialize all objects in a general way,
            // e.g. as done in https://github.com/mozilla/pdf.js/pull/6266,
            // but doing so is meaningless without knowing the semantics.
            continue;
          case OPS.beginMarkedContentProps:
            if (!(args[0] instanceof Name)) {
              warn(`Expected name for beginMarkedContentProps arg0=${args[0]}`);
              continue;
            }
            if (args[0].name === "OC") {
              next(
                self
                  .parseMarkedContentProps(args[1], resources)
                  .then(data => {
                    operatorList.addOp(OPS.beginMarkedContentProps, [
                      "OC",
                      data,
                    ]);
                  })
                  .catch(reason => {
                    if (reason instanceof AbortException) {
                      return;
                    }
                    if (self.options.ignoreErrors) {
                      self.handler.send("UnsupportedFeature", {
                        featureId: UNSUPPORTED_FEATURES.errorMarkedContent,
                      });
                      warn(
                        `getOperatorList - ignoring beginMarkedContentProps: "${reason}".`
                      );
                      return;
                    }
                    throw reason;
                  })
              );
              return;
            }
            // Other marked content types aren't supported yet.
            args = [
              args[0].name,
              args[1] instanceof Dict ? args[1].get("MCID") : null,
            ];

            break;
          case OPS.beginMarkedContent:
          case OPS.endMarkedContent:
          default:
            // Note: Ignore the operator if it has `Dict` arguments, since
            // those are non-serializable, otherwise postMessage will throw
            // "An object could not be cloned.".
            if (args !== null) {
              for (i = 0, ii = args.length; i < ii; i++) {
                if (args[i] instanceof Dict) {
                  break;
                }
              }
              if (i < ii) {
                warn("getOperatorList - ignoring operator: " + fn);
                continue;
              }
            }
        }
        operatorList.addOp(fn, args);
      }
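
      // `stop` is set in the loop condition (outside of this excerpt) once
      // the current time slot is used up; `next(deferred)` then yields to
      // the event loop before parsing of the content stream resumes.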
      if (stop) {
        next(deferred);
        return;
      }
      // Some PDFs don't close all restores inside object/form.
      // Closing those for them.
      closePendingRestoreOPS();
      resolve();
    }).catch(reason => {
      if (reason instanceof AbortException) {
        return;
      }
      if (this.options.ignoreErrors) {
        // Error(s) in the OperatorList -- sending unsupported feature
        // notification and allow rendering to continue.
        this.handler.send("UnsupportedFeature", {
          featureId: UNSUPPORTED_FEATURES.errorOperatorList,
        });
        warn(
          `getOperatorList - ignoring errors during "${task.name}" ` +
            `task: "${reason}".`
        );

        closePendingRestoreOPS();
        return;
      }
      throw reason;
    });
  }
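
  // Extracts the text content from a content stream: glyphs are accumulated
  // into chunks together with their transforms and font styles, and the
  // result eventually reaches the caller through `sink`. This is the data
  // behind the viewer's text layer.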
  getTextContent({
    stream,
    task,
    resources,
    stateManager = null,
    combineTextItems = false,
    includeMarkedContent = false,
    sink,
    seenStyles = new Set(),
    viewBox,
  }) {
    // Ensure that `resources`/`stateManager` is correctly initialized,
    // even if the provided parameter is e.g. `null`.
    resources = resources || Dict.empty;
    stateManager = stateManager || new StateManager(new TextState());

    const NormalizedUnicodes = getNormalizedUnicodes();

    const textContent = {
      items: [],
      styles: Object.create(null),
    };
    const textContentItem = {
      initialized: false,
      str: [],
      totalWidth: 0,
      totalHeight: 0,
      width: 0,
      height: 0,
      vertical: false,
      prevTransform: null,
      textAdvanceScale: 0,
      spaceInFlowMin: 0,
      spaceInFlowMax: 0,
      trackingSpaceMin: Infinity,
      negativeSpaceMax: -Infinity,
      transform: null,
      fontName: null,
      hasEOL: false,
    };

    // Used in addFakeSpaces.

    // A whitespace gap <= fontSize * TRACKING_SPACE_FACTOR is considered
    // tracking space, so it doesn't count as a space.
    const TRACKING_SPACE_FACTOR = 0.1;

    // A negative gap < fontSize * NEGATIVE_SPACE_FACTOR induces a break
    // (a new chunk of text is created).
    // It doesn't change anything when the text is copied, but it reduces
    // potential mismatches between the text layer and the canvas.
    const NEGATIVE_SPACE_FACTOR = -0.2;

    // A gap with a width in [fontSize * MIN_FACTOR; fontSize * MAX_FACTOR]
    // is a space which will be inserted in the current flow of words.
    // If the width is outside of this range then the flow is broken
    // (which means a new span in the text layer).
    // This helps to fit the spans in the text layer as closely as possible
    // to what is displayed in the canvas.
    const SPACE_IN_FLOW_MIN_FACTOR = 0.1;
    const SPACE_IN_FLOW_MAX_FACTOR = 0.6;

    const self = this;
    const xref = this.xref;
    const showSpacedTextBuffer = [];

    // The XObject is parsed only if it's needed, e.g. if there is a `Do` cmd.
    let xobjs = null;
    const emptyXObjectCache = new LocalImageCache();
    const emptyGStateCache = new LocalGStateCache();

    const preprocessor = new EvaluatorPreprocessor(stream, xref, stateManager);

    let textState;
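
    // The helper below computes the glyph rendering matrix of the
    // "Text Space Details" section of the PDF specification, i.e.
    //   trm = transform(ctm, transform(textMatrix, tsm)),
    // where tsm = [fontSize * textHScale, 0, 0, fontSize, 0, textRise]
    // collects the relevant text-state parameters.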
    function getCurrentTextTransform() {
      // 9.4.4 Text Space Details
      const font = textState.font;
      const tsm = [
        textState.fontSize * textState.textHScale,
        0,
        0,
        textState.fontSize,
        0,
        textState.textRise,
      ];

      if (
        font.isType3Font &&
        (textState.fontSize <= 1 || font.isCharBBox) &&
        !isArrayEqual(textState.fontMatrix, FONT_IDENTITY_MATRIX)
      ) {
        const glyphHeight = font.bbox[3] - font.bbox[1];
        if (glyphHeight > 0) {
          tsm[3] *= glyphHeight * textState.fontMatrix[3];
        }
      }

      return Util.transform(
        textState.ctm,
        Util.transform(textState.textMatrix, tsm)
      );
    }
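
    // Lazily (re-)initializes the current text chunk: the font style is
    // registered once per `loadedName`, the current transform is recorded,
    // and the whitespace thresholds are derived from the active font size.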
    function ensureTextContentItem() {
      if (textContentItem.initialized) {
        return textContentItem;
      }
      const font = textState.font,
        loadedName = font.loadedName;
      if (!seenStyles.has(loadedName)) {
        seenStyles.add(loadedName);

        textContent.styles[loadedName] = {
          fontFamily: font.fallbackName,
          ascent: font.ascent,
          descent: font.descent,
          vertical: font.vertical,
        };
      }
      textContentItem.fontName = loadedName;

      const trm = (textContentItem.transform = getCurrentTextTransform());
      if (!font.vertical) {
        textContentItem.width = textContentItem.totalWidth = 0;
        textContentItem.height = textContentItem.totalHeight = Math.hypot(
          trm[2],
          trm[3]
        );
        textContentItem.vertical = false;
      } else {
        textContentItem.width = textContentItem.totalWidth = Math.hypot(
          trm[0],
          trm[1]
        );
        textContentItem.height = textContentItem.totalHeight = 0;
        textContentItem.vertical = true;
      }

      const scaleLineX = Math.hypot(
        textState.textLineMatrix[0],
        textState.textLineMatrix[1]
      );
      const scaleCtmX = Math.hypot(textState.ctm[0], textState.ctm[1]);
      textContentItem.textAdvanceScale = scaleCtmX * scaleLineX;

      textContentItem.trackingSpaceMin =
        textState.fontSize * TRACKING_SPACE_FACTOR;
      textContentItem.negativeSpaceMax =
        textState.fontSize * NEGATIVE_SPACE_FACTOR;
      textContentItem.spaceInFlowMin =
        textState.fontSize * SPACE_IN_FLOW_MIN_FACTOR;
      textContentItem.spaceInFlowMax =
        textState.fontSize * SPACE_IN_FLOW_MAX_FACTOR;

      textContentItem.hasEOL = false;

      textContentItem.initialized = true;
      return textContentItem;
    }

    function updateAdvanceScale() {
      if (!textContentItem.initialized) {
        return;
      }

      const scaleLineX = Math.hypot(
        textState.textLineMatrix[0],
        textState.textLineMatrix[1]
      );
      const scaleCtmX = Math.hypot(textState.ctm[0], textState.ctm[1]);
      const scaleFactor = scaleCtmX * scaleLineX;
      if (scaleFactor === textContentItem.textAdvanceScale) {
        return;
      }

      if (!textContentItem.vertical) {
        textContentItem.totalWidth +=
          textContentItem.width * textContentItem.textAdvanceScale;
        textContentItem.width = 0;
      } else {
        textContentItem.totalHeight +=
          textContentItem.height * textContentItem.textAdvanceScale;
        textContentItem.height = 0;
      }

      textContentItem.textAdvanceScale = scaleFactor;
    }
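
    // Joins the accumulated glyphs, runs the bidi algorithm on the result
    // and shapes the final text item that is exposed through the API.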
    function runBidiTransform(textChunk) {
      const text = textChunk.str.join("");
      const bidiResult = bidi(text, -1, textChunk.vertical);
      return {
        str: bidiResult.str,
        dir: bidiResult.dir,
        width: Math.abs(textChunk.totalWidth),
        height: Math.abs(textChunk.totalHeight),
        transform: textChunk.transform,
        fontName: textChunk.fontName,
        hasEOL: textChunk.hasEOL,
      };
    }

    function handleSetFont(fontName, fontRef) {
      return self
        .loadFont(fontName, fontRef, resources)
        .then(function (translated) {
          if (!translated.font.isType3Font) {
            return translated;
          }
          return translated
            .loadType3Data(self, resources, task)
            .catch(function () {
              // Ignore Type3-parsing errors, since we only use `loadType3Data`
              // here to ensure that we'll always obtain a useful /FontBBox.
            })
            .then(function () {
              return translated;
            });
        })
        .then(function (translated) {
          textState.font = translated.font;
          textState.fontMatrix =
            translated.font.fontMatrix || FONT_IDENTITY_MATRIX;
        });
    }

    function applyInverseRotation(x, y, matrix) {
      const scale = Math.hypot(matrix[0], matrix[1]);
      return [
        (matrix[0] * x + matrix[1] * y) / scale,
        (matrix[2] * x + matrix[3] * y) / scale,
      ];
    }
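
    // Decides, from the distance between the previous glyph and the next
    // one, whether the current chunk must be flushed (EOL or column/line
    // break), whether a fake space should be inserted, or whether the
    // advance can simply be folded into the chunk. Returns false only when
    // the glyph lies outside of the viewBox.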
    function compareWithLastPosition() {
      const currentTransform = getCurrentTextTransform();
      let posX = currentTransform[4];
      let posY = currentTransform[5];

      const shiftedX = posX - viewBox[0];
      const shiftedY = posY - viewBox[1];

      if (
        shiftedX < 0 ||
        shiftedX > viewBox[2] ||
        shiftedY < 0 ||
        shiftedY > viewBox[3]
      ) {
        return false;
      }

      if (
        !combineTextItems ||
        !textState.font ||
        !textContentItem.prevTransform
      ) {
        return true;
      }

      let lastPosX = textContentItem.prevTransform[4];
      let lastPosY = textContentItem.prevTransform[5];

      if (lastPosX === posX && lastPosY === posY) {
        return true;
      }

      let rotate = -1;
      // Take into account the rotation in the current transform.
      if (
        currentTransform[0] &&
        currentTransform[1] === 0 &&
        currentTransform[2] === 0
      ) {
        rotate = currentTransform[0] > 0 ? 0 : 180;
      } else if (
        currentTransform[1] &&
        currentTransform[0] === 0 &&
        currentTransform[3] === 0
      ) {
        rotate = currentTransform[1] > 0 ? 90 : 270;
      }

      switch (rotate) {
        case 0:
          break;
        case 90:
          [posX, posY] = [posY, posX];
          [lastPosX, lastPosY] = [lastPosY, lastPosX];
          break;
        case 180:
          [posX, posY, lastPosX, lastPosY] = [
            -posX,
            -posY,
            -lastPosX,
            -lastPosY,
          ];
          break;
        case 270:
          [posX, posY] = [-posY, -posX];
          [lastPosX, lastPosY] = [-lastPosY, -lastPosX];
          break;
        default:
          // This is not a 0, 90, 180 or 270 degree rotation, so:
          //  - remove the scale factor from the matrix to get a rotation
          //    matrix;
          //  - apply the inverse (which is the transpose) to the positions.
          // We can then compare the positions of the glyphs to detect
          // a whitespace.
          [posX, posY] = applyInverseRotation(posX, posY, currentTransform);
          [lastPosX, lastPosY] = applyInverseRotation(
            lastPosX,
            lastPosY,
            textContentItem.prevTransform
          );
      }

      if (textState.font.vertical) {
        const advanceY = (lastPosY - posY) / textContentItem.textAdvanceScale;
        const advanceX = posX - lastPosX;

        // When the total height of the current chunk is negative
        // then we're writing from bottom to top.
        const textOrientation = Math.sign(textContentItem.height);
        if (advanceY < textOrientation * textContentItem.negativeSpaceMax) {
          if (
            Math.abs(advanceX) >
            0.5 * textContentItem.width /* not the same column */
          ) {
            appendEOL();
            return true;
          }

          flushTextContentItem();
          return true;
        }

        if (Math.abs(advanceX) > textContentItem.width) {
          appendEOL();
          return true;
        }
        if (advanceY <= textOrientation * textContentItem.trackingSpaceMin) {
          textContentItem.height += advanceY;
        } else if (
          !addFakeSpaces(
            advanceY,
            textContentItem.prevTransform,
            textOrientation
          )
        ) {
          if (textContentItem.str.length === 0) {
            textContent.items.push({
              str: " ",
              dir: "ltr",
              width: 0,
              height: Math.abs(advanceY),
              transform: textContentItem.prevTransform,
              fontName: textContentItem.fontName,
              hasEOL: false,
            });
          } else {
            textContentItem.height += advanceY;
          }
        }

        return true;
      }

      const advanceX = (posX - lastPosX) / textContentItem.textAdvanceScale;
      const advanceY = posY - lastPosY;

      // When the total width of the current chunk is negative
      // then we're writing from right to left.
      const textOrientation = Math.sign(textContentItem.width);
      if (advanceX < textOrientation * textContentItem.negativeSpaceMax) {
        if (
          Math.abs(advanceY) >
          0.5 * textContentItem.height /* not the same line */
        ) {
          appendEOL();
          return true;
        }
        flushTextContentItem();
        return true;
      }

      if (Math.abs(advanceY) > textContentItem.height) {
        appendEOL();
        return true;
      }

      if (advanceX <= textOrientation * textContentItem.trackingSpaceMin) {
        textContentItem.width += advanceX;
      } else if (
        !addFakeSpaces(advanceX, textContentItem.prevTransform, textOrientation)
      ) {
        if (textContentItem.str.length === 0) {
          textContent.items.push({
            str: " ",
            dir: "ltr",
            width: Math.abs(advanceX),
            height: 0,
            transform: textContentItem.prevTransform,
            fontName: textContentItem.fontName,
            hasEOL: false,
          });
        } else {
          textContentItem.width += advanceX;
        }
      }

      return true;
    }
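
    // Converts one shown string into glyphs and feeds them, one at a time,
    // through the position heuristics above; whitespace glyphs are folded
    // into the cursor movement instead of being pushed directly.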
    function buildTextContentItem({ chars, extraSpacing }) {
      const font = textState.font;
      if (!chars) {
        // Just move according to the space we have.
        const charSpacing = textState.charSpacing + extraSpacing;
        if (charSpacing) {
          if (!font.vertical) {
            textState.translateTextMatrix(
              charSpacing * textState.textHScale,
              0
            );
          } else {
            textState.translateTextMatrix(0, -charSpacing);
          }
        }

        return;
      }

      const glyphs = font.charsToGlyphs(chars);
      const scale = textState.fontMatrix[0] * textState.fontSize;

      for (let i = 0, ii = glyphs.length; i < ii; i++) {
        const glyph = glyphs[i];
        if (glyph.isInvisibleFormatMark) {
          continue;
        }
        let charSpacing =
          textState.charSpacing + (i + 1 === ii ? extraSpacing : 0);

        let glyphWidth = glyph.width;
        if (font.vertical) {
          glyphWidth = glyph.vmetric ? glyph.vmetric[0] : -glyphWidth;
        }
        let scaledDim = glyphWidth * scale;

        if (
          glyph.isWhitespace &&
          (i === 0 ||
            i + 1 === ii ||
            glyphs[i - 1].isWhitespace ||
            glyphs[i + 1].isWhitespace ||
            extraSpacing)
        ) {
          // Don't push a " " in the textContentItem
          // (except when it's between two non-space chars),
          // it will be done (if required) in the next call to
          // compareWithLastPosition.
          // This way we can merge real spaces and spaces due to cursor moves.
          if (!font.vertical) {
            charSpacing += scaledDim + textState.wordSpacing;
            textState.translateTextMatrix(
              charSpacing * textState.textHScale,
              0
            );
          } else {
            charSpacing += -scaledDim + textState.wordSpacing;
            textState.translateTextMatrix(0, -charSpacing);
          }
          continue;
        }

        if (!compareWithLastPosition()) {
          // The glyph is not in the page, so just skip it.
          continue;
        }

        // Must be called after compareWithLastPosition because
        // the textContentItem could have been flushed.
        const textChunk = ensureTextContentItem();
        if (glyph.isZeroWidthDiacritic) {
          scaledDim = 0;
        }

        if (!font.vertical) {
          scaledDim *= textState.textHScale;
          textState.translateTextMatrix(scaledDim, 0);
          textChunk.width += scaledDim;
        } else {
          textState.translateTextMatrix(0, scaledDim);
          scaledDim = Math.abs(scaledDim);
          textChunk.height += scaledDim;
        }

        if (scaledDim) {
          // Save the position of the last visible character.
          textChunk.prevTransform = getCurrentTextTransform();
        }

        if (glyph.isWhitespace) {
          // Replaces all whitespaces with standard spaces (0x20), to avoid
          // alignment issues between the textLayer and the canvas if the text
          // contains e.g. tabs (fixes issue6612.pdf).
          textChunk.str.push(" ");
        } else {
          let glyphUnicode = glyph.unicode;
          glyphUnicode = NormalizedUnicodes[glyphUnicode] || glyphUnicode;
          glyphUnicode = reverseIfRtl(glyphUnicode);
          textChunk.str.push(glyphUnicode);
        }

        if (charSpacing) {
          if (!font.vertical) {
            textState.translateTextMatrix(
              charSpacing * textState.textHScale,
              0
            );
          } else {
            textState.translateTextMatrix(0, -charSpacing);
          }
        }
      }
    }
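
    // Marks the end of a line: either flags and flushes the pending chunk,
    // or emits an empty item carrying only `hasEOL` when nothing is pending.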
    function appendEOL() {
      if (textContentItem.initialized) {
        textContentItem.hasEOL = true;
        flushTextContentItem();
      } else {
        textContent.items.push({
          str: "",
          dir: "ltr",
          width: 0,
          height: 0,
          transform: getCurrentTextTransform(),
          fontName: textState.font.loadedName,
          hasEOL: true,
        });
      }
    }
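
    // Translates a cursor move into text: a gap within the "in flow" range
    // becomes a " " appended to the current chunk (returning false), while
    // a larger gap flushes the chunk and emits a standalone space item.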
    function addFakeSpaces(width, transf, textOrientation) {
      if (
        textOrientation * textContentItem.spaceInFlowMin <= width &&
        width <= textOrientation * textContentItem.spaceInFlowMax
      ) {
        if (textContentItem.initialized) {
          textContentItem.str.push(" ");
        }
        return false;
      }

      const fontName = textContentItem.fontName;

      let height = 0;
      if (textContentItem.vertical) {
        height = width;
        width = 0;
      }

      flushTextContentItem();
      textContent.items.push({
        str: " ",
        // TODO: check if using the orientation from the last chunk is
        // better or not.
        dir: "ltr",
        width: Math.abs(width),
        height: Math.abs(height),
        transform: transf || getCurrentTextTransform(),
        fontName,
        hasEOL: false,
      });

      return true;
    }

    function flushTextContentItem() {
      if (!textContentItem.initialized || !textContentItem.str) {
        return;
      }

      // Do final text scaling.
      if (!textContentItem.vertical) {
        textContentItem.totalWidth +=
          textContentItem.width * textContentItem.textAdvanceScale;
      } else {
        textContentItem.totalHeight +=
          textContentItem.height * textContentItem.textAdvanceScale;
      }

      textContent.items.push(runBidiTransform(textContentItem));
      textContentItem.initialized = false;
      textContentItem.str.length = 0;
    }
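
    // Send the accumulated textContent to the main-thread in batches, to
    // reduce `postMessage`-overhead; with `batch = true` small chunks are
    // held back until at least TEXT_CHUNK_BATCH_SIZE items have accumulated.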
    function enqueueChunk(batch = false) {
      const length = textContent.items.length;
      if (length === 0) {
        return;
      }
      if (batch && length < TEXT_CHUNK_BATCH_SIZE) {
        return;
      }
      sink.enqueue(textContent, length);
      textContent.items = [];
      textContent.styles = Object.create(null);
    }
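
    // Parse the content stream in time slices, so that a long stream cannot
    // block the worker-thread; `next` re-schedules `promiseBody` once both
    // the pending operation and the sink are ready.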
    const timeSlotManager = new TimeSlotManager();

    return new Promise(function promiseBody(resolve, reject) {
      const next = function (promise) {
        enqueueChunk(/* batch = */ true);
        Promise.all([promise, sink.ready]).then(function () {
          try {
            promiseBody(resolve, reject);
          } catch (ex) {
            reject(ex);
          }
        }, reject);
      };
      task.ensureNotTerminated();
      timeSlotManager.reset();

      const operation = {};
      let stop,
        args = [];
      while (!(stop = timeSlotManager.check())) {
        // The arguments parsed by read() are not used beyond this loop, so
        // we can reuse the same array on every iteration, thus avoiding
        // unnecessary allocations.
        args.length = 0;
        operation.args = args;
        if (!preprocessor.read(operation)) {
          break;
        }
        textState = stateManager.state;
        const fn = operation.fn;
        args = operation.args;

        switch (fn | 0) {
          case OPS.setFont:
            // Optimization to ignore multiple identical Tf commands.
            var fontNameArg = args[0].name,
              fontSizeArg = args[1];
            if (
              textState.font &&
              fontNameArg === textState.fontName &&
              fontSizeArg === textState.fontSize
            ) {
              break;
            }

            flushTextContentItem();
            textState.fontName = fontNameArg;
            textState.fontSize = fontSizeArg;
            next(handleSetFont(fontNameArg, null));
            return;
          case OPS.setTextRise:
            textState.textRise = args[0];
            break;
          case OPS.setHScale:
            textState.textHScale = args[0] / 100;
            break;
          case OPS.setLeading:
            textState.leading = args[0];
            break;
          case OPS.moveText:
            textState.translateTextLineMatrix(args[0], args[1]);
            textState.textMatrix = textState.textLineMatrix.slice();
            break;
          case OPS.setLeadingMoveText:
            textState.leading = -args[1];
            textState.translateTextLineMatrix(args[0], args[1]);
            textState.textMatrix = textState.textLineMatrix.slice();
            break;
          case OPS.nextLine:
            textState.carriageReturn();
            break;
          case OPS.setTextMatrix:
            textState.setTextMatrix(
              args[0],
              args[1],
              args[2],
              args[3],
              args[4],
              args[5]
            );
            textState.setTextLineMatrix(
              args[0],
              args[1],
              args[2],
              args[3],
              args[4],
              args[5]
            );
            updateAdvanceScale();
            break;
          case OPS.setCharSpacing:
            textState.charSpacing = args[0];
            break;
          case OPS.setWordSpacing:
            textState.wordSpacing = args[0];
            break;
          case OPS.beginText:
            textState.textMatrix = IDENTITY_MATRIX.slice();
            textState.textLineMatrix = IDENTITY_MATRIX.slice();
            break;
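          // TJ (showSpacedText): strings in the array are buffered and only
          // flushed into a textContentItem once a non-zero numeric
          // adjustment, or the end of the array, is reached.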
          case OPS.showSpacedText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }

            const spaceFactor =
              ((textState.font.vertical ? 1 : -1) * textState.fontSize) / 1000;
            const elements = args[0];
            for (let i = 0, ii = elements.length; i < ii - 1; i++) {
              const item = elements[i];
              if (typeof item === "string") {
                showSpacedTextBuffer.push(item);
              } else if (typeof item === "number" && item !== 0) {
                // PDF Specification 5.3.2 states:
                // The number is expressed in thousandths of a unit of text
                // space.
                // This amount is subtracted from the current horizontal or
                // vertical coordinate, depending on the writing mode.
                // In the default coordinate system, a positive adjustment
                // has the effect of moving the next glyph painted either to
                // the left or down by the given amount.
                const str = showSpacedTextBuffer.join("");
                showSpacedTextBuffer.length = 0;
                buildTextContentItem({
                  chars: str,
                  extraSpacing: item * spaceFactor,
                });
              }
            }

            const item = elements[elements.length - 1];
            if (typeof item === "string") {
              showSpacedTextBuffer.push(item);
            }

            if (showSpacedTextBuffer.length > 0) {
              const str = showSpacedTextBuffer.join("");
              showSpacedTextBuffer.length = 0;
              buildTextContentItem({
                chars: str,
                extraSpacing: 0,
              });
            }
            break;
          case OPS.showText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }
            buildTextContentItem({
              chars: args[0],
              extraSpacing: 0,
            });
            break;
          case OPS.nextLineShowText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }
            textState.carriageReturn();
            buildTextContentItem({
              chars: args[0],
              extraSpacing: 0,
            });
            break;
          case OPS.nextLineSetSpacingShowText:
            if (!stateManager.state.font) {
              self.ensureStateFont(stateManager.state);
              continue;
            }
            textState.wordSpacing = args[0];
            textState.charSpacing = args[1];
            textState.carriageReturn();
            buildTextContentItem({
              chars: args[2],
              extraSpacing: 0,
            });
            break;
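          // Form XObjects are parsed recursively, with their own
          // StateManager, so that any text inside them is extracted as
          // well; XObjects that produce no text are cached as empty.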
          case OPS.paintXObject:
            flushTextContentItem();
            if (!xobjs) {
              xobjs = resources.get("XObject") || Dict.empty;
            }

            var isValidName = args[0] instanceof Name;
            var name = args[0].name;

            if (isValidName && emptyXObjectCache.getByName(name)) {
              break;
            }

            next(
              new Promise(function (resolveXObject, rejectXObject) {
                if (!isValidName) {
                  throw new FormatError("XObject must be referred to by name.");
                }

                let xobj = xobjs.getRaw(name);
                if (xobj instanceof Ref) {
                  if (emptyXObjectCache.getByRef(xobj)) {
                    resolveXObject();
                    return;
                  }

                  const globalImage = self.globalImageCache.getData(
                    xobj,
                    self.pageIndex
                  );
                  if (globalImage) {
                    resolveXObject();
                    return;
                  }

                  xobj = xref.fetch(xobj);
                }

                if (!(xobj instanceof BaseStream)) {
                  throw new FormatError("XObject should be a stream");
                }

                const type = xobj.dict.get("Subtype");
                if (!(type instanceof Name)) {
                  throw new FormatError("XObject should have a Name subtype");
                }

                if (type.name !== "Form") {
                  emptyXObjectCache.set(name, xobj.dict.objId, true);

                  resolveXObject();
                  return;
                }

                // Use a new `StateManager` to prevent incorrect positioning
                // of textItems *after* the Form XObject, since errors in the
                // data can otherwise prevent `restore` operators from
                // executing.
                // NOTE: Only an issue when `options.ignoreErrors === true`.
                const currentState = stateManager.state.clone();
                const xObjStateManager = new StateManager(currentState);

                const matrix = xobj.dict.getArray("Matrix");
                if (Array.isArray(matrix) && matrix.length === 6) {
                  xObjStateManager.transform(matrix);
                }

                // Enqueue the `textContent` chunk before parsing the /Form
                // XObject.
                enqueueChunk();
                const sinkWrapper = {
                  enqueueInvoked: false,

                  enqueue(chunk, size) {
                    this.enqueueInvoked = true;
                    sink.enqueue(chunk, size);
                  },

                  get desiredSize() {
                    return sink.desiredSize;
                  },

                  get ready() {
                    return sink.ready;
                  },
                };

                self
                  .getTextContent({
                    stream: xobj,
                    task,
                    resources: xobj.dict.get("Resources") || resources,
                    stateManager: xObjStateManager,
                    combineTextItems,
                    includeMarkedContent,
                    sink: sinkWrapper,
                    seenStyles,
                    viewBox,
                  })
                  .then(function () {
                    if (!sinkWrapper.enqueueInvoked) {
                      emptyXObjectCache.set(name, xobj.dict.objId, true);
                    }
                    resolveXObject();
                  }, rejectXObject);
              }).catch(function (reason) {
                if (reason instanceof AbortException) {
                  return;
                }
                if (self.options.ignoreErrors) {
                  // Error(s) in the XObject -- allow text-extraction to
                  // continue.
                  warn(`getTextContent - ignoring XObject: "${reason}".`);
                  return;
                }
                throw reason;
              })
            );
            return;
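          // Only ExtGState entries that set a font are relevant for
          // text-extraction; graphics states without a Font entry are
          // cached as empty.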
          case OPS.setGState:
            isValidName = args[0] instanceof Name;
            name = args[0].name;

            if (isValidName && emptyGStateCache.getByName(name)) {
              break;
            }

            next(
              new Promise(function (resolveGState, rejectGState) {
                if (!isValidName) {
                  throw new FormatError("GState must be referred to by name.");
                }

                const extGState = resources.get("ExtGState");
                if (!(extGState instanceof Dict)) {
                  throw new FormatError("ExtGState should be a dictionary.");
                }

                const gState = extGState.get(name);
                // TODO: Attempt to lookup cached GStates by reference as well,
                //       if and only if there are PDF documents where doing so
                //       would significantly improve performance.
                if (!(gState instanceof Dict)) {
                  throw new FormatError("GState should be a dictionary.");
                }

                const gStateFont = gState.get("Font");
                if (!gStateFont) {
                  emptyGStateCache.set(name, gState.objId, true);

                  resolveGState();
                  return;
                }
                flushTextContentItem();

                textState.fontName = null;
                textState.fontSize = gStateFont[1];
                handleSetFont(null, gStateFont[0]).then(
                  resolveGState,
                  rejectGState
                );
              }).catch(function (reason) {
                if (reason instanceof AbortException) {
                  return;
                }
                if (self.options.ignoreErrors) {
                  // Error(s) in the ExtGState -- allow text-extraction to
                  // continue.
                  warn(`getTextContent - ignoring ExtGState: "${reason}".`);
                  return;
                }
                throw reason;
              })
            );
            return;
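          // Marked-content operators are only emitted when the API caller
          // explicitly opted in via `includeMarkedContent`.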
          case OPS.beginMarkedContent:
            if (includeMarkedContent) {
              textContent.items.push({
                type: "beginMarkedContent",
                tag: args[0] instanceof Name ? args[0].name : null,
              });
            }
            break;
          case OPS.beginMarkedContentProps:
            if (includeMarkedContent) {
              flushTextContentItem();
              let mcid = null;
              if (args[1] instanceof Dict) {
                mcid = args[1].get("MCID");
              }
              textContent.items.push({
                type: "beginMarkedContentProps",
                id: Number.isInteger(mcid)
                  ? `${self.idFactory.getPageObjId()}_mcid${mcid}`
                  : null,
                tag: args[0] instanceof Name ? args[0].name : null,
              });
            }
            break;
          case OPS.endMarkedContent:
            if (includeMarkedContent) {
              flushTextContentItem();
              textContent.items.push({
                type: "endMarkedContent",
              });
            }
            break;
        } // switch
        if (textContent.items.length >= sink.desiredSize) {
          // Wait for ready, if we reach highWaterMark.
          stop = true;
          break;
        }
      } // while
      if (stop) {
        next(deferred);
        return;
      }
      flushTextContentItem();
      enqueueChunk();
      resolve();
    }).catch(reason => {
      if (reason instanceof AbortException) {
        return;
      }
      if (this.options.ignoreErrors) {
        // Error(s) in the TextContent -- allow text-extraction to continue.
        warn(
          `getTextContent - ignoring errors during "${task.name}" ` +
            `task: "${reason}".`
        );

        flushTextContentItem();
        enqueueChunk();
        return;
      }
      throw reason;
    });
  }
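
  // Gather the encoding- and ToUnicode-related data for a font dictionary;
  // see sections 9.6.6 (character encoding) and 9.10.2 (ToUnicode CMaps)
  // of the PDF specification.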
  extractDataStructures(dict, baseDict, properties) {
    const xref = this.xref;
    let cidToGidBytes;
    // 9.10.2
    const toUnicodePromise = this.readToUnicode(
      properties.toUnicode || dict.get("ToUnicode") || baseDict.get("ToUnicode")
    );

    if (properties.composite) {
      // CIDSystemInfo helps to match CID to glyphs
      const cidSystemInfo = dict.get("CIDSystemInfo");
      if (cidSystemInfo instanceof Dict) {
        properties.cidSystemInfo = {
          registry: stringToPDFString(cidSystemInfo.get("Registry")),
          ordering: stringToPDFString(cidSystemInfo.get("Ordering")),
          supplement: cidSystemInfo.get("Supplement"),
        };
      }

      const cidToGidMap = dict.get("CIDToGIDMap");
      if (cidToGidMap instanceof BaseStream) {
        cidToGidBytes = cidToGidMap.getBytes();
      }
    }

    // Based on 9.6.6 of the spec, the encoding can come from multiple places
    // and depends on the font type. The base encoding and differences are
    // read here, but the encoding that is actually used is chosen during
    // glyph mapping in the font.
    // TODO: Loading the built-in encoding in the font would allow the
    // differences to be merged in here and not require us to hold on to it.
    const differences = [];
    let baseEncodingName = null;
    let encoding;
    if (dict.has("Encoding")) {
      encoding = dict.get("Encoding");
      if (encoding instanceof Dict) {
        baseEncodingName = encoding.get("BaseEncoding");
        baseEncodingName =
          baseEncodingName instanceof Name ? baseEncodingName.name : null;
        // Load the differences between the base and original
        if (encoding.has("Differences")) {
          const diffEncoding = encoding.get("Differences");
          let index = 0;
          for (let j = 0, jj = diffEncoding.length; j < jj; j++) {
            const data = xref.fetchIfRef(diffEncoding[j]);
            if (typeof data === "number") {
              index = data;
            } else if (data instanceof Name) {
              differences[index++] = data.name;
            } else {
              throw new FormatError(
                `Invalid entry in 'Differences' array: ${data}`
              );
            }
          }
        }
      } else if (encoding instanceof Name) {
        baseEncodingName = encoding.name;
      } else {
        throw new FormatError("Encoding is not a Name nor a Dict");
      }
      // According to table 114 if the encoding is a named encoding it must be
      // one of these predefined encodings.
      if (
        baseEncodingName !== "MacRomanEncoding" &&
        baseEncodingName !== "MacExpertEncoding" &&
        baseEncodingName !== "WinAnsiEncoding"
      ) {
        baseEncodingName = null;
      }
    }

    if (baseEncodingName) {
      properties.defaultEncoding = getEncoding(baseEncodingName);
    } else {
      const isSymbolicFont = !!(properties.flags & FontFlags.Symbolic);
      const isNonsymbolicFont = !!(properties.flags & FontFlags.Nonsymbolic);
      // According to "Table 114" in section "9.6.6.1 General" (under
      // "9.6.6 Character Encoding") of the PDF specification, a Nonsymbolic
      // font should use the `StandardEncoding` if no encoding is specified.
      encoding = StandardEncoding;
      if (properties.type === "TrueType" && !isNonsymbolicFont) {
        encoding = WinAnsiEncoding;
      }
      // The Symbolic attribute can be misused for regular fonts.
      // Heuristic: we also have to check if the font is a standard one.
      if (isSymbolicFont) {
        encoding = MacRomanEncoding;
        if (!properties.file || properties.isInternalFont) {
          if (/Symbol/i.test(properties.name)) {
            encoding = SymbolSetEncoding;
          } else if (/Dingbats|Wingdings/i.test(properties.name)) {
            encoding = ZapfDingbatsEncoding;
          }
        }
      }
      properties.defaultEncoding = encoding;
    }

    properties.differences = differences;
    properties.baseEncodingName = baseEncodingName;
    properties.hasEncoding = !!baseEncodingName || differences.length > 0;
    properties.dict = dict;
    return toUnicodePromise
      .then(readToUnicode => {
        properties.toUnicode = readToUnicode;
        return this.buildToUnicode(properties);
      })
      .then(builtToUnicode => {
        properties.toUnicode = builtToUnicode;
        if (cidToGidBytes) {
          properties.cidToGidMap = this.readCidToGidMap(
            cidToGidBytes,
            builtToUnicode
          );
        }
        return properties;
      });
  }

  /**
   * @returns {Array}
   * @private
   */
  _simpleFontToUnicode(properties, forceGlyphs = false) {
    assert(!properties.composite, "Must be a simple font.");

    const toUnicode = [];
    const encoding = properties.defaultEncoding.slice();
    const baseEncodingName = properties.baseEncodingName;
    // Merge in the differences array.
    const differences = properties.differences;
    for (const charcode in differences) {
      const glyphName = differences[charcode];
      if (glyphName === ".notdef") {
        // Skip .notdef to prevent rendering errors, e.g. boxes appearing
        // where there should be spaces (fixes issue5256.pdf).
        continue;
      }
      encoding[charcode] = glyphName;
    }
    const glyphsUnicodeMap = getGlyphsUnicode();
    for (const charcode in encoding) {
      // a) Map the character code to a character name.
      let glyphName = encoding[charcode];
      // b) Look up the character name in the Adobe Glyph List (see the
      //    Bibliography) to obtain the corresponding Unicode value.
      if (glyphName === "") {
        continue;
      } else if (glyphsUnicodeMap[glyphName] === undefined) {
        // (undocumented) c) A few heuristics to recognize unknown glyphs.
        // NOTE: Adobe Reader does not do this step, but OSX Preview does.
        let code = 0;
        switch (glyphName[0]) {
          case "G": // Gxx glyph
            if (glyphName.length === 3) {
              code = parseInt(glyphName.substring(1), 16);
            }
            break;
          case "g": // g00xx glyph
            if (glyphName.length === 5) {
              code = parseInt(glyphName.substring(1), 16);
            }
            break;
          case "C": // Cdd{d} glyph
          case "c": // cdd{d} glyph
            if (glyphName.length >= 3 && glyphName.length <= 4) {
              const codeStr = glyphName.substring(1);

              if (forceGlyphs) {
                code = parseInt(codeStr, 16);
                break;
              }
              // Normally the Cdd{d}/cdd{d} glyphName format will contain
              // regular, i.e. base 10, charCodes (see issue4550.pdf)...
              code = +codeStr;

              // ... however some PDF generators violate that assumption by
              // containing glyph, i.e. base 16, codes instead.
              // In that case we need to re-parse the *entire* encoding to
              // prevent broken text-selection (fixes issue9655_reduced.pdf).
              if (
                Number.isNaN(code) &&
                Number.isInteger(parseInt(codeStr, 16))
              ) {
                return this._simpleFontToUnicode(
                  properties,
                  /* forceGlyphs */ true
                );
              }
            }
            break;
          default: // 'uniXXXX'/'uXXXX{XX}' glyphs
            const unicode = getUnicodeForGlyph(glyphName, glyphsUnicodeMap);
            if (unicode !== -1) {
              code = unicode;
            }
        }
        if (code > 0 && code <= 0x10ffff && Number.isInteger(code)) {
          // If `baseEncodingName` is one of the predefined encodings, and
          // `code` equals `charcode`, using the glyph defined in the
          // baseEncoding seems to yield a better `toUnicode` mapping
          // (fixes issue 5070).
          if (baseEncodingName && code === +charcode) {
            const baseEncoding = getEncoding(baseEncodingName);
            if (baseEncoding && (glyphName = baseEncoding[charcode])) {
              toUnicode[charcode] = String.fromCharCode(
                glyphsUnicodeMap[glyphName]
              );
              continue;
            }
          }
          toUnicode[charcode] = String.fromCodePoint(code);
        }
        continue;
      }
      toUnicode[charcode] = String.fromCharCode(glyphsUnicodeMap[glyphName]);
    }
    return toUnicode;
  }
2017-11-26 20:53:06 +09:00
|
|
|
|
|
2020-07-05 19:20:10 +09:00
|
|
|
|
/**
|
|
|
|
|
* Builds a char code to unicode map based on section 9.10 of the spec.
|
|
|
|
|
* @param {Object} properties Font properties object.
|
|
|
|
|
* @returns {Promise} A Promise that is resolved with a
|
|
|
|
|
* {ToUnicodeMap|IdentityToUnicodeMap} object.
|
|
|
|
|
*/
|
2021-05-17 22:40:23 +09:00
|
|
|
|
async buildToUnicode(properties) {
|
2020-07-05 19:20:10 +09:00
|
|
|
|
properties.hasIncludedToUnicodeMap =
|
|
|
|
|
!!properties.toUnicode && properties.toUnicode.length > 0;
|
|
|
|
|
|
|
|
|
|
// Section 9.10.2 Mapping Character Codes to Unicode Values
|
|
|
|
|
if (properties.hasIncludedToUnicodeMap) {
|
|
|
|
|
// Some fonts contain incomplete ToUnicode data, causing issues with
|
|
|
|
|
// text-extraction. For simple fonts, containing encoding information,
|
|
|
|
|
// use a fallback ToUnicode map to improve this (fixes issue8229.pdf).
|
|
|
|
|
if (!properties.composite && properties.hasEncoding) {
|
2021-05-18 20:45:19 +09:00
|
|
|
|
properties.fallbackToUnicode = this._simpleFontToUnicode(properties);
|
2016-02-29 01:20:29 +09:00
|
|
|
|
}
|
2021-05-17 22:40:23 +09:00
|
|
|
|
return properties.toUnicode;
|
2020-07-05 19:20:10 +09:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// According to the spec if the font is a simple font we should only map
|
|
|
|
|
// to unicode if the base encoding is MacRoman, MacExpert, or WinAnsi or
|
|
|
|
|
// the differences array only contains adobe standard or symbol set names,
|
|
|
|
|
// in pratice it seems better to always try to create a toUnicode map
|
|
|
|
|
// based of the default encoding.
|
|
|
|
|
if (!properties.composite /* is simple font */) {
|
2021-05-18 20:45:19 +09:00
|
|
|
|
return new ToUnicodeMap(this._simpleFontToUnicode(properties));
|
2020-07-05 19:20:10 +09:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// If the font is a composite font that uses one of the predefined CMaps
|
|
|
|
|
// listed in Table 118 (except Identity–H and Identity–V) or whose
|
|
|
|
|
// descendant CIDFont uses the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or
|
|
|
|
|
// Adobe-Korea1 character collection:
|
|
|
|
|
if (
|
|
|
|
|
properties.composite &&
|
|
|
|
|
((properties.cMap.builtInCMap &&
|
|
|
|
|
!(properties.cMap instanceof IdentityCMap)) ||
|
|
|
|
|
(properties.cidSystemInfo.registry === "Adobe" &&
|
|
|
|
|
(properties.cidSystemInfo.ordering === "GB1" ||
|
|
|
|
|
properties.cidSystemInfo.ordering === "CNS1" ||
|
|
|
|
|
properties.cidSystemInfo.ordering === "Japan1" ||
|
|
|
|
|
properties.cidSystemInfo.ordering === "Korea1")))
|
|
|
|
|
) {
|
|
|
|
|
// Then:
|
|
|
|
|
// a) Map the character code to a character identifier (CID) according
|
|
|
|
|
// to the font’s CMap.
|
|
|
|
|
// b) Obtain the registry and ordering of the character collection used
|
|
|
|
|
// by the font’s CMap (for example, Adobe and Japan1) from its
|
|
|
|
|
// CIDSystemInfo dictionary.
|
2021-05-17 22:40:23 +09:00
|
|
|
|
const { registry, ordering } = properties.cidSystemInfo;
|
2020-07-05 19:20:10 +09:00
|
|
|
|
// c) Construct a second CMap name by concatenating the registry and
|
|
|
|
|
// ordering obtained in step (b) in the format registry–ordering–UCS2
|
|
|
|
|
// (for example, Adobe–Japan1–UCS2).
|
2021-05-17 22:40:23 +09:00
|
|
|
|
const ucs2CMapName = Name.get(`${registry}-${ordering}-UCS2`);
|
2020-07-05 19:20:10 +09:00
|
|
|
|
// d) Obtain the CMap with the name constructed in step (c) (available
|
|
|
|
|
// from the ASN Web site; see the Bibliography).
|
2021-05-17 22:40:23 +09:00
|
|
|
|
const ucs2CMap = await CMapFactory.create({
|
2020-07-05 19:20:10 +09:00
|
|
|
|
encoding: ucs2CMapName,
|
|
|
|
|
fetchBuiltInCMap: this._fetchBuiltInCMapBound,
|
|
|
|
|
useCMap: null,
|
|
|
|
|
});
|
2021-05-17 22:40:23 +09:00
|
|
|
|
const toUnicode = [];
|
|
|
|
|
properties.cMap.forEach(function (charcode, cid) {
|
|
|
|
|
if (cid > 0xffff) {
|
|
|
|
|
throw new FormatError("Max size of CID is 65,535");
|
|
|
|
|
}
|
|
|
|
|
// e) Map the CID obtained in step (a) according to the CMap
|
|
|
|
|
// obtained in step (d), producing a Unicode value.
|
|
|
|
|
const ucs2 = ucs2CMap.lookup(cid);
|
|
|
|
|
if (ucs2) {
|
|
|
|
|
toUnicode[charcode] = String.fromCharCode(
|
|
|
|
|
(ucs2.charCodeAt(0) << 8) + ucs2.charCodeAt(1)
|
|
|
|
|
);
|
|
|
|
|
}
|
|
|
|
|
});
|
|
|
|
|
return new ToUnicodeMap(toUnicode);
|
2020-07-05 19:20:10 +09:00
|
|
|
|
}

    // The viewer's choice, just use an identity map.
    return new IdentityToUnicodeMap(properties.firstChar, properties.lastChar);
  }

  readToUnicode(cmapObj) {
    if (!cmapObj) {
      return Promise.resolve(null);
    }
    if (cmapObj instanceof Name) {
      return CMapFactory.create({
        encoding: cmapObj,
        fetchBuiltInCMap: this._fetchBuiltInCMapBound,
        useCMap: null,
      }).then(function (cmap) {
        if (cmap instanceof IdentityCMap) {
          return new IdentityToUnicodeMap(0, 0xffff);
        }
        return new ToUnicodeMap(cmap.getMap());
      });
    } else if (cmapObj instanceof BaseStream) {
      return CMapFactory.create({
        encoding: cmapObj,
        fetchBuiltInCMap: this._fetchBuiltInCMapBound,
        useCMap: null,
      }).then(
        function (cmap) {
          if (cmap instanceof IdentityCMap) {
            return new IdentityToUnicodeMap(0, 0xffff);
          }
          const map = new Array(cmap.length);
          // Convert UTF-16BE
          // NOTE: cmap can be a sparse array, so use forEach instead of
          // `for(;;)` to iterate over all keys.
          cmap.forEach(function (charCode, token) {
            // Some cmaps contain *only* CID characters (fixes issue9367.pdf).
            if (typeof token === "number") {
              map[charCode] = String.fromCodePoint(token);
              return;
            }
            const str = [];
            for (let k = 0; k < token.length; k += 2) {
              const w1 = (token.charCodeAt(k) << 8) | token.charCodeAt(k + 1);
              if ((w1 & 0xf800) !== 0xd800) {
                // w1 < 0xD800 || w1 > 0xDFFF
                str.push(w1);
                continue;
              }
              k += 2;
              const w2 = (token.charCodeAt(k) << 8) | token.charCodeAt(k + 1);
              str.push(((w1 & 0x3ff) << 10) + (w2 & 0x3ff) + 0x10000);
            }
            map[charCode] = String.fromCodePoint.apply(String, str);
          });
          return new ToUnicodeMap(map);
        },
        reason => {
          if (reason instanceof AbortException) {
            return null;
          }
          if (this.options.ignoreErrors) {
            // Error in the ToUnicode data -- sending unsupported feature
            // notification and allow font parsing to continue.
            this.handler.send("UnsupportedFeature", {
              featureId: UNSUPPORTED_FEATURES.errorFontToUnicode,
            });
            warn(`readToUnicode - ignoring ToUnicode data: "${reason}".`);
            return null;
          }
          throw reason;
        }
      );
    }
    return Promise.resolve(null);
  }
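
  // Worked example of the UTF-16BE decoding above (token value illustrative):
  // a ToUnicode token "\xD8\x3D\xDE\x00" gives w1 = 0xd83d and w2 = 0xde00,
  // so ((0xd83d & 0x3ff) << 10) + (0xde00 & 0x3ff) + 0x10000 === 0x1f600,
  // i.e. the single code point U+1F600 (GRINNING FACE), which
  // String.fromCodePoint turns back into one two-code-unit string.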

  readCidToGidMap(glyphsData, toUnicode) {
    // Extract the encoding from the CIDToGIDMap

    // Set encoding 0 to later verify the font has an encoding
    const result = [];
    for (let j = 0, jj = glyphsData.length; j < jj; j++) {
      const glyphID = (glyphsData[j++] << 8) | glyphsData[j];
      const code = j >> 1;
      if (glyphID === 0 && !toUnicode.has(code)) {
        continue;
      }
      result[code] = glyphID;
    }
    return result;
  }
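
  // For example (illustrative bytes, not from a real font): the CIDToGIDMap
  // bytes [0x00, 0x00, 0x00, 0x05] hold two big-endian 16-bit glyph IDs.
  // CID 0 (glyph ID 0) is skipped unless toUnicode.has(0), while CID 1 maps
  // to glyph ID 5, so the method returns a sparse array with result[1] === 5.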

  extractWidths(dict, descriptor, properties) {
    const xref = this.xref;
    let glyphsWidths = [];
    let defaultWidth = 0;
    const glyphsVMetrics = [];
    let defaultVMetrics;
    let i, ii, j, jj, start, code, widths;
    if (properties.composite) {
      defaultWidth = dict.has("DW") ? dict.get("DW") : 1000;

      widths = dict.get("W");
      if (widths) {
        for (i = 0, ii = widths.length; i < ii; i++) {
          start = xref.fetchIfRef(widths[i++]);
          code = xref.fetchIfRef(widths[i]);
          if (Array.isArray(code)) {
            for (j = 0, jj = code.length; j < jj; j++) {
              glyphsWidths[start++] = xref.fetchIfRef(code[j]);
            }
          } else {
            const width = xref.fetchIfRef(widths[++i]);
            for (j = start; j <= code; j++) {
              glyphsWidths[j] = width;
            }
          }
        }
      }

      if (properties.vertical) {
        let vmetrics = dict.getArray("DW2") || [880, -1000];
        defaultVMetrics = [vmetrics[1], defaultWidth * 0.5, vmetrics[0]];
        vmetrics = dict.get("W2");
        if (vmetrics) {
          for (i = 0, ii = vmetrics.length; i < ii; i++) {
            start = xref.fetchIfRef(vmetrics[i++]);
            code = xref.fetchIfRef(vmetrics[i]);
            if (Array.isArray(code)) {
              for (j = 0, jj = code.length; j < jj; j++) {
                glyphsVMetrics[start++] = [
                  xref.fetchIfRef(code[j++]),
                  xref.fetchIfRef(code[j++]),
                  xref.fetchIfRef(code[j]),
                ];
              }
            } else {
              const vmetric = [
                xref.fetchIfRef(vmetrics[++i]),
                xref.fetchIfRef(vmetrics[++i]),
                xref.fetchIfRef(vmetrics[++i]),
              ];
              for (j = start; j <= code; j++) {
                glyphsVMetrics[j] = vmetric;
              }
            }
          }
        }
      }
    } else {
      const firstChar = properties.firstChar;
      widths = dict.get("Widths");
      if (widths) {
        j = firstChar;
        for (i = 0, ii = widths.length; i < ii; i++) {
          glyphsWidths[j++] = xref.fetchIfRef(widths[i]);
        }
        defaultWidth = parseFloat(descriptor.get("MissingWidth")) || 0;
      } else {
        // Trying to get the BaseFont metrics (see comment above).
        const baseFontName = dict.get("BaseFont");
        if (baseFontName instanceof Name) {
          const metrics = this.getBaseFontMetrics(baseFontName.name);

          glyphsWidths = this.buildCharCodeToWidth(metrics.widths, properties);
          defaultWidth = metrics.defaultWidth;
        }
      }
    }

    // Heuristic: detect monospace fonts by checking all non-zero widths.
    let isMonospace = true;
    let firstWidth = defaultWidth;
    for (const glyph in glyphsWidths) {
      const glyphWidth = glyphsWidths[glyph];
      if (!glyphWidth) {
        continue;
      }
      if (!firstWidth) {
        firstWidth = glyphWidth;
        continue;
      }
      if (firstWidth !== glyphWidth) {
        isMonospace = false;
        break;
      }
    }
    if (isMonospace) {
      properties.flags |= FontFlags.FixedPitch;
    }

    properties.defaultWidth = defaultWidth;
    properties.widths = glyphsWidths;
    properties.defaultVMetrics = defaultVMetrics;
    properties.vmetrics = glyphsVMetrics;
  }
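
  // As an illustration (a sketch, not part of the original code), the /W
  // array of a composite font mixes both forms parsed above:
  //   /W [ 1 [ 500 600 ] 10 20 400 ]
  // assigns width 500 to CID 1, width 600 to CID 2, and width 400 to every
  // CID from 10 through 20.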

  isSerifFont(baseFontName) {
    // Simulating descriptor flags attribute
    const fontNameWoStyle = baseFontName.split("-")[0];
    return (
      fontNameWoStyle in getSerifFonts() ||
      fontNameWoStyle.search(/serif/gi) !== -1
    );
  }

  getBaseFontMetrics(name) {
    let defaultWidth = 0;
    let widths = Object.create(null);
    let monospace = false;
    const stdFontMap = getStdFontMap();
    let lookupName = stdFontMap[name] || name;
    const Metrics = getMetrics();

    if (!(lookupName in Metrics)) {
      // Use default fonts for looking up font metrics if the passed
      // font is not a base font.
      if (this.isSerifFont(name)) {
        lookupName = "Times-Roman";
      } else {
        lookupName = "Helvetica";
      }
    }
    const glyphWidths = Metrics[lookupName];

    if (typeof glyphWidths === "number") {
      defaultWidth = glyphWidths;
      monospace = true;
    } else {
      widths = glyphWidths(); // expand lazy widths array
    }

    return {
      defaultWidth,
      monospace,
      widths,
    };
  }
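
  // Hypothetical usage (the `evaluator` receiver name is assumed):
  //   const metrics = evaluator.getBaseFontMetrics("Courier");
  // Courier's metrics entry is a plain number, so this returns
  // { defaultWidth: 600, monospace: true, widths: {} }, i.e. one fixed
  // width shared by every glyph.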

  buildCharCodeToWidth(widthsByGlyphName, properties) {
    const widths = Object.create(null);
    const differences = properties.differences;
    const encoding = properties.defaultEncoding;
    for (let charCode = 0; charCode < 256; charCode++) {
      if (charCode in differences && widthsByGlyphName[differences[charCode]]) {
        widths[charCode] = widthsByGlyphName[differences[charCode]];
        continue;
      }
      if (charCode in encoding && widthsByGlyphName[encoding[charCode]]) {
        widths[charCode] = widthsByGlyphName[encoding[charCode]];
        continue;
      }
    }
    return widths;
  }
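
  // A small illustrative sketch (all values assumed): with
  //   properties.differences = { 0x41: "Aacute" },
  //   properties.defaultEncoding[0x42] === "B", and
  //   widthsByGlyphName = { Aacute: 722, B: 667 },
  // the Differences entry wins for char code 0x41 and the base encoding
  // supplies 0x42, giving widths[0x41] === 722 and widths[0x42] === 667.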

  preEvaluateFont(dict) {
    const baseDict = dict;
    let type = dict.get("Subtype");
    if (!(type instanceof Name)) {
      throw new FormatError("invalid font Subtype");
    }

    let composite = false;
    let hash, toUnicode;
    if (type.name === "Type0") {
      // If the font is composite:
      //  - get the descendant font
      //  - set the type according to the descendant font
      //  - get the FontDescriptor from the descendant font
      const df = dict.get("DescendantFonts");
      if (!df) {
        throw new FormatError("Descendant fonts are not specified");
      }
      dict = Array.isArray(df) ? this.xref.fetchIfRef(df[0]) : df;

      if (!(dict instanceof Dict)) {
        throw new FormatError("Descendant font is not a dictionary.");
      }
      type = dict.get("Subtype");
      if (!(type instanceof Name)) {
        throw new FormatError("invalid font Subtype");
      }
      composite = true;
    }

    const firstChar = dict.get("FirstChar") || 0,
      lastChar = dict.get("LastChar") || (composite ? 0xffff : 0xff);
    const descriptor = dict.get("FontDescriptor");
    if (descriptor) {
      hash = new MurmurHash3_64();

      const encoding = baseDict.getRaw("Encoding");
      if (encoding instanceof Name) {
        hash.update(encoding.name);
      } else if (encoding instanceof Ref) {
        hash.update(encoding.toString());
      } else if (encoding instanceof Dict) {
        for (const entry of encoding.getRawValues()) {
          if (entry instanceof Name) {
            hash.update(entry.name);
          } else if (entry instanceof Ref) {
            hash.update(entry.toString());
          } else if (Array.isArray(entry)) {
            // 'Differences' array (fixes bug1157493.pdf).
            const diffLength = entry.length,
              diffBuf = new Array(diffLength);

            for (let j = 0; j < diffLength; j++) {
              const diffEntry = entry[j];
              if (diffEntry instanceof Name) {
                diffBuf[j] = diffEntry.name;
              } else if (
                typeof diffEntry === "number" ||
                diffEntry instanceof Ref
              ) {
                diffBuf[j] = diffEntry.toString();
              }
            }
            hash.update(diffBuf.join());
          }
        }
      }

      hash.update(`${firstChar}-${lastChar}`); // Fixes issue10665_reduced.pdf

      toUnicode = dict.get("ToUnicode") || baseDict.get("ToUnicode");
      if (toUnicode instanceof BaseStream) {
        const stream = toUnicode.str || toUnicode;
        const uint8array = stream.buffer
          ? new Uint8Array(stream.buffer.buffer, 0, stream.bufferLength)
          : new Uint8Array(
              stream.bytes.buffer,
              stream.start,
              stream.end - stream.start
            );
        hash.update(uint8array);
      } else if (toUnicode instanceof Name) {
        hash.update(toUnicode.name);
      }

      const widths = dict.get("Widths") || baseDict.get("Widths");
      if (Array.isArray(widths)) {
        const widthsBuf = [];
        for (const entry of widths) {
          if (typeof entry === "number" || entry instanceof Ref) {
            widthsBuf.push(entry.toString());
          }
        }
        hash.update(widthsBuf.join());
      }

      if (composite) {
        hash.update("compositeFont");

        const compositeWidths = dict.get("W") || baseDict.get("W");
        if (Array.isArray(compositeWidths)) {
          const widthsBuf = [];
          for (const entry of compositeWidths) {
            if (typeof entry === "number" || entry instanceof Ref) {
              widthsBuf.push(entry.toString());
            } else if (Array.isArray(entry)) {
              const subWidthsBuf = [];
              for (const element of entry) {
                if (typeof element === "number" || element instanceof Ref) {
                  subWidthsBuf.push(element.toString());
                }
              }
              widthsBuf.push(`[${subWidthsBuf.join()}]`);
            }
          }
          hash.update(widthsBuf.join());
        }

        const cidToGidMap =
          dict.getRaw("CIDToGIDMap") || baseDict.getRaw("CIDToGIDMap");
        if (cidToGidMap instanceof Name) {
          hash.update(cidToGidMap.name);
        } else if (cidToGidMap instanceof Ref) {
          hash.update(cidToGidMap.toString());
        } else if (cidToGidMap instanceof BaseStream) {
          hash.update(cidToGidMap.peekBytes());
        }
      }
    }

    return {
      descriptor,
      dict,
      baseDict,
      composite,
      type: type.name,
      firstChar,
      lastChar,
      toUnicode,
      hash: hash ? hash.hexdigest() : "",
    };
  }
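
  // Sketch of why the hash matters (caller shape assumed, not shown here):
  // two font dictionaries that hash the same Encoding, FirstChar/LastChar,
  // ToUnicode bytes, Widths and CIDToGIDMap produce identical hexdigests,
  // which lets the evaluator reuse one translated font instead of parsing
  // the same font data twice.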

  async translateFont({
    descriptor,
    dict,
    baseDict,
    composite,
    type,
    firstChar,
    lastChar,
    toUnicode,
    cssFontInfo,
  }) {
    const isType3Font = type === "Type3";
    let properties;

    if (!descriptor) {
      if (isType3Font) {
        // FontDescriptor is only required for Type3 fonts when the document
        // is a tagged PDF. Create a barebones one to get by.
        descriptor = new Dict(null);
        descriptor.set("FontName", Name.get(type));
        descriptor.set("FontBBox", dict.getArray("FontBBox") || [0, 0, 0, 0]);
      } else {
        // Before PDF 1.5 if the font was one of the base 14 fonts, having a
        // FontDescriptor was not required.
        // This case is here for compatibility.
        let baseFontName = dict.get("BaseFont");
        if (!(baseFontName instanceof Name)) {
          throw new FormatError("Base font is not specified");
        }

        // Using base font name as a font name.
        baseFontName = baseFontName.name.replace(/[,_]/g, "-");
        const metrics = this.getBaseFontMetrics(baseFontName);

        // Simulating descriptor flags attribute
        const fontNameWoStyle = baseFontName.split("-")[0];
        const flags =
          (this.isSerifFont(fontNameWoStyle) ? FontFlags.Serif : 0) |
          (metrics.monospace ? FontFlags.FixedPitch : 0) |
          (getSymbolsFonts()[fontNameWoStyle]
            ? FontFlags.Symbolic
            : FontFlags.Nonsymbolic);

        properties = {
          type,
          name: baseFontName,
          loadedName: baseDict.loadedName,
          widths: metrics.widths,
          defaultWidth: metrics.defaultWidth,
          isSimulatedFlags: true,
          flags,
          firstChar,
          lastChar,
          toUnicode,
          xHeight: 0,
          capHeight: 0,
          italicAngle: 0,
          isType3Font,
        };
        const widths = dict.get("Widths");

        const standardFontName = getStandardFontName(baseFontName);
        let file = null;
        if (standardFontName) {
          properties.isStandardFont = true;
          file = await this.fetchStandardFontData(standardFontName);
          properties.isInternalFont = !!file;
        }
        return this.extractDataStructures(dict, dict, properties).then(
          newProperties => {
            if (widths) {
              const glyphWidths = [];
              let j = firstChar;
              for (let i = 0, ii = widths.length; i < ii; i++) {
                glyphWidths[j++] = this.xref.fetchIfRef(widths[i]);
              }
              newProperties.widths = glyphWidths;
            } else {
              newProperties.widths = this.buildCharCodeToWidth(
                metrics.widths,
                newProperties
              );
            }
            return new Font(baseFontName, file, newProperties);
          }
        );
      }
    }

    // According to the spec if 'FontDescriptor' is declared, 'FirstChar',
    // 'LastChar' and 'Widths' should exist too, but some PDF encoders seem
    // to ignore this rule when a variant of a standard font is used.
    // TODO Fill the width array depending on which of the base fonts this
    // is a variant of.

    let fontName = descriptor.get("FontName");
    let baseFont = dict.get("BaseFont");
    // Some bad PDFs have a string as the font name.
    if (typeof fontName === "string") {
      fontName = Name.get(fontName);
    }
    if (typeof baseFont === "string") {
      baseFont = Name.get(baseFont);
    }

    if (!isType3Font) {
      const fontNameStr = fontName && fontName.name;
      const baseFontStr = baseFont && baseFont.name;
      if (fontNameStr !== baseFontStr) {
        info(
          `The FontDescriptor's FontName is "${fontNameStr}" but ` +
            `should be the same as the Font's BaseFont "${baseFontStr}".`
        );
        // Workaround for cases where e.g. fontNameStr = 'Arial' and
        // baseFontStr = 'Arial,Bold' (needed when no font file is embedded).
        if (fontNameStr && baseFontStr && baseFontStr.startsWith(fontNameStr)) {
          fontName = baseFont;
        }
      }
    }
    fontName = fontName || baseFont;

    if (!(fontName instanceof Name)) {
      throw new FormatError("invalid font name");
    }

    let fontFile, subtype, length1, length2, length3;
    try {
      fontFile = descriptor.get("FontFile", "FontFile2", "FontFile3");
    } catch (ex) {
      if (!this.options.ignoreErrors) {
        throw ex;
      }
      warn(`translateFont - fetching "${fontName.name}" font file: "${ex}".`);
      fontFile = new NullStream();
    }
    let isStandardFont = false;
    let isInternalFont = false;
    let glyphScaleFactors = null;
    if (fontFile) {
      if (fontFile.dict) {
        const subtypeEntry = fontFile.dict.get("Subtype");
        if (subtypeEntry instanceof Name) {
          subtype = subtypeEntry.name;
        }
        length1 = fontFile.dict.get("Length1");
        length2 = fontFile.dict.get("Length2");
        length3 = fontFile.dict.get("Length3");
      }
    } else if (cssFontInfo) {
      // We have a missing XFA font.
      const standardFontName = getXfaFontName(fontName.name);
      if (standardFontName) {
        cssFontInfo.fontFamily = `${cssFontInfo.fontFamily}-PdfJS-XFA`;
        cssFontInfo.metrics = standardFontName.metrics || null;
        glyphScaleFactors = standardFontName.factors || null;
        fontFile = await this.fetchStandardFontData(standardFontName.name);
        isInternalFont = !!fontFile;

        // We're using a substitution font, but any widths (if present) are
        // related to the glyph positions in the original font, so we
        // overwrite everything here to be sure that the widths are correct.
        baseDict = dict = getXfaFontDict(fontName.name);
        composite = true;
      }
    } else if (!isType3Font) {
      const standardFontName = getStandardFontName(fontName.name);
      if (standardFontName) {
        isStandardFont = true;
        fontFile = await this.fetchStandardFontData(standardFontName);
        isInternalFont = !!fontFile;
      }
    }

    properties = {
      type,
      name: fontName.name,
      subtype,
      file: fontFile,
      length1,
      length2,
      length3,
      isStandardFont,
      isInternalFont,
      loadedName: baseDict.loadedName,
      composite,
      fixedPitch: false,
      fontMatrix: dict.getArray("FontMatrix") || FONT_IDENTITY_MATRIX,
      firstChar,
      lastChar,
      toUnicode,
      bbox: descriptor.getArray("FontBBox") || dict.getArray("FontBBox"),
      ascent: descriptor.get("Ascent"),
      descent: descriptor.get("Descent"),
      xHeight: descriptor.get("XHeight") || 0,
      capHeight: descriptor.get("CapHeight") || 0,
      flags: descriptor.get("Flags"),
      italicAngle: descriptor.get("ItalicAngle") || 0,
      isType3Font,
      cssFontInfo,
      scaleFactors: glyphScaleFactors,
    };

    if (composite) {
      const cidEncoding = baseDict.get("Encoding");
      if (cidEncoding instanceof Name) {
        properties.cidEncoding = cidEncoding.name;
      }
      const cMap = await CMapFactory.create({
        encoding: cidEncoding,
        fetchBuiltInCMap: this._fetchBuiltInCMapBound,
        useCMap: null,
      });
      properties.cMap = cMap;
      properties.vertical = properties.cMap.vertical;
    }

    return this.extractDataStructures(dict, baseDict, properties).then(
      newProperties => {
        this.extractWidths(dict, descriptor, newProperties);

        return new Font(fontName.name, fontFile, newProperties);
      }
    );
  }
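
  // A hedged sketch of the expected call flow (variable names assumed, not
  // from this file):
  //   const preEvaluated = evaluator.preEvaluateFont(fontDict);
  //   const font = await evaluator.translateFont({
  //     ...preEvaluated,
  //     cssFontInfo: null,
  //   });
  // i.e. the descriptor/dict/baseDict triple computed by preEvaluateFont is
  // passed straight through to translateFont.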

  static buildFontPaths(font, glyphs, handler, evaluatorOptions) {
    function buildPath(fontChar) {
      const glyphName = `${font.loadedName}_path_${fontChar}`;
      try {
        if (font.renderer.hasBuiltPath(fontChar)) {
          return;
        }
        handler.send("commonobj", [
          glyphName,
          "FontPath",
          font.renderer.getPathJs(fontChar),
        ]);
      } catch (reason) {
        if (evaluatorOptions.ignoreErrors) {
          // Error in the font data -- sending unsupported feature
          // notification and allow glyph path building to continue.
          handler.send("UnsupportedFeature", {
            featureId: UNSUPPORTED_FEATURES.errorFontBuildPath,
          });
          warn(`buildFontPaths - ignoring ${glyphName} glyph: "${reason}".`);
          return;
        }
        throw reason;
      }
    }

    for (const glyph of glyphs) {
      buildPath(glyph.fontChar);

      // If the glyph has an accent we need to build a path for its
      // fontChar too, otherwise CanvasGraphics_paintChar will fail.
      const accent = glyph.accent;
      if (accent && accent.fontChar) {
        buildPath(accent.fontChar);
      }
    }
  }
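
  // Illustrative note (main-thread handling assumed, not shown in this
  // file): each "commonobj" message above carries
  //   [`${font.loadedName}_path_${fontChar}`, "FontPath", pathJs]
  // so the rendering side can cache and replay the glyph outline without
  // re-parsing the font file.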

  static get fallbackFontDict() {
    const dict = new Dict();
    dict.set("BaseFont", Name.get("PDFJS-FallbackFont"));
    dict.set("Type", Name.get("FallbackType"));
    dict.set("Subtype", Name.get("FallbackType"));
    dict.set("Encoding", Name.get("WinAnsiEncoding"));

    return shadow(this, "fallbackFontDict", dict);
  }
}
|
2011-10-25 08:55:23 +09:00
|
|
|
|
|
2020-04-03 17:19:02 +09:00
|
|
|
|

class TranslatedFont {
  constructor({ loadedName, font, dict, evaluatorOptions }) {
    this.loadedName = loadedName;
    this.font = font;
    this.dict = dict;
    this._evaluatorOptions = evaluatorOptions || DefaultPartialEvaluatorOptions;
    this.type3Loaded = null;
    this.type3Dependencies = font.isType3Font ? new Set() : null;
    this.sent = false;
  }

  send(handler) {
    if (this.sent) {
      return;
    }
    this.sent = true;

    handler.send("commonobj", [
      this.loadedName,
      "Font",
      this.font.exportData(this._evaluatorOptions.fontExtraProperties),
    ]);
  }

  fallback(handler) {
    if (!this.font.data) {
      return;
    }
    // When font loading failed, fall back to the built-in font renderer.
    this.font.disableFontFace = true;

    // An arbitrary number of text rendering operators could have been
    // encountered between the point in time when the 'Font' message was sent
    // to the main-thread, and the point in time when the 'FontFallback'
    // message was received on the worker-thread.
    // To ensure that all 'FontPath's are available on the main-thread, when
    // font loading failed, attempt to resend *all* previously parsed glyphs.
    PartialEvaluator.buildFontPaths(
      this.font,
      /* glyphs = */ this.font.glyphCacheValues,
      handler,
      this._evaluatorOptions
    );
  }
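
  // A rough sketch of the failure path handled above, using the message
  // names from the comments (the exact main-thread plumbing is assumed):
  //
  //   1. worker -> main:  "commonobj" [loadedName, "Font", ...]; see send().
  //   2. main:            loading the font file fails, e.g. because the
  //                       sanitizer rejects it (Font Loading API only).
  //   3. main -> worker:  "FontFallback", upon which fallback(handler) runs,
  //                       disabling `fontFace` usage and resending all
  //                       previously parsed glyph paths.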

  loadType3Data(evaluator, resources, task) {
    if (this.type3Loaded) {
      return this.type3Loaded;
    }
    if (!this.font.isType3Font) {
      throw new Error("Must be a Type3 font.");
    }
    // When parsing Type3 glyphs, always ignore them if there are errors.
    // Compared to the parsing of e.g. an entire page, it doesn't really
    // make sense to only be able to render a Type3 glyph partially.
    const type3Evaluator = evaluator.clone({ ignoreErrors: false });
    type3Evaluator.parsingType3Font = true;
    // Prevent circular references in Type3 fonts.
    const type3FontRefs = new RefSet(evaluator.type3FontRefs);
    if (this.dict.objId && !type3FontRefs.has(this.dict.objId)) {
      type3FontRefs.put(this.dict.objId);
    }
    type3Evaluator.type3FontRefs = type3FontRefs;

    const translatedFont = this.font,
      type3Dependencies = this.type3Dependencies;
    let loadCharProcsPromise = Promise.resolve();
    const charProcs = this.dict.get("CharProcs");
    const fontResources = this.dict.get("Resources") || resources;
    const charProcOperatorList = Object.create(null);

    const isEmptyBBox =
      !translatedFont.bbox || isArrayEqual(translatedFont.bbox, [0, 0, 0, 0]);

    for (const key of charProcs.getKeys()) {
      loadCharProcsPromise = loadCharProcsPromise.then(() => {
        const glyphStream = charProcs.get(key);
        const operatorList = new OperatorList();
        return type3Evaluator
          .getOperatorList({
            stream: glyphStream,
            task,
            resources: fontResources,
            operatorList,
          })
          .then(() => {
            // According to the PDF specification, section "9.6.5 Type 3 Fonts"
            // and "Table 113":
            //  "A glyph description that begins with the d1 operator should
            //   not execute any operators that set the colour (or other
            //   colour-related parameters) in the graphics state;
            //   any use of such operators shall be ignored."
            if (operatorList.fnArray[0] === OPS.setCharWidthAndBounds) {
              this._removeType3ColorOperators(operatorList, isEmptyBBox);
            }
            charProcOperatorList[key] = operatorList.getIR();

            for (const dependency of operatorList.dependencies) {
              type3Dependencies.add(dependency);
            }
          })
          .catch(function (reason) {
            warn(`Type3 font resource "${key}" is not available.`);
            const dummyOperatorList = new OperatorList();
            charProcOperatorList[key] = dummyOperatorList.getIR();
          });
      });
    }
    this.type3Loaded = loadCharProcsPromise.then(() => {
      translatedFont.charProcOperatorList = charProcOperatorList;
      if (this._bbox) {
        translatedFont.isCharBBox = true;
        translatedFont.bbox = this._bbox;
      }
    });
    return this.type3Loaded;
  }
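
  // The CharProcs parsing above relies on sequential promise chaining:
  // each glyph stream is parsed only after the previous one has settled,
  // and a failing glyph is replaced by an empty operatorList rather than
  // rejecting the entire chain. A minimal sketch of the pattern, with a
  // hypothetical `parse` helper:
  //
  //   let chain = Promise.resolve();
  //   for (const key of keys) {
  //     chain = chain.then(() =>
  //       parse(key).catch(() => {
  //         results[key] = emptyResult; // Render nothing for this glyph.
  //       })
  //     );
  //   }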

  /**
   * @private
   */
  _removeType3ColorOperators(operatorList, isEmptyBBox = false) {
    if (
      typeof PDFJSDev === "undefined" ||
      PDFJSDev.test("!PRODUCTION || TESTING")
    ) {
      assert(
        operatorList.fnArray[0] === OPS.setCharWidthAndBounds,
        "Type3 glyph shall start with the d1 operator."
      );
    }
    if (isEmptyBBox) {
      if (!this._bbox) {
        this._bbox = [Infinity, Infinity, -Infinity, -Infinity];
      }
      const charBBox = Util.normalizeRect(operatorList.argsArray[0].slice(2));

      this._bbox[0] = Math.min(this._bbox[0], charBBox[0]);
      this._bbox[1] = Math.min(this._bbox[1], charBBox[1]);
      this._bbox[2] = Math.max(this._bbox[2], charBBox[2]);
      this._bbox[3] = Math.max(this._bbox[3], charBBox[3]);
    }
    let i = 1,
      ii = operatorList.length;
    while (i < ii) {
      switch (operatorList.fnArray[i]) {
        case OPS.setStrokeColorSpace:
        case OPS.setFillColorSpace:
        case OPS.setStrokeColor:
        case OPS.setStrokeColorN:
        case OPS.setFillColor:
        case OPS.setFillColorN:
        case OPS.setStrokeGray:
        case OPS.setFillGray:
        case OPS.setStrokeRGBColor:
        case OPS.setFillRGBColor:
        case OPS.setStrokeCMYKColor:
        case OPS.setFillCMYKColor:
        case OPS.shadingFill:
        case OPS.setRenderingIntent:
          operatorList.fnArray.splice(i, 1);
          operatorList.argsArray.splice(i, 1);
          ii--;
          continue;

        case OPS.setGState:
          const [gStateObj] = operatorList.argsArray[i];
          let j = 0,
            jj = gStateObj.length;
          while (j < jj) {
            const [gStateKey] = gStateObj[j];
            switch (gStateKey) {
              case "TR":
              case "TR2":
              case "HT":
              case "BG":
              case "BG2":
              case "UCR":
              case "UCR2":
                gStateObj.splice(j, 1);
                jj--;
                continue;
            }
            j++;
          }
          break;
      }
      i++;
    }
  }
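
  // Note the in-place filtering pattern used above: when an entry is
  // spliced out, the index is *not* advanced, since the next element has
  // just shifted into the current slot. A minimal standalone sketch:
  //
  //   let i = 0, ii = arr.length;
  //   while (i < ii) {
  //     if (shouldRemove(arr[i])) {
  //       arr.splice(i, 1);
  //       ii--;
  //       continue; // Re-check the element now occupying position `i`.
  //     }
  //     i++;
  //   }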
}

class StateManager {
  constructor(initialState = new EvalState()) {
    this.state = initialState;
    this.stateStack = [];
  }

  save() {
    const old = this.state;
    this.stateStack.push(this.state);
    this.state = old.clone();
  }

  restore() {
    const prev = this.stateStack.pop();
    if (prev) {
      this.state = prev;
    }
  }

  transform(args) {
    this.state.ctm = Util.transform(this.state.ctm, args);
  }
}
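
// Example (illustrative) of the save/restore contract, mirroring the PDF
// `q`/`Q` operators:
//
//   const stateManager = new StateManager();
//   stateManager.save();                        // `q`: push, work on a clone
//   stateManager.transform([2, 0, 0, 2, 0, 0]); // `cm`: affects the clone only
//   stateManager.restore();                     // `Q`: back to the saved CTM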

class TextState {
  constructor() {
    this.ctm = new Float32Array(IDENTITY_MATRIX);
    this.fontName = null;
    this.fontSize = 0;
    this.font = null;
    this.fontMatrix = FONT_IDENTITY_MATRIX;
    this.textMatrix = IDENTITY_MATRIX.slice();
    this.textLineMatrix = IDENTITY_MATRIX.slice();
    this.charSpacing = 0;
    this.wordSpacing = 0;
    this.leading = 0;
    this.textHScale = 1;
    this.textRise = 0;
  }

  setTextMatrix(a, b, c, d, e, f) {
    const m = this.textMatrix;
    m[0] = a;
    m[1] = b;
    m[2] = c;
    m[3] = d;
    m[4] = e;
    m[5] = f;
  }

  setTextLineMatrix(a, b, c, d, e, f) {
    const m = this.textLineMatrix;
    m[0] = a;
    m[1] = b;
    m[2] = c;
    m[3] = d;
    m[4] = e;
    m[5] = f;
  }

  translateTextMatrix(x, y) {
    const m = this.textMatrix;
    m[4] = m[0] * x + m[2] * y + m[4];
    m[5] = m[1] * x + m[3] * y + m[5];
  }
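
  // Worked example: with textMatrix m = [2, 0, 0, 2, 10, 10], i.e. a
  // uniform scale of 2 plus a translation, translateTextMatrix(3, 4) gives
  //   m[4] = 2 * 3 + 0 * 4 + 10 = 16,
  //   m[5] = 0 * 3 + 2 * 4 + 10 = 18,
  // i.e. the offset is applied in text space, transformed by the matrix,
  // rather than simply added to the translation components.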

  translateTextLineMatrix(x, y) {
    const m = this.textLineMatrix;
    m[4] = m[0] * x + m[2] * y + m[4];
    m[5] = m[1] * x + m[3] * y + m[5];
  }

  carriageReturn() {
    this.translateTextLineMatrix(0, -this.leading);
    this.textMatrix = this.textLineMatrix.slice();
  }

  clone() {
    const clone = Object.create(this);
    clone.textMatrix = this.textMatrix.slice();
    clone.textLineMatrix = this.textLineMatrix.slice();
    clone.fontMatrix = this.fontMatrix.slice();
    return clone;
  }
}

class EvalState {
  constructor() {
    this.ctm = new Float32Array(IDENTITY_MATRIX);
    this.font = null;
    this.textRenderingMode = TextRenderingMode.FILL;
    this.fillColorSpace = ColorSpace.singletons.gray;
    this.strokeColorSpace = ColorSpace.singletons.gray;
  }

  clone() {
    return Object.create(this);
  }
}
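
// `clone()` returns Object.create(this), a cheap copy-on-write clone: the
// clone has no own properties initially, reads fall through the prototype
// chain to the original, and writes merely shadow it. For example:
//
//   const state = new EvalState();
//   const copy = state.clone();
//   copy.font = loadedFont;  // Own property; `state.font` is unchanged.
//   copy.textRenderingMode;  // Falls through to `state`'s value.
//
// Only the references are shadowed, though; mutating a shared object (such
// as a matrix) through the clone would also affect the original, which is
// why TextState.clone() above additionally slices its matrices.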

class EvaluatorPreprocessor {
  static get opMap() {
    // Specifies properties for each command
    //
    // If variableArgs === true: [0, `numArgs`] expected
    // If variableArgs === false: exactly `numArgs` expected
    const getOPMap = getLookupTableFactory(function (t) {
      // Graphic state
      t.w = { id: OPS.setLineWidth, numArgs: 1, variableArgs: false };
      t.J = { id: OPS.setLineCap, numArgs: 1, variableArgs: false };
      t.j = { id: OPS.setLineJoin, numArgs: 1, variableArgs: false };
      t.M = { id: OPS.setMiterLimit, numArgs: 1, variableArgs: false };
      t.d = { id: OPS.setDash, numArgs: 2, variableArgs: false };
      t.ri = { id: OPS.setRenderingIntent, numArgs: 1, variableArgs: false };
      t.i = { id: OPS.setFlatness, numArgs: 1, variableArgs: false };
      t.gs = { id: OPS.setGState, numArgs: 1, variableArgs: false };
      t.q = { id: OPS.save, numArgs: 0, variableArgs: false };
      t.Q = { id: OPS.restore, numArgs: 0, variableArgs: false };
      t.cm = { id: OPS.transform, numArgs: 6, variableArgs: false };

      // Path
      t.m = { id: OPS.moveTo, numArgs: 2, variableArgs: false };
      t.l = { id: OPS.lineTo, numArgs: 2, variableArgs: false };
      t.c = { id: OPS.curveTo, numArgs: 6, variableArgs: false };
      t.v = { id: OPS.curveTo2, numArgs: 4, variableArgs: false };
      t.y = { id: OPS.curveTo3, numArgs: 4, variableArgs: false };
      t.h = { id: OPS.closePath, numArgs: 0, variableArgs: false };
      t.re = { id: OPS.rectangle, numArgs: 4, variableArgs: false };
      t.S = { id: OPS.stroke, numArgs: 0, variableArgs: false };
      t.s = { id: OPS.closeStroke, numArgs: 0, variableArgs: false };
      t.f = { id: OPS.fill, numArgs: 0, variableArgs: false };
      t.F = { id: OPS.fill, numArgs: 0, variableArgs: false };
      t["f*"] = { id: OPS.eoFill, numArgs: 0, variableArgs: false };
      t.B = { id: OPS.fillStroke, numArgs: 0, variableArgs: false };
      t["B*"] = { id: OPS.eoFillStroke, numArgs: 0, variableArgs: false };
      t.b = { id: OPS.closeFillStroke, numArgs: 0, variableArgs: false };
      t["b*"] = { id: OPS.closeEOFillStroke, numArgs: 0, variableArgs: false };
      t.n = { id: OPS.endPath, numArgs: 0, variableArgs: false };

      // Clipping
      t.W = { id: OPS.clip, numArgs: 0, variableArgs: false };
      t["W*"] = { id: OPS.eoClip, numArgs: 0, variableArgs: false };

      // Text
      t.BT = { id: OPS.beginText, numArgs: 0, variableArgs: false };
      t.ET = { id: OPS.endText, numArgs: 0, variableArgs: false };
      t.Tc = { id: OPS.setCharSpacing, numArgs: 1, variableArgs: false };
      t.Tw = { id: OPS.setWordSpacing, numArgs: 1, variableArgs: false };
      t.Tz = { id: OPS.setHScale, numArgs: 1, variableArgs: false };
      t.TL = { id: OPS.setLeading, numArgs: 1, variableArgs: false };
      t.Tf = { id: OPS.setFont, numArgs: 2, variableArgs: false };
      t.Tr = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false };
      t.Ts = { id: OPS.setTextRise, numArgs: 1, variableArgs: false };
      t.Td = { id: OPS.moveText, numArgs: 2, variableArgs: false };
      t.TD = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false };
      t.Tm = { id: OPS.setTextMatrix, numArgs: 6, variableArgs: false };
      t["T*"] = { id: OPS.nextLine, numArgs: 0, variableArgs: false };
      t.Tj = { id: OPS.showText, numArgs: 1, variableArgs: false };
      t.TJ = { id: OPS.showSpacedText, numArgs: 1, variableArgs: false };
      t["'"] = { id: OPS.nextLineShowText, numArgs: 1, variableArgs: false };
      t['"'] = {
        id: OPS.nextLineSetSpacingShowText,
        numArgs: 3,
        variableArgs: false,
      };

      // Type3 fonts
      t.d0 = { id: OPS.setCharWidth, numArgs: 2, variableArgs: false };
      t.d1 = {
        id: OPS.setCharWidthAndBounds,
        numArgs: 6,
        variableArgs: false,
      };

      // Color
      t.CS = { id: OPS.setStrokeColorSpace, numArgs: 1, variableArgs: false };
      t.cs = { id: OPS.setFillColorSpace, numArgs: 1, variableArgs: false };
      t.SC = { id: OPS.setStrokeColor, numArgs: 4, variableArgs: true };
      t.SCN = { id: OPS.setStrokeColorN, numArgs: 33, variableArgs: true };
      t.sc = { id: OPS.setFillColor, numArgs: 4, variableArgs: true };
      t.scn = { id: OPS.setFillColorN, numArgs: 33, variableArgs: true };
      t.G = { id: OPS.setStrokeGray, numArgs: 1, variableArgs: false };
      t.g = { id: OPS.setFillGray, numArgs: 1, variableArgs: false };
      t.RG = { id: OPS.setStrokeRGBColor, numArgs: 3, variableArgs: false };
      t.rg = { id: OPS.setFillRGBColor, numArgs: 3, variableArgs: false };
      t.K = { id: OPS.setStrokeCMYKColor, numArgs: 4, variableArgs: false };
      t.k = { id: OPS.setFillCMYKColor, numArgs: 4, variableArgs: false };

      // Shading
      t.sh = { id: OPS.shadingFill, numArgs: 1, variableArgs: false };

      // Images
      t.BI = { id: OPS.beginInlineImage, numArgs: 0, variableArgs: false };
      t.ID = { id: OPS.beginImageData, numArgs: 0, variableArgs: false };
      t.EI = { id: OPS.endInlineImage, numArgs: 1, variableArgs: false };

      // XObjects
      t.Do = { id: OPS.paintXObject, numArgs: 1, variableArgs: false };
      t.MP = { id: OPS.markPoint, numArgs: 1, variableArgs: false };
      t.DP = { id: OPS.markPointProps, numArgs: 2, variableArgs: false };
      t.BMC = { id: OPS.beginMarkedContent, numArgs: 1, variableArgs: false };
      t.BDC = {
        id: OPS.beginMarkedContentProps,
        numArgs: 2,
        variableArgs: false,
      };
      t.EMC = { id: OPS.endMarkedContent, numArgs: 0, variableArgs: false };

      // Compatibility
      t.BX = { id: OPS.beginCompat, numArgs: 0, variableArgs: false };
      t.EX = { id: OPS.endCompat, numArgs: 0, variableArgs: false };

      // (reserved partial commands for the lexer)
      t.BM = null;
      t.BD = null;
      t.true = null;
      t.fa = null;
      t.fal = null;
      t.fals = null;
      t.false = null;
      t.nu = null;
      t.nul = null;
      t.null = null;
    });

    return shadow(this, "opMap", getOPMap());
  }
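
  // Example (illustrative): looking up two entries of the table above:
  //
  //   EvaluatorPreprocessor.opMap.Tf;
  //   // -> { id: OPS.setFont, numArgs: 2, variableArgs: false }
  //   //    `Tf` requires exactly two operands (font name and size).
  //
  //   EvaluatorPreprocessor.opMap.SCN;
  //   // -> { id: OPS.setStrokeColorN, numArgs: 33, variableArgs: true }
  //   //    `SCN` accepts anywhere from 0 up to 33 operands.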

  static get MAX_INVALID_PATH_OPS() {
    return shadow(this, "MAX_INVALID_PATH_OPS", 10);
  }

  constructor(stream, xref, stateManager = new StateManager()) {
    // TODO(mduan): pass array of knownCommands rather than this.opMap
    // dictionary
    this.parser = new Parser({
      lexer: new Lexer(stream, EvaluatorPreprocessor.opMap),
      xref,
    });
    this.stateManager = stateManager;
    this.nonProcessedArgs = [];
    this._numInvalidPathOPS = 0;
  }

  get savedStatesDepth() {
    return this.stateManager.stateStack.length;
  }

  // |operation| is an object with two fields:
  //
  // - |fn| is an out param.
  //
  // - |args| is an inout param. On entry, it should have one of two values.
  //
  //   - An empty array. This indicates that the caller is providing the
  //     array in which the args will be stored. The caller should use
  //     this value if it can reuse a single array for each call to read().
  //
  //   - |null|. This indicates that the caller needs this function to create
  //     the array in which any args are stored. If there are zero args,
  //     this function will leave |operation.args| as |null| (thus avoiding
  //     allocations that would occur if we used an empty array to represent
  //     zero arguments). Otherwise, it will replace |null| with a new array
  //     containing the arguments. The caller should use this value if it
  //     cannot reuse an array for each call to read().
  //
  // These two modes are present because this function is very hot and so
  // avoiding allocations where possible is worthwhile.
  //
  read(operation) {
    let args = operation.args;
    while (true) {
      const obj = this.parser.getObj();
      if (obj instanceof Cmd) {
        const cmd = obj.cmd;
        // Check that the command is valid
        const opSpec = EvaluatorPreprocessor.opMap[cmd];
        if (!opSpec) {
          warn(`Unknown command "${cmd}".`);
          continue;
        }

        const fn = opSpec.id;
        const numArgs = opSpec.numArgs;
        let argsLength = args !== null ? args.length : 0;

        if (!opSpec.variableArgs) {
          // PostScript commands can be nested, e.g. /F2 /GS2 gs 5.711 Tf
          if (argsLength !== numArgs) {
            const nonProcessedArgs = this.nonProcessedArgs;
            while (argsLength > numArgs) {
              nonProcessedArgs.push(args.shift());
              argsLength--;
            }
            while (argsLength < numArgs && nonProcessedArgs.length !== 0) {
              if (args === null) {
                args = [];
              }
              args.unshift(nonProcessedArgs.pop());
              argsLength++;
            }
          }

          if (argsLength < numArgs) {
            const partialMsg =
              `command ${cmd}: expected ${numArgs} args, ` +
              `but received ${argsLength} args.`;

            // Incomplete path operators, in particular, can result in fairly
            // chaotic rendering artifacts. Hence the following heuristic is
            // used to error, rather than just warn, once a number of invalid
            // path operators have been encountered (fixes bug1443140.pdf).
            if (
              fn >= OPS.moveTo &&
              fn <= OPS.endPath && // Path operator
              ++this._numInvalidPathOPS >
                EvaluatorPreprocessor.MAX_INVALID_PATH_OPS
            ) {
              throw new FormatError(`Invalid ${partialMsg}`);
            }
            // If we receive too few arguments, it's not possible to execute
            // the command, hence we skip the command.
            warn(`Skipping ${partialMsg}`);
            if (args !== null) {
              args.length = 0;
            }
            continue;
          }
        } else if (argsLength > numArgs) {
          info(
            `Command ${cmd}: expected [0, ${numArgs}] args, ` +
              `but received ${argsLength} args.`
          );
        }

        // TODO figure out how to type-check vararg functions
        this.preprocessCommand(fn, args);

        operation.fn = fn;
        operation.args = args;
        return true;
      }
      if (obj === EOF) {
        return false; // no more commands
      }
      // argument
      if (obj !== null) {
        if (args === null) {
          args = [];
        }
        args.push(obj);
        if (args.length > 33) {
          throw new FormatError("Too many arguments");
        }
      }
    }
  }
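
  // A minimal sketch of the allocation-free calling convention described
  // above, reusing a single `args` array across read() calls (with the
  // |null| mode, `operation.args = null` would be set before every call):
  //
  //   const preprocessor = new EvaluatorPreprocessor(stream, xref);
  //   const operation = { fn: null, args: [] };
  //   while (preprocessor.read(operation)) {
  //     // `operation.fn` is an OPS value, `operation.args` its operands.
  //     operation.args.length = 0; // Reset before the next read() call.
  //   }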

  preprocessCommand(fn, args) {
    switch (fn | 0) {
      case OPS.save:
        this.stateManager.save();
        break;
      case OPS.restore:
        this.stateManager.restore();
        break;
      case OPS.transform:
        this.stateManager.transform(args);
        break;
    }
  }
}

export { EvaluatorPreprocessor, PartialEvaluator };