2012-09-01 07:48:21 +09:00
|
|
|
/* Copyright 2012 Mozilla Foundation
|
|
|
|
*
|
|
|
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
|
* you may not use this file except in compliance with the License.
|
|
|
|
* You may obtain a copy of the License at
|
|
|
|
*
|
|
|
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
*
|
|
|
|
* Unless required by applicable law or agreed to in writing, software
|
|
|
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
|
* See the License for the specific language governing permissions and
|
|
|
|
* limitations under the License.
|
|
|
|
*/
|
2011-10-25 10:13:12 +09:00
|
|
|
|
2017-07-20 21:04:54 +09:00
|
|
|
import {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
assert,
|
|
|
|
FormatError,
|
|
|
|
info,
|
|
|
|
InvalidPDFException,
|
|
|
|
isArrayBuffer,
|
|
|
|
isArrayEqual,
|
|
|
|
isBool,
|
|
|
|
isNum,
|
|
|
|
isString,
|
|
|
|
OPS,
|
2020-12-08 03:22:14 +09:00
|
|
|
PageActionEventType,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
shadow,
|
|
|
|
stringToBytes,
|
|
|
|
stringToPDFString,
|
2021-03-19 18:11:40 +09:00
|
|
|
stringToUTF8String,
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
unreachable,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
Util,
|
|
|
|
warn,
|
2020-01-02 20:00:16 +09:00
|
|
|
} from "../shared/util.js";
|
2020-01-16 22:55:08 +09:00
|
|
|
import {
|
|
|
|
clearPrimitiveCaches,
|
|
|
|
Dict,
|
|
|
|
isDict,
|
|
|
|
isName,
|
2020-10-17 00:15:58 +09:00
|
|
|
isRef,
|
2020-01-16 22:55:08 +09:00
|
|
|
isStream,
|
2021-03-26 17:28:18 +09:00
|
|
|
Name,
|
2020-01-16 22:55:08 +09:00
|
|
|
Ref,
|
|
|
|
} from "./primitives.js";
|
2019-02-24 00:14:31 +09:00
|
|
|
import {
|
2020-12-08 03:22:14 +09:00
|
|
|
collectActions,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
getInheritableProperty,
|
2020-02-10 17:38:57 +09:00
|
|
|
isWhiteSpace,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
MissingDataException,
|
2021-03-26 17:28:18 +09:00
|
|
|
validateCSSFont,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
XRefEntryException,
|
|
|
|
XRefParseException,
|
2020-01-02 20:00:16 +09:00
|
|
|
} from "./core_utils.js";
|
2021-04-27 23:18:52 +09:00
|
|
|
import { NullStream, Stream } from "./stream.js";
|
2020-01-02 20:00:16 +09:00
|
|
|
import { AnnotationFactory } from "./annotation.js";
|
2021-05-14 16:59:24 +09:00
|
|
|
import { BaseStream } from "./base_stream.js";
|
2020-01-02 20:00:16 +09:00
|
|
|
import { calculateMD5 } from "./crypto.js";
|
2021-04-14 01:26:23 +09:00
|
|
|
import { Catalog } from "./catalog.js";
|
2020-01-02 20:00:16 +09:00
|
|
|
import { Linearization } from "./parser.js";
|
2021-04-14 01:25:34 +09:00
|
|
|
import { ObjectLoader } from "./object_loader.js";
|
2020-01-02 20:00:16 +09:00
|
|
|
import { OperatorList } from "./operator_list.js";
|
|
|
|
import { PartialEvaluator } from "./evaluator.js";
|
2021-04-27 23:18:52 +09:00
|
|
|
import { StreamsSequenceStream } from "./decode_stream.js";
|
2021-04-01 07:07:02 +09:00
|
|
|
import { StructTreePage } from "./struct_tree.js";
|
2021-03-19 18:11:40 +09:00
|
|
|
import { XFAFactory } from "./xfa/factory.js";
|
2021-04-14 01:26:07 +09:00
|
|
|
import { XRef } from "./xref.js";
|
2015-11-22 01:32:47 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
const DEFAULT_USER_UNIT = 1.0;
|
|
|
|
const LETTER_SIZE_MEDIABOX = [0, 0, 612, 792];
|
2013-03-14 04:24:55 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
class Page {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
constructor({
|
|
|
|
pdfManager,
|
|
|
|
xref,
|
|
|
|
pageIndex,
|
|
|
|
pageDict,
|
|
|
|
ref,
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
globalIdFactory,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
fontCache,
|
|
|
|
builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache,
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-18 21:17:56 +09:00
|
|
|
globalImageCache,
|
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.
To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.
This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
{ "id": "issue6961",
"file": "../web/pdfs/issue6961.pdf",
"md5": "a80e4357a8fda758d96c2c76f2980b03",
"rounds": 100,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0 | Overall | 100 | 1034 | 555 | -480 | -46.39 | faster
firefox | 0 | Page Request | 100 | 489 | 7 | -482 | -98.67 | faster
firefox | 0 | Rendering | 100 | 545 | 548 | 2 | 0.45 |
firefox | 1 | Overall | 100 | 912 | 428 | -484 | -53.06 | faster
firefox | 1 | Page Request | 100 | 487 | 1 | -486 | -99.77 | faster
firefox | 1 | Rendering | 100 | 425 | 427 | 2 | 0.51 |
```
---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
2020-11-05 21:35:33 +09:00
|
|
|
nonBlendModesSet,
|
2021-03-19 18:11:40 +09:00
|
|
|
xfaFactory,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
}) {
|
2013-04-09 07:14:56 +09:00
|
|
|
this.pdfManager = pdfManager;
|
2012-10-29 05:10:34 +09:00
|
|
|
this.pageIndex = pageIndex;
|
2011-10-25 10:13:12 +09:00
|
|
|
this.pageDict = pageDict;
|
|
|
|
this.xref = xref;
|
|
|
|
this.ref = ref;
|
2013-11-15 06:43:38 +09:00
|
|
|
this.fontCache = fontCache;
|
2017-02-14 22:28:31 +09:00
|
|
|
this.builtInCMapCache = builtInCMapCache;
|
2021-06-08 20:58:52 +09:00
|
|
|
this.standardFontDataCache = standardFontDataCache;
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-18 21:17:56 +09:00
|
|
|
this.globalImageCache = globalImageCache;
|
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.
To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.
This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
{ "id": "issue6961",
"file": "../web/pdfs/issue6961.pdf",
"md5": "a80e4357a8fda758d96c2c76f2980b03",
"rounds": 100,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0 | Overall | 100 | 1034 | 555 | -480 | -46.39 | faster
firefox | 0 | Page Request | 100 | 489 | 7 | -482 | -98.67 | faster
firefox | 0 | Rendering | 100 | 545 | 548 | 2 | 0.45 |
firefox | 1 | Overall | 100 | 912 | 428 | -484 | -53.06 | faster
firefox | 1 | Page Request | 100 | 487 | 1 | -486 | -99.77 | faster
firefox | 1 | Rendering | 100 | 425 | 427 | 2 | 0.51 |
```
---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
2020-11-05 21:35:33 +09:00
|
|
|
this.nonBlendModesSet = nonBlendModesSet;
|
2016-03-03 09:48:21 +09:00
|
|
|
this.evaluatorOptions = pdfManager.evaluatorOptions;
|
2013-06-05 09:57:52 +09:00
|
|
|
this.resourcesPromise = null;
|
2021-03-19 18:11:40 +09:00
|
|
|
this.xfaFactory = xfaFactory;
|
2017-01-09 00:51:30 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
const idCounters = {
|
2017-01-09 00:51:30 +09:00
|
|
|
obj: 0,
|
|
|
|
};
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
this._localIdFactory = class extends globalIdFactory {
|
|
|
|
static createObjId() {
|
2019-04-20 19:36:49 +09:00
|
|
|
return `p${pageIndex}_${++idCounters.obj}`;
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
}
|
2021-04-01 07:07:02 +09:00
|
|
|
|
|
|
|
static getPageObjId() {
|
|
|
|
return `page${ref.toString()}`;
|
|
|
|
}
|
2017-01-09 00:51:30 +09:00
|
|
|
};
|
2011-10-25 10:13:12 +09:00
|
|
|
}
|
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
/**
|
|
|
|
* @private
|
|
|
|
*/
|
|
|
|
_getInheritableProperty(key, getArray = false) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const value = getInheritableProperty({
|
|
|
|
dict: this.pageDict,
|
|
|
|
key,
|
|
|
|
getArray,
|
|
|
|
stopWhenFound: false,
|
|
|
|
});
|
2018-12-29 23:46:22 +09:00
|
|
|
if (!Array.isArray(value)) {
|
|
|
|
return value;
|
|
|
|
}
|
|
|
|
if (value.length === 1 || !isDict(value[0])) {
|
|
|
|
return value[0];
|
|
|
|
}
|
2020-08-28 08:05:33 +09:00
|
|
|
return Dict.merge({ xref: this.xref, dictArray: value });
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
get content() {
|
2021-05-14 16:59:24 +09:00
|
|
|
return this.pageDict.getArray("Contents");
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
get resources() {
|
|
|
|
// For robustness: The spec states that a \Resources entry has to be
|
|
|
|
// present, but can be empty. Some documents still omit it; in this case
|
|
|
|
// we return an empty dictionary.
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(
|
|
|
|
this,
|
|
|
|
"resources",
|
|
|
|
this._getInheritableProperty("Resources") || Dict.empty
|
|
|
|
);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-08 22:54:46 +09:00
|
|
|
_getBoundingBox(name) {
|
2021-03-19 18:11:40 +09:00
|
|
|
if (this.xfaData) {
|
2021-05-25 22:50:12 +09:00
|
|
|
return this.xfaData.bbox;
|
2021-03-19 18:11:40 +09:00
|
|
|
}
|
|
|
|
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-08 22:54:46 +09:00
|
|
|
const box = this._getInheritableProperty(name, /* getArray = */ true);
|
|
|
|
|
|
|
|
if (Array.isArray(box) && box.length === 4) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
if (box[2] - box[0] !== 0 && box[3] - box[1] !== 0) {
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-08 22:54:46 +09:00
|
|
|
return box;
|
|
|
|
}
|
|
|
|
warn(`Empty /${name} entry.`);
|
|
|
|
}
|
|
|
|
return null;
|
|
|
|
}
|
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
get mediaBox() {
|
|
|
|
// Reset invalid media box to letter size.
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(
|
|
|
|
this,
|
|
|
|
"mediaBox",
|
|
|
|
this._getBoundingBox("MediaBox") || LETTER_SIZE_MEDIABOX
|
|
|
|
);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
2016-12-06 09:00:12 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
get cropBox() {
|
|
|
|
// Reset invalid crop box to media box.
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(
|
|
|
|
this,
|
|
|
|
"cropBox",
|
|
|
|
this._getBoundingBox("CropBox") || this.mediaBox
|
|
|
|
);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
2014-03-14 22:39:35 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
get userUnit() {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
let obj = this.pageDict.get("UserUnit");
|
2018-12-29 23:46:22 +09:00
|
|
|
if (!isNum(obj) || obj <= 0) {
|
|
|
|
obj = DEFAULT_USER_UNIT;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "userUnit", obj);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
get view() {
|
|
|
|
// From the spec, 6th ed., p.963:
|
|
|
|
// "The crop, bleed, trim, and art boxes should not ordinarily
|
|
|
|
// extend beyond the boundaries of the media box. If they do, they are
|
|
|
|
// effectively reduced to their intersection with the media box."
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const { cropBox, mediaBox } = this;
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-08 22:54:46 +09:00
|
|
|
let view;
|
2019-08-08 00:04:26 +09:00
|
|
|
if (cropBox === mediaBox || isArrayEqual(cropBox, mediaBox)) {
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-08 22:54:46 +09:00
|
|
|
view = mediaBox;
|
|
|
|
} else {
|
2019-08-11 20:40:58 +09:00
|
|
|
const box = Util.intersect(cropBox, mediaBox);
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
if (box && box[2] - box[0] !== 0 && box[3] - box[1] !== 0) {
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-08 22:54:46 +09:00
|
|
|
view = box;
|
|
|
|
} else {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
warn("Empty /CropBox and /MediaBox intersection.");
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-08 22:54:46 +09:00
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "view", view || mediaBox);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
get rotate() {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
let rotate = this._getInheritableProperty("Rotate") || 0;
|
2018-12-29 23:46:22 +09:00
|
|
|
|
|
|
|
// Normalize rotation so it's a multiple of 90 and between 0 and 270.
|
|
|
|
if (rotate % 90 !== 0) {
|
|
|
|
rotate = 0;
|
|
|
|
} else if (rotate >= 360) {
|
|
|
|
rotate = rotate % 360;
|
|
|
|
} else if (rotate < 0) {
|
|
|
|
// The spec doesn't cover negatives. Assume it's counterclockwise
|
|
|
|
// rotation. The following is the other implementation of modulo.
|
|
|
|
rotate = ((rotate % 360) + 360) % 360;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "rotate", rotate);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
|
2021-05-14 16:59:24 +09:00
|
|
|
/**
|
|
|
|
* @returns {Promise<BaseStream>}
|
|
|
|
*/
|
2018-12-29 23:46:22 +09:00
|
|
|
getContentStream() {
|
2021-05-14 16:59:24 +09:00
|
|
|
return this.pdfManager.ensure(this, "content").then(content => {
|
|
|
|
if (content instanceof BaseStream) {
|
|
|
|
return content;
|
|
|
|
}
|
|
|
|
if (Array.isArray(content)) {
|
|
|
|
return new StreamsSequenceStream(content);
|
2011-10-25 10:13:12 +09:00
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
// Replace non-existent page content with empty content.
|
2021-05-14 16:59:24 +09:00
|
|
|
return new NullStream();
|
|
|
|
});
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
|
2021-03-19 18:11:40 +09:00
|
|
|
get xfaData() {
|
|
|
|
if (this.xfaFactory) {
|
2021-05-25 22:50:12 +09:00
|
|
|
return shadow(this, "xfaData", {
|
|
|
|
bbox: this.xfaFactory.getBoundingBox(this.pageIndex),
|
|
|
|
});
|
2021-03-19 18:11:40 +09:00
|
|
|
}
|
|
|
|
return shadow(this, "xfaData", null);
|
|
|
|
}
|
|
|
|
|
2020-08-04 02:44:04 +09:00
|
|
|
save(handler, task, annotationStorage) {
|
|
|
|
const partialEvaluator = new PartialEvaluator({
|
|
|
|
xref: this.xref,
|
|
|
|
handler,
|
|
|
|
pageIndex: this.pageIndex,
|
|
|
|
idFactory: this._localIdFactory,
|
|
|
|
fontCache: this.fontCache,
|
|
|
|
builtInCMapCache: this.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: this.standardFontDataCache,
|
2020-08-04 02:44:04 +09:00
|
|
|
globalImageCache: this.globalImageCache,
|
|
|
|
options: this.evaluatorOptions,
|
|
|
|
});
|
|
|
|
|
|
|
|
// Fetch the page's annotations and save the content
|
|
|
|
// in case of interactive form fields.
|
|
|
|
return this._parsedAnnotations.then(function (annotations) {
|
|
|
|
const newRefsPromises = [];
|
|
|
|
for (const annotation of annotations) {
|
2021-05-04 01:03:16 +09:00
|
|
|
if (!annotation.mustBePrinted(annotationStorage)) {
|
2020-08-04 02:44:04 +09:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
newRefsPromises.push(
|
|
|
|
annotation
|
|
|
|
.save(partialEvaluator, task, annotationStorage)
|
|
|
|
.catch(function (reason) {
|
|
|
|
warn(
|
|
|
|
"save - ignoring annotation data during " +
|
|
|
|
`"${task.name}" task: "${reason}".`
|
|
|
|
);
|
|
|
|
return null;
|
|
|
|
})
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
|
|
|
return Promise.all(newRefsPromises);
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
loadResources(keys) {
|
|
|
|
if (!this.resourcesPromise) {
|
|
|
|
// TODO: add async `_getInheritableProperty` and remove this.
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
this.resourcesPromise = this.pdfManager.ensure(this, "resources");
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
return this.resourcesPromise.then(() => {
|
|
|
|
const objectLoader = new ObjectLoader(this.resources, keys, this.xref);
|
|
|
|
return objectLoader.load();
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
2020-07-22 20:55:52 +09:00
|
|
|
getOperatorList({
|
|
|
|
handler,
|
|
|
|
sink,
|
|
|
|
task,
|
|
|
|
intent,
|
|
|
|
renderInteractiveForms,
|
|
|
|
annotationStorage,
|
|
|
|
}) {
|
2021-05-14 16:59:24 +09:00
|
|
|
const contentStreamPromise = this.getContentStream();
|
2018-12-29 23:46:22 +09:00
|
|
|
const resourcesPromise = this.loadResources([
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
"ColorSpace",
|
2021-04-21 02:51:01 +09:00
|
|
|
"ExtGState",
|
|
|
|
"Font",
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
"Pattern",
|
2021-04-21 02:51:01 +09:00
|
|
|
"Properties",
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
"Shading",
|
|
|
|
"XObject",
|
2018-12-29 23:46:22 +09:00
|
|
|
]);
|
|
|
|
|
|
|
|
const partialEvaluator = new PartialEvaluator({
|
|
|
|
xref: this.xref,
|
|
|
|
handler,
|
|
|
|
pageIndex: this.pageIndex,
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
idFactory: this._localIdFactory,
|
2018-12-29 23:46:22 +09:00
|
|
|
fontCache: this.fontCache,
|
|
|
|
builtInCMapCache: this.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: this.standardFontDataCache,
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-18 21:17:56 +09:00
|
|
|
globalImageCache: this.globalImageCache,
|
2018-12-29 23:46:22 +09:00
|
|
|
options: this.evaluatorOptions,
|
|
|
|
});
|
|
|
|
|
|
|
|
const dataPromises = Promise.all([contentStreamPromise, resourcesPromise]);
|
|
|
|
const pageListPromise = dataPromises.then(([contentStream]) => {
|
2020-06-11 23:05:38 +09:00
|
|
|
const opList = new OperatorList(intent, sink);
|
2018-12-29 23:46:22 +09:00
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
handler.send("StartRenderPage", {
|
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.
To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.
This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
{ "id": "issue6961",
"file": "../web/pdfs/issue6961.pdf",
"md5": "a80e4357a8fda758d96c2c76f2980b03",
"rounds": 100,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0 | Overall | 100 | 1034 | 555 | -480 | -46.39 | faster
firefox | 0 | Page Request | 100 | 489 | 7 | -482 | -98.67 | faster
firefox | 0 | Rendering | 100 | 545 | 548 | 2 | 0.45 |
firefox | 1 | Overall | 100 | 912 | 428 | -484 | -53.06 | faster
firefox | 1 | Page Request | 100 | 487 | 1 | -486 | -99.77 | faster
firefox | 1 | Rendering | 100 | 425 | 427 | 2 | 0.51 |
```
---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
2020-11-05 21:35:33 +09:00
|
|
|
transparency: partialEvaluator.hasBlendModes(
|
|
|
|
this.resources,
|
|
|
|
this.nonBlendModesSet
|
|
|
|
),
|
2018-12-29 23:46:22 +09:00
|
|
|
pageIndex: this.pageIndex,
|
|
|
|
intent,
|
|
|
|
});
|
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return partialEvaluator
|
|
|
|
.getOperatorList({
|
|
|
|
stream: contentStream,
|
|
|
|
task,
|
|
|
|
resources: this.resources,
|
|
|
|
operatorList: opList,
|
|
|
|
})
|
2020-04-14 19:28:14 +09:00
|
|
|
.then(function () {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return opList;
|
|
|
|
});
|
2018-12-29 23:46:22 +09:00
|
|
|
});
|
|
|
|
|
|
|
|
// Fetch the page's annotations and add their operator lists to the
|
|
|
|
// page's operator list to render them.
|
|
|
|
return Promise.all([pageListPromise, this._parsedAnnotations]).then(
|
2020-04-14 19:28:14 +09:00
|
|
|
function ([pageOpList, annotations]) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
if (annotations.length === 0) {
|
|
|
|
pageOpList.flush(true);
|
|
|
|
return { length: pageOpList.totalLength };
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
2017-06-13 17:22:11 +09:00
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
// Collect the operator list promises for the annotations. Each promise
|
|
|
|
// is resolved with the complete operator list for a single annotation.
|
|
|
|
const opListPromises = [];
|
|
|
|
for (const annotation of annotations) {
|
2020-11-04 00:04:08 +09:00
|
|
|
if (
|
2021-05-04 01:03:16 +09:00
|
|
|
(intent === "display" &&
|
|
|
|
annotation.mustBeViewed(annotationStorage)) ||
|
|
|
|
(intent === "print" && annotation.mustBePrinted(annotationStorage))
|
2020-11-04 00:04:08 +09:00
|
|
|
) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
opListPromises.push(
|
2020-05-04 23:51:08 +09:00
|
|
|
annotation
|
2020-07-22 20:55:52 +09:00
|
|
|
.getOperatorList(
|
|
|
|
partialEvaluator,
|
|
|
|
task,
|
|
|
|
renderInteractiveForms,
|
|
|
|
annotationStorage
|
|
|
|
)
|
2020-05-04 23:51:08 +09:00
|
|
|
.catch(function (reason) {
|
|
|
|
warn(
|
|
|
|
"getOperatorList - ignoring annotation data during " +
|
|
|
|
`"${task.name}" task: "${reason}".`
|
|
|
|
);
|
|
|
|
return null;
|
|
|
|
})
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
);
|
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
2014-03-14 22:39:35 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
return Promise.all(opListPromises).then(function (opLists) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
pageOpList.addOp(OPS.beginAnnotations, []);
|
|
|
|
for (const opList of opLists) {
|
|
|
|
pageOpList.addOpList(opList);
|
|
|
|
}
|
|
|
|
pageOpList.addOp(OPS.endAnnotations, []);
|
|
|
|
pageOpList.flush(true);
|
|
|
|
return { length: pageOpList.totalLength };
|
|
|
|
});
|
|
|
|
}
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
|
|
|
extractTextContent({
|
|
|
|
handler,
|
|
|
|
task,
|
|
|
|
normalizeWhitespace,
|
2021-04-01 07:07:02 +09:00
|
|
|
includeMarkedContent,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
sink,
|
|
|
|
combineTextItems,
|
|
|
|
}) {
|
2021-05-14 16:59:24 +09:00
|
|
|
const contentStreamPromise = this.getContentStream();
|
2018-12-29 23:46:22 +09:00
|
|
|
const resourcesPromise = this.loadResources([
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
"ExtGState",
|
|
|
|
"Font",
|
2021-04-21 02:51:01 +09:00
|
|
|
"Properties",
|
|
|
|
"XObject",
|
2018-12-29 23:46:22 +09:00
|
|
|
]);
|
|
|
|
|
|
|
|
const dataPromises = Promise.all([contentStreamPromise, resourcesPromise]);
|
|
|
|
return dataPromises.then(([contentStream]) => {
|
|
|
|
const partialEvaluator = new PartialEvaluator({
|
Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects
Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors.
Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones.
Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience).
Please note that I do *not* think that we need/should convert *every* single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameter (or even more), passing them in individually becomes quite cumbersome.
With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using *one* `PartialEvalutor` instance, since I couldn't see any compelling reason for creating a new one in every single test.
*Note:* If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat.
---
[1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.
2017-04-30 06:13:51 +09:00
|
|
|
xref: this.xref,
|
|
|
|
handler,
|
|
|
|
pageIndex: this.pageIndex,
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
idFactory: this._localIdFactory,
|
Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects
Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors.
Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones.
Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience).
Please note that I do *not* think that we need/should convert *every* single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameter (or even more), passing them in individually becomes quite cumbersome.
With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using *one* `PartialEvalutor` instance, since I couldn't see any compelling reason for creating a new one in every single test.
*Note:* If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat.
---
[1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.
2017-04-30 06:13:51 +09:00
|
|
|
fontCache: this.fontCache,
|
|
|
|
builtInCMapCache: this.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: this.standardFontDataCache,
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-18 21:17:56 +09:00
|
|
|
globalImageCache: this.globalImageCache,
|
Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects
Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors.
Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones.
Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience).
Please note that I do *not* think that we need/should convert *every* single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameter (or even more), passing them in individually becomes quite cumbersome.
With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using *one* `PartialEvalutor` instance, since I couldn't see any compelling reason for creating a new one in every single test.
*Note:* If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat.
---
[1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.
2017-04-30 06:13:51 +09:00
|
|
|
options: this.evaluatorOptions,
|
|
|
|
});
|
2013-04-23 06:20:49 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
return partialEvaluator.getTextContent({
|
|
|
|
stream: contentStream,
|
|
|
|
task,
|
|
|
|
resources: this.resources,
|
|
|
|
normalizeWhitespace,
|
2021-04-01 07:07:02 +09:00
|
|
|
includeMarkedContent,
|
2018-12-29 23:46:22 +09:00
|
|
|
combineTextItems,
|
|
|
|
sink,
|
2013-04-09 07:14:56 +09:00
|
|
|
});
|
2018-12-29 23:46:22 +09:00
|
|
|
});
|
|
|
|
}
|
2013-04-09 07:14:56 +09:00
|
|
|
|
2021-04-01 07:07:02 +09:00
|
|
|
async getStructTree() {
|
|
|
|
const structTreeRoot = await this.pdfManager.ensureCatalog(
|
|
|
|
"structTreeRoot"
|
|
|
|
);
|
2021-04-12 15:52:35 +09:00
|
|
|
if (!structTreeRoot) {
|
|
|
|
return null;
|
|
|
|
}
|
|
|
|
const structTree = await this.pdfManager.ensure(this, "_parseStructTree", [
|
|
|
|
structTreeRoot,
|
|
|
|
]);
|
|
|
|
return structTree.serializable;
|
2021-04-11 19:00:14 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* @private
|
|
|
|
*/
|
|
|
|
_parseStructTree(structTreeRoot) {
|
2021-04-01 07:07:02 +09:00
|
|
|
const tree = new StructTreePage(structTreeRoot, this.pageDict);
|
|
|
|
tree.parse();
|
|
|
|
return tree;
|
|
|
|
}
|
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
getAnnotationsData(intent) {
|
2020-04-14 19:28:14 +09:00
|
|
|
return this._parsedAnnotations.then(function (annotations) {
|
2018-12-29 23:46:22 +09:00
|
|
|
const annotationsData = [];
|
|
|
|
for (let i = 0, ii = annotations.length; i < ii; i++) {
|
2021-05-04 01:03:16 +09:00
|
|
|
// Get the annotation even if it's hidden because
|
|
|
|
// JS can change its display.
|
|
|
|
if (
|
|
|
|
!intent ||
|
|
|
|
(intent === "display" && annotations[i].viewable) ||
|
|
|
|
(intent === "print" && annotations[i].printable)
|
|
|
|
) {
|
2018-12-29 23:46:22 +09:00
|
|
|
annotationsData.push(annotations[i].data);
|
2017-02-16 07:52:15 +09:00
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
return annotationsData;
|
|
|
|
});
|
|
|
|
}
|
2017-02-16 07:52:15 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
get annotations() {
|
2020-12-10 19:31:46 +09:00
|
|
|
const annots = this._getInheritableProperty("Annots");
|
|
|
|
return shadow(this, "annotations", Array.isArray(annots) ? annots : []);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects
Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors.
Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones.
Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience).
Please note that I do *not* think that we need/should convert *every* single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameter (or even more), passing them in individually becomes quite cumbersome.
With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using *one* `PartialEvalutor` instance, since I couldn't see any compelling reason for creating a new one in every single test.
*Note:* If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat.
---
[1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.
2017-04-30 06:13:51 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
get _parsedAnnotations() {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const parsedAnnotations = this.pdfManager
|
|
|
|
.ensure(this, "annotations")
|
|
|
|
.then(() => {
|
2018-12-29 23:46:22 +09:00
|
|
|
const annotationPromises = [];
|
2020-05-09 18:58:09 +09:00
|
|
|
for (const annotationRef of this.annotations) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
annotationPromises.push(
|
|
|
|
AnnotationFactory.create(
|
|
|
|
this.xref,
|
2020-05-09 18:58:09 +09:00
|
|
|
annotationRef,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
this.pdfManager,
|
2021-03-31 00:50:35 +09:00
|
|
|
this._localIdFactory,
|
|
|
|
/* collectFields */ false
|
2020-05-09 18:58:09 +09:00
|
|
|
).catch(function (reason) {
|
|
|
|
warn(`_parsedAnnotations: "${reason}".`);
|
|
|
|
return null;
|
|
|
|
})
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
);
|
2015-11-22 21:56:52 +09:00
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
|
2020-05-09 18:58:09 +09:00
|
|
|
return Promise.all(annotationPromises).then(function (annotations) {
|
|
|
|
return annotations.filter(annotation => !!annotation);
|
|
|
|
});
|
2018-12-29 23:46:22 +09:00
|
|
|
});
|
2018-03-21 09:43:40 +09:00
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "_parsedAnnotations", parsedAnnotations);
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
2020-12-08 03:22:14 +09:00
|
|
|
|
|
|
|
get jsActions() {
|
|
|
|
const actions = collectActions(
|
|
|
|
this.xref,
|
|
|
|
this.pageDict,
|
|
|
|
PageActionEventType
|
|
|
|
);
|
|
|
|
return shadow(this, "jsActions", actions);
|
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
2011-10-25 10:13:12 +09:00
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const PDF_HEADER_SIGNATURE = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]);
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
const STARTXREF_SIGNATURE = new Uint8Array([
|
2021-05-19 18:24:38 +09:00
|
|
|
0x73, 0x74, 0x61, 0x72, 0x74, 0x78, 0x72, 0x65, 0x66,
|
|
|
|
]);
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const ENDOBJ_SIGNATURE = new Uint8Array([0x65, 0x6e, 0x64, 0x6f, 0x62, 0x6a]);
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
const FINGERPRINT_FIRST_BYTES = 1024;
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
const EMPTY_FINGERPRINT =
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
|
2018-12-30 00:18:36 +09:00
|
|
|
|
2020-02-05 21:59:47 +09:00
|
|
|
const PDF_HEADER_VERSION_REGEXP = /^[1-9]\.[0-9]$/;
|
|
|
|
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
function find(stream, signature, limit = 1024, backwards = false) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
if (
|
|
|
|
typeof PDFJSDev === "undefined" ||
|
|
|
|
PDFJSDev.test("!PRODUCTION || TESTING")
|
|
|
|
) {
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
assert(limit > 0, 'The "limit" must be a positive integer.');
|
|
|
|
}
|
|
|
|
const signatureLength = signature.length;
|
2018-12-30 00:18:36 +09:00
|
|
|
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
const scanBytes = stream.peekBytes(limit);
|
|
|
|
const scanLength = scanBytes.length - signatureLength;
|
2018-12-30 00:18:36 +09:00
|
|
|
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
if (scanLength <= 0) {
|
2018-12-30 00:18:36 +09:00
|
|
|
return false;
|
|
|
|
}
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
if (backwards) {
|
|
|
|
const signatureEnd = signatureLength - 1;
|
|
|
|
|
|
|
|
let pos = scanBytes.length - 1;
|
|
|
|
while (pos >= signatureEnd) {
|
|
|
|
let j = 0;
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
while (
|
|
|
|
j < signatureLength &&
|
|
|
|
scanBytes[pos - j] === signature[signatureEnd - j]
|
|
|
|
) {
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
j++;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
if (j >= signatureLength) {
|
|
|
|
// `signature` found.
|
|
|
|
stream.pos += pos - signatureEnd;
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
pos--;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
} else {
|
|
|
|
// forwards
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
let pos = 0;
|
|
|
|
while (pos <= scanLength) {
|
|
|
|
let j = 0;
|
|
|
|
while (j < signatureLength && scanBytes[pos + j] === signature[j]) {
|
|
|
|
j++;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
if (j >= signatureLength) {
|
|
|
|
// `signature` found.
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
stream.pos += pos;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
pos++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return false;
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
|
2011-10-25 10:13:12 +09:00
|
|
|
/**
|
2019-11-25 19:25:57 +09:00
|
|
|
* The `PDFDocument` class holds all the (worker-thread) data of the PDF file.
|
2011-10-25 10:13:12 +09:00
|
|
|
*/
|
2018-12-30 00:18:36 +09:00
|
|
|
class PDFDocument {
|
|
|
|
constructor(pdfManager, arg) {
|
|
|
|
let stream;
|
2014-03-23 04:36:35 +09:00
|
|
|
if (isStream(arg)) {
|
2017-01-03 20:39:38 +09:00
|
|
|
stream = arg;
|
2014-03-23 04:36:35 +09:00
|
|
|
} else if (isArrayBuffer(arg)) {
|
2017-01-03 20:39:38 +09:00
|
|
|
stream = new Stream(arg);
|
2014-03-23 04:36:35 +09:00
|
|
|
} else {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
throw new Error("PDFDocument: Unknown argument type");
|
2014-03-23 04:36:35 +09:00
|
|
|
}
|
2017-07-20 21:04:54 +09:00
|
|
|
if (stream.length <= 0) {
|
2019-12-09 23:00:45 +09:00
|
|
|
throw new InvalidPDFException(
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
"The PDF file is empty, i.e. its size is zero bytes."
|
|
|
|
);
|
2017-07-20 21:04:54 +09:00
|
|
|
}
|
2017-01-03 20:39:38 +09:00
|
|
|
|
2013-04-09 07:14:56 +09:00
|
|
|
this.pdfManager = pdfManager;
|
2011-10-25 10:13:12 +09:00
|
|
|
this.stream = stream;
|
2017-01-03 20:39:38 +09:00
|
|
|
this.xref = new XRef(stream, pdfManager);
|
2018-07-25 23:53:47 +09:00
|
|
|
this._pagePromises = [];
|
2020-08-23 05:21:38 +09:00
|
|
|
this._version = null;
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
|
|
|
|
const idCounters = {
|
|
|
|
font: 0,
|
|
|
|
};
|
|
|
|
this._globalIdFactory = class {
|
|
|
|
static getDocId() {
|
|
|
|
return `g_${pdfManager.docId}`;
|
|
|
|
}
|
|
|
|
|
|
|
|
static createFontId() {
|
|
|
|
return `f${++idCounters.font}`;
|
|
|
|
}
|
|
|
|
|
|
|
|
static createObjId() {
|
|
|
|
unreachable("Abstract method `createObjId` called.");
|
2021-04-01 07:07:02 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
static getPageObjId() {
|
|
|
|
unreachable("Abstract method `getPageObjId` called.");
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
}
|
|
|
|
};
|
2011-10-25 10:13:12 +09:00
|
|
|
}
|
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
parse(recoveryMode) {
|
2020-08-26 06:22:21 +09:00
|
|
|
this.xref.parse(recoveryMode);
|
|
|
|
this.catalog = new Catalog(this.pdfManager, this.xref);
|
2018-12-30 00:18:36 +09:00
|
|
|
|
2020-08-23 05:21:38 +09:00
|
|
|
// The `checkHeader` method is called before this method and parses the
|
|
|
|
// version from the header. The specification states in section 7.5.2
|
|
|
|
// that the version from the catalog, if present, should overwrite the
|
|
|
|
// version from the header.
|
|
|
|
if (this.catalog.version) {
|
|
|
|
this._version = this.catalog.version;
|
2014-03-23 04:36:35 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
get linearization() {
|
|
|
|
let linearization = null;
|
|
|
|
try {
|
|
|
|
linearization = Linearization.create(this.stream);
|
|
|
|
} catch (err) {
|
|
|
|
if (err instanceof MissingDataException) {
|
|
|
|
throw err;
|
2011-10-25 10:13:12 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
info(err);
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "linearization", linearization);
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
get startXRef() {
|
|
|
|
const stream = this.stream;
|
|
|
|
let startXRef = 0;
|
|
|
|
|
|
|
|
if (this.linearization) {
|
|
|
|
// Find the end of the first object.
|
2011-10-25 10:13:12 +09:00
|
|
|
stream.reset();
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
if (find(stream, ENDOBJ_SIGNATURE)) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
startXRef = stream.pos + 6 - stream.start;
|
2011-10-25 10:13:12 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
} else {
|
|
|
|
// Find `startxref` by checking backwards from the end of the file.
|
|
|
|
const step = 1024;
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
const startXRefLength = STARTXREF_SIGNATURE.length;
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
let found = false,
|
|
|
|
pos = stream.end;
|
2018-12-30 00:18:36 +09:00
|
|
|
|
|
|
|
while (!found && pos > 0) {
|
|
|
|
pos -= step - startXRefLength;
|
|
|
|
if (pos < 0) {
|
|
|
|
pos = 0;
|
2017-03-22 22:02:34 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
stream.pos = pos;
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
found = find(stream, STARTXREF_SIGNATURE, step, true);
|
2013-08-09 02:02:11 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
|
|
|
|
if (found) {
|
|
|
|
stream.skip(9);
|
|
|
|
let ch;
|
|
|
|
do {
|
|
|
|
ch = stream.getByte();
|
2020-02-10 17:38:57 +09:00
|
|
|
} while (isWhiteSpace(ch));
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
let str = "";
|
2019-12-26 04:03:46 +09:00
|
|
|
while (ch >= /* Space = */ 0x20 && ch <= /* '9' = */ 0x39) {
|
2018-12-30 00:18:36 +09:00
|
|
|
str += String.fromCharCode(ch);
|
|
|
|
ch = stream.getByte();
|
2012-08-04 08:11:43 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
startXRef = parseInt(str, 10);
|
|
|
|
if (isNaN(startXRef)) {
|
|
|
|
startXRef = 0;
|
2014-08-02 22:19:55 +09:00
|
|
|
}
|
2013-10-03 17:09:06 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "startXRef", startXRef);
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
2013-10-03 17:09:06 +09:00
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
// Find the header, get the PDF format version and setup the
|
|
|
|
// stream to start from the header.
|
|
|
|
checkHeader() {
|
|
|
|
const stream = this.stream;
|
|
|
|
stream.reset();
|
|
|
|
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
if (!find(stream, PDF_HEADER_SIGNATURE)) {
|
2018-12-30 00:18:36 +09:00
|
|
|
// May not be a PDF file, but don't throw an error and let
|
|
|
|
// parsing continue.
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
stream.moveStart();
|
|
|
|
|
|
|
|
// Read the PDF format version.
|
|
|
|
const MAX_PDF_VERSION_LENGTH = 12;
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
let version = "",
|
|
|
|
ch;
|
2019-12-26 04:03:46 +09:00
|
|
|
while ((ch = stream.getByte()) > /* Space = */ 0x20) {
|
2018-12-30 00:18:36 +09:00
|
|
|
if (version.length >= MAX_PDF_VERSION_LENGTH) {
|
|
|
|
break;
|
2011-12-23 07:44:42 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
version += String.fromCharCode(ch);
|
|
|
|
}
|
2020-08-23 05:21:38 +09:00
|
|
|
if (!this._version) {
|
2018-12-30 00:18:36 +09:00
|
|
|
// Remove the "%PDF-" prefix.
|
2020-08-23 05:21:38 +09:00
|
|
|
this._version = version.substring(5);
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
}
|
2012-03-27 07:14:59 +09:00
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
parseStartXRef() {
|
|
|
|
this.xref.setStartXRef(this.startXRef);
|
|
|
|
}
|
|
|
|
|
|
|
|
get numPages() {
|
2021-03-19 18:11:40 +09:00
|
|
|
if (this.xfaFactory) {
|
|
|
|
return shadow(this, "numPages", this.xfaFactory.numberPages);
|
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
const linearization = this.linearization;
|
|
|
|
const num = linearization ? linearization.numPages : this.catalog.numPages;
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "numPages", num);
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
|
2020-08-23 21:04:49 +09:00
|
|
|
/**
|
|
|
|
* @private
|
|
|
|
*/
|
|
|
|
_hasOnlyDocumentSignatures(fields, recursionDepth = 0) {
|
|
|
|
const RECURSION_LIMIT = 10;
|
2020-10-16 19:57:01 +09:00
|
|
|
|
|
|
|
if (!Array.isArray(fields)) {
|
|
|
|
return false;
|
|
|
|
}
|
2020-08-23 21:04:49 +09:00
|
|
|
return fields.every(field => {
|
|
|
|
field = this.xref.fetchIfRef(field);
|
2020-10-16 19:57:01 +09:00
|
|
|
if (!(field instanceof Dict)) {
|
|
|
|
return false;
|
|
|
|
}
|
2020-08-23 21:04:49 +09:00
|
|
|
if (field.has("Kids")) {
|
|
|
|
if (++recursionDepth > RECURSION_LIMIT) {
|
|
|
|
warn("_hasOnlyDocumentSignatures: maximum recursion depth reached");
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
return this._hasOnlyDocumentSignatures(
|
|
|
|
field.get("Kids"),
|
|
|
|
recursionDepth
|
|
|
|
);
|
|
|
|
}
|
|
|
|
const isSignature = isName(field.get("FT"), "Sig");
|
|
|
|
const rectangle = field.get("Rect");
|
|
|
|
const isInvisible =
|
|
|
|
Array.isArray(rectangle) && rectangle.every(value => value === 0);
|
|
|
|
return isSignature && isInvisible;
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
2021-03-19 18:11:40 +09:00
|
|
|
get xfaData() {
|
|
|
|
const acroForm = this.catalog.acroForm;
|
|
|
|
if (!acroForm) {
|
|
|
|
return null;
|
|
|
|
}
|
|
|
|
|
|
|
|
const xfa = acroForm.get("XFA");
|
|
|
|
const entries = {
|
|
|
|
"xdp:xdp": "",
|
|
|
|
template: "",
|
|
|
|
datasets: "",
|
|
|
|
config: "",
|
|
|
|
connectionSet: "",
|
|
|
|
localeSet: "",
|
|
|
|
stylesheet: "",
|
|
|
|
"/xdp:xdp": "",
|
|
|
|
};
|
|
|
|
if (isStream(xfa) && !xfa.isEmpty) {
|
|
|
|
try {
|
2021-05-01 19:11:09 +09:00
|
|
|
entries["xdp:xdp"] = stringToUTF8String(xfa.getString());
|
2021-03-19 18:11:40 +09:00
|
|
|
return entries;
|
|
|
|
} catch (_) {
|
|
|
|
warn("XFA - Invalid utf-8 string.");
|
|
|
|
return null;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!Array.isArray(xfa) || xfa.length === 0) {
|
|
|
|
return null;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (let i = 0, ii = xfa.length; i < ii; i += 2) {
|
|
|
|
let name;
|
|
|
|
if (i === 0) {
|
|
|
|
name = "xdp:xdp";
|
|
|
|
} else if (i === ii - 2) {
|
|
|
|
name = "/xdp:xdp";
|
|
|
|
} else {
|
|
|
|
name = xfa[i];
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!entries.hasOwnProperty(name)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
const data = this.xref.fetchIfRef(xfa[i + 1]);
|
|
|
|
if (!isStream(data) || data.isEmpty) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
try {
|
2021-05-01 19:11:09 +09:00
|
|
|
entries[name] = stringToUTF8String(data.getString());
|
2021-03-19 18:11:40 +09:00
|
|
|
} catch (_) {
|
|
|
|
warn("XFA - Invalid utf-8 string.");
|
|
|
|
return null;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return entries;
|
|
|
|
}
|
|
|
|
|
|
|
|
get xfaFactory() {
|
|
|
|
if (
|
|
|
|
this.pdfManager.enableXfa &&
|
|
|
|
this.formInfo.hasXfa &&
|
|
|
|
!this.formInfo.hasAcroForm
|
|
|
|
) {
|
|
|
|
const data = this.xfaData;
|
|
|
|
return shadow(this, "xfaFactory", data ? new XFAFactory(data) : null);
|
|
|
|
}
|
|
|
|
return shadow(this, "xfaFaxtory", null);
|
|
|
|
}
|
|
|
|
|
2021-05-25 22:50:12 +09:00
|
|
|
get htmlForXfa() {
|
|
|
|
if (this.xfaFactory) {
|
|
|
|
return this.xfaFactory.getPages();
|
|
|
|
}
|
|
|
|
return null;
|
2021-03-19 18:11:40 +09:00
|
|
|
}
|
|
|
|
|
2021-03-26 17:28:18 +09:00
|
|
|
async loadXfaFonts(handler, task) {
|
|
|
|
const acroForm = await this.pdfManager.ensureCatalog("acroForm");
|
A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 17:13:42 +09:00
|
|
|
if (!acroForm) {
|
|
|
|
return;
|
|
|
|
}
|
2021-03-26 17:28:18 +09:00
|
|
|
const resources = await acroForm.getAsync("DR");
|
|
|
|
if (!(resources instanceof Dict)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
const objectLoader = new ObjectLoader(resources, ["Font"], this.xref);
|
|
|
|
await objectLoader.load();
|
|
|
|
|
|
|
|
const fontRes = resources.get("Font");
|
|
|
|
if (!(fontRes instanceof Dict)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
2021-06-09 03:50:31 +09:00
|
|
|
const options = Object.assign(
|
|
|
|
Object.create(null),
|
|
|
|
this.pdfManager.evaluatorOptions
|
|
|
|
);
|
|
|
|
options.useSystemFonts = false;
|
|
|
|
|
2021-03-26 17:28:18 +09:00
|
|
|
const partialEvaluator = new PartialEvaluator({
|
|
|
|
xref: this.xref,
|
|
|
|
handler,
|
|
|
|
pageIndex: -1,
|
|
|
|
idFactory: this._globalIdFactory,
|
|
|
|
fontCache: this.catalog.fontCache,
|
|
|
|
builtInCMapCache: this.catalog.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: this.catalog.standardFontDataCache,
|
2021-06-09 03:50:31 +09:00
|
|
|
options,
|
2021-03-26 17:28:18 +09:00
|
|
|
});
|
|
|
|
const operatorList = new OperatorList();
|
|
|
|
const initialState = {
|
|
|
|
font: null,
|
|
|
|
clone() {
|
|
|
|
return this;
|
|
|
|
},
|
|
|
|
};
|
|
|
|
|
|
|
|
const fonts = new Map();
|
|
|
|
fontRes.forEach((fontName, font) => {
|
|
|
|
fonts.set(fontName, font);
|
|
|
|
});
|
|
|
|
const promises = [];
|
|
|
|
|
|
|
|
for (const [fontName, font] of fonts) {
|
|
|
|
const descriptor = font.get("FontDescriptor");
|
A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 17:13:42 +09:00
|
|
|
if (!(descriptor instanceof Dict)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
const fontFamily = descriptor.get("FontFamily");
|
|
|
|
const fontWeight = descriptor.get("FontWeight");
|
2021-05-28 18:06:11 +09:00
|
|
|
|
|
|
|
// Angle is expressed in degrees counterclockwise in PDF
|
|
|
|
// when it's clockwise in CSS
|
|
|
|
// (see https://drafts.csswg.org/css-fonts-4/#valdef-font-style-oblique-angle)
|
|
|
|
const italicAngle = -descriptor.get("ItalicAngle");
|
A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 17:13:42 +09:00
|
|
|
const cssFontInfo = { fontFamily, fontWeight, italicAngle };
|
2021-03-26 17:28:18 +09:00
|
|
|
|
A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 17:13:42 +09:00
|
|
|
if (!validateCSSFont(cssFontInfo)) {
|
|
|
|
continue;
|
2021-03-26 17:28:18 +09:00
|
|
|
}
|
A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 17:13:42 +09:00
|
|
|
promises.push(
|
|
|
|
partialEvaluator
|
|
|
|
.handleSetFont(
|
|
|
|
resources,
|
|
|
|
[Name.get(fontName), 1],
|
|
|
|
/* fontRef = */ null,
|
|
|
|
operatorList,
|
|
|
|
task,
|
|
|
|
initialState,
|
|
|
|
/* fallbackFontDict = */ null,
|
|
|
|
/* cssFontInfo = */ cssFontInfo
|
|
|
|
)
|
|
|
|
.catch(function (reason) {
|
|
|
|
warn(`loadXfaFonts: "${reason}".`);
|
|
|
|
return null;
|
|
|
|
})
|
|
|
|
);
|
2021-03-26 17:28:18 +09:00
|
|
|
}
|
|
|
|
await Promise.all(promises);
|
|
|
|
}
|
|
|
|
|
2020-08-23 21:04:49 +09:00
|
|
|
get formInfo() {
|
2021-04-10 23:53:17 +09:00
|
|
|
const formInfo = {
|
|
|
|
hasFields: false,
|
|
|
|
hasAcroForm: false,
|
|
|
|
hasXfa: false,
|
|
|
|
hasSignatures: false,
|
|
|
|
};
|
2020-08-23 21:04:49 +09:00
|
|
|
const acroForm = this.catalog.acroForm;
|
|
|
|
if (!acroForm) {
|
|
|
|
return shadow(this, "formInfo", formInfo);
|
|
|
|
}
|
|
|
|
|
|
|
|
try {
|
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*
- Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.
- Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.
- Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.
- Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.
- Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 19:20:44 +09:00
|
|
|
const fields = acroForm.get("Fields");
|
|
|
|
const hasFields = Array.isArray(fields) && fields.length > 0;
|
|
|
|
formInfo.hasFields = hasFields; // Used by the `fieldObjects` getter.
|
|
|
|
|
2020-08-23 21:04:49 +09:00
|
|
|
// The document contains XFA data if the `XFA` entry is a non-empty
|
|
|
|
// array or stream.
|
|
|
|
const xfa = acroForm.get("XFA");
|
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*
- Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.
- Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.
- Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.
- Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.
- Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 19:20:44 +09:00
|
|
|
formInfo.hasXfa =
|
2020-08-23 21:04:49 +09:00
|
|
|
(Array.isArray(xfa) && xfa.length > 0) ||
|
|
|
|
(isStream(xfa) && !xfa.isEmpty);
|
|
|
|
|
|
|
|
// The document contains AcroForm data if the `Fields` entry is a
|
|
|
|
// non-empty array and it doesn't consist of only document signatures.
|
|
|
|
// This second check is required for files that don't actually contain
|
|
|
|
// AcroForm data (only XFA data), but that use the `Fields` entry to
|
|
|
|
// store (invisible) document signatures. This can be detected using
|
|
|
|
// the first bit of the `SigFlags` integer (see Table 219 in the
|
|
|
|
// specification).
|
|
|
|
const sigFlags = acroForm.get("SigFlags");
|
2021-04-10 23:53:17 +09:00
|
|
|
const hasSignatures = !!(sigFlags & 0x1);
|
2020-08-23 21:04:49 +09:00
|
|
|
const hasOnlyDocumentSignatures =
|
2021-04-10 23:53:17 +09:00
|
|
|
hasSignatures && this._hasOnlyDocumentSignatures(fields);
|
2020-08-23 21:04:49 +09:00
|
|
|
formInfo.hasAcroForm = hasFields && !hasOnlyDocumentSignatures;
|
2021-04-10 23:53:17 +09:00
|
|
|
formInfo.hasSignatures = hasSignatures;
|
2020-08-23 21:04:49 +09:00
|
|
|
} catch (ex) {
|
|
|
|
if (ex instanceof MissingDataException) {
|
|
|
|
throw ex;
|
|
|
|
}
|
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*
- Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.
- Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.
- Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.
- Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.
- Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 19:20:44 +09:00
|
|
|
warn(`Cannot fetch form information: "${ex}".`);
|
2020-08-23 21:04:49 +09:00
|
|
|
}
|
|
|
|
return shadow(this, "formInfo", formInfo);
|
|
|
|
}
|
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
get documentInfo() {
|
|
|
|
const DocumentInfoValidators = {
|
|
|
|
Title: isString,
|
|
|
|
Author: isString,
|
|
|
|
Subject: isString,
|
|
|
|
Keywords: isString,
|
|
|
|
Creator: isString,
|
|
|
|
Producer: isString,
|
|
|
|
CreationDate: isString,
|
|
|
|
ModDate: isString,
|
|
|
|
Trapped: isName,
|
|
|
|
};
|
|
|
|
|
2020-08-23 05:21:38 +09:00
|
|
|
let version = this._version;
|
2020-02-05 21:59:47 +09:00
|
|
|
if (
|
|
|
|
typeof version !== "string" ||
|
|
|
|
!PDF_HEADER_VERSION_REGEXP.test(version)
|
|
|
|
) {
|
|
|
|
warn(`Invalid PDF header version number: ${version}`);
|
|
|
|
version = null;
|
|
|
|
}
|
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
const docInfo = {
|
2020-02-05 21:59:47 +09:00
|
|
|
PDFFormatVersion: version,
|
2018-12-30 00:18:36 +09:00
|
|
|
IsLinearized: !!this.linearization,
|
2020-08-23 21:04:49 +09:00
|
|
|
IsAcroFormPresent: this.formInfo.hasAcroForm,
|
|
|
|
IsXFAPresent: this.formInfo.hasXfa,
|
2020-08-23 05:47:15 +09:00
|
|
|
IsCollectionPresent: !!this.catalog.collection,
|
2021-04-10 23:53:17 +09:00
|
|
|
IsSignaturesPresent: this.formInfo.hasSignatures,
|
2018-12-30 00:18:36 +09:00
|
|
|
};
|
|
|
|
|
|
|
|
let infoDict;
|
|
|
|
try {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
infoDict = this.xref.trailer.get("Info");
|
2018-12-30 00:18:36 +09:00
|
|
|
} catch (err) {
|
|
|
|
if (err instanceof MissingDataException) {
|
|
|
|
throw err;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
info("The document information dictionary is invalid.");
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
2013-02-07 08:19:29 +09:00
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
if (isDict(infoDict)) {
|
|
|
|
// Fill the document info with valid entries from the specification,
|
|
|
|
// as well as any existing well-formed custom entries.
|
|
|
|
for (const key of infoDict.getKeys()) {
|
|
|
|
const value = infoDict.get(key);
|
|
|
|
|
|
|
|
if (DocumentInfoValidators[key]) {
|
|
|
|
// Make sure the (standard) value conforms to the specification.
|
|
|
|
if (DocumentInfoValidators[key](value)) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
docInfo[key] =
|
|
|
|
typeof value !== "string" ? value : stringToPDFString(value);
|
2018-12-30 00:18:36 +09:00
|
|
|
} else {
|
|
|
|
info(`Bad value in document info for "${key}".`);
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
} else if (typeof key === "string") {
|
2018-12-30 00:18:36 +09:00
|
|
|
// For custom values, only accept white-listed types to prevent
|
|
|
|
// errors that would occur when trying to send non-serializable
|
|
|
|
// objects to the main-thread (for example `Dict` or `Stream`).
|
|
|
|
let customValue;
|
|
|
|
if (isString(value)) {
|
|
|
|
customValue = stringToPDFString(value);
|
|
|
|
} else if (isName(value) || isNum(value) || isBool(value)) {
|
|
|
|
customValue = value;
|
|
|
|
} else {
|
|
|
|
info(`Unsupported value in document info for (custom) "${key}".`);
|
|
|
|
continue;
|
|
|
|
}
|
2018-07-26 03:50:25 +09:00
|
|
|
|
2020-04-17 19:06:27 +09:00
|
|
|
if (!docInfo.Custom) {
|
|
|
|
docInfo.Custom = Object.create(null);
|
2018-07-26 03:50:25 +09:00
|
|
|
}
|
2020-04-17 19:06:27 +09:00
|
|
|
docInfo.Custom[key] = customValue;
|
2018-07-26 03:50:25 +09:00
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "documentInfo", docInfo);
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
2018-07-26 03:50:25 +09:00
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
get fingerprint() {
|
|
|
|
let hash;
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const idArray = this.xref.trailer.get("ID");
|
|
|
|
if (
|
|
|
|
Array.isArray(idArray) &&
|
|
|
|
idArray[0] &&
|
|
|
|
isString(idArray[0]) &&
|
|
|
|
idArray[0] !== EMPTY_FINGERPRINT
|
|
|
|
) {
|
2018-12-30 00:18:36 +09:00
|
|
|
hash = stringToBytes(idArray[0]);
|
|
|
|
} else {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
hash = calculateMD5(
|
|
|
|
this.stream.getByteRange(0, FINGERPRINT_FIRST_BYTES),
|
|
|
|
0,
|
|
|
|
FINGERPRINT_FIRST_BYTES
|
|
|
|
);
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
|
2019-07-15 18:26:07 +09:00
|
|
|
const fingerprintBuf = [];
|
2019-01-03 19:14:11 +09:00
|
|
|
for (let i = 0, ii = hash.length; i < ii; i++) {
|
|
|
|
const hex = hash[i].toString(16);
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
fingerprintBuf.push(hex.padStart(2, "0"));
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "fingerprint", fingerprintBuf.join(""));
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
_getLinearizationPage(pageIndex) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const { catalog, linearization } = this;
|
2020-05-05 19:40:01 +09:00
|
|
|
if (
|
|
|
|
typeof PDFJSDev === "undefined" ||
|
|
|
|
PDFJSDev.test("!PRODUCTION || TESTING")
|
|
|
|
) {
|
|
|
|
assert(
|
|
|
|
linearization && linearization.pageFirst === pageIndex,
|
|
|
|
"_getLinearizationPage - invalid pageIndex argument."
|
|
|
|
);
|
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
|
2019-05-26 00:40:14 +09:00
|
|
|
const ref = Ref.get(linearization.objectNumberFirst, 0);
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return this.xref
|
|
|
|
.fetchAsync(ref)
|
|
|
|
.then(obj => {
|
|
|
|
// Ensure that the object that was found is actually a Page dictionary.
|
|
|
|
if (
|
|
|
|
isDict(obj, "Page") ||
|
|
|
|
(isDict(obj) && !obj.has("Type") && obj.has("Contents"))
|
|
|
|
) {
|
|
|
|
if (ref && !catalog.pageKidsCountCache.has(ref)) {
|
|
|
|
catalog.pageKidsCountCache.put(ref, 1); // Cache the Page reference.
|
|
|
|
}
|
|
|
|
return [obj, ref];
|
Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)
For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].
Since there's generally a real effort being in made PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.
Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't a very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].
Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:
```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
console.timeEnd('first');
});
console.time('second');
getDocument(...).promise.then((pdfDocument) => {
pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
console.timeEnd('second');
});
});
```
The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.
---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.
[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.
[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.
2018-12-05 05:51:27 +09:00
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
throw new FormatError(
|
|
|
|
"The Linearization dictionary doesn't point " +
|
|
|
|
"to a valid Page dictionary."
|
|
|
|
);
|
|
|
|
})
|
|
|
|
.catch(reason => {
|
|
|
|
info(reason);
|
|
|
|
return catalog.getPageDict(pageIndex);
|
|
|
|
});
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
getPage(pageIndex) {
|
|
|
|
if (this._pagePromises[pageIndex] !== undefined) {
|
|
|
|
return this._pagePromises[pageIndex];
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const { catalog, linearization } = this;
|
2018-12-30 00:18:36 +09:00
|
|
|
|
2021-03-19 18:11:40 +09:00
|
|
|
if (this.xfaFactory) {
|
|
|
|
return Promise.resolve(
|
|
|
|
new Page({
|
|
|
|
pdfManager: this.pdfManager,
|
|
|
|
xref: this.xref,
|
|
|
|
pageIndex,
|
|
|
|
pageDict: Dict.empty,
|
|
|
|
ref: null,
|
|
|
|
globalIdFactory: this._globalIdFactory,
|
|
|
|
fontCache: catalog.fontCache,
|
|
|
|
builtInCMapCache: catalog.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: catalog.standardFontDataCache,
|
2021-03-19 18:11:40 +09:00
|
|
|
globalImageCache: catalog.globalImageCache,
|
|
|
|
nonBlendModesSet: catalog.nonBlendModesSet,
|
|
|
|
xfaFactory: this.xfaFactory,
|
|
|
|
})
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
const promise =
|
|
|
|
linearization && linearization.pageFirst === pageIndex
|
|
|
|
? this._getLinearizationPage(pageIndex)
|
|
|
|
: catalog.getPageDict(pageIndex);
|
2018-12-30 00:18:36 +09:00
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return (this._pagePromises[pageIndex] = promise.then(([pageDict, ref]) => {
|
2018-12-30 00:18:36 +09:00
|
|
|
return new Page({
|
|
|
|
pdfManager: this.pdfManager,
|
|
|
|
xref: this.xref,
|
|
|
|
pageIndex,
|
|
|
|
pageDict,
|
|
|
|
ref,
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
globalIdFactory: this._globalIdFactory,
|
2018-12-30 00:18:36 +09:00
|
|
|
fontCache: catalog.fontCache,
|
|
|
|
builtInCMapCache: catalog.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: catalog.standardFontDataCache,
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-18 21:17:56 +09:00
|
|
|
globalImageCache: catalog.globalImageCache,
|
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.
To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.
This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
{ "id": "issue6961",
"file": "../web/pdfs/issue6961.pdf",
"md5": "a80e4357a8fda758d96c2c76f2980b03",
"rounds": 100,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0 | Overall | 100 | 1034 | 555 | -480 | -46.39 | faster
firefox | 0 | Page Request | 100 | 489 | 7 | -482 | -98.67 | faster
firefox | 0 | Rendering | 100 | 545 | 548 | 2 | 0.45 |
firefox | 1 | Overall | 100 | 912 | 428 | -484 | -53.06 | faster
firefox | 1 | Page Request | 100 | 487 | 1 | -486 | -99.77 | faster
firefox | 1 | Rendering | 100 | 425 | 427 | 2 | 0.51 |
```
---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
2020-11-05 21:35:33 +09:00
|
|
|
nonBlendModesSet: catalog.nonBlendModesSet,
|
2021-03-19 18:11:40 +09:00
|
|
|
xfaFactory: null,
|
Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)
For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].
Since there's generally a real effort being in made PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.
Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't a very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].
Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:
```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
console.timeEnd('first');
});
console.time('second');
getDocument(...).promise.then((pdfDocument) => {
pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
console.timeEnd('second');
});
});
```
The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.
---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.
[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.
[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.
2018-12-05 05:51:27 +09:00
|
|
|
});
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
}));
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)
For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].
Since there's generally a real effort being in made PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.
Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't a very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].
Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:
```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
console.timeEnd('first');
});
console.time('second');
getDocument(...).promise.then((pdfDocument) => {
pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
console.timeEnd('second');
});
});
```
The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.
---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.
[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.
[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.
2018-12-05 05:51:27 +09:00
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
checkFirstPage() {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return this.getPage(0).catch(async reason => {
|
2018-12-30 00:18:36 +09:00
|
|
|
if (reason instanceof XRefEntryException) {
|
|
|
|
// Clear out the various caches to ensure that we haven't stored any
|
|
|
|
// inconsistent and/or incorrect state, since that could easily break
|
|
|
|
// subsequent `this.getPage` calls.
|
|
|
|
this._pagePromises.length = 0;
|
2019-12-07 20:20:26 +09:00
|
|
|
await this.cleanup();
|
2011-10-25 10:13:12 +09:00
|
|
|
|
2018-12-30 00:18:36 +09:00
|
|
|
throw new XRefParseException();
|
|
|
|
}
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
Fallback to the built-in font renderer when font loading fails
After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].
Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].
The solution implemented in this patch does *not* in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load.
*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:
[Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888)
Issue 10175
Issue 10232
---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.
[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.
[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
2019-02-11 08:47:56 +09:00
|
|
|
fontFallback(id, handler) {
|
|
|
|
return this.catalog.fontFallback(id, handler);
|
|
|
|
}
|
|
|
|
|
2020-05-23 18:21:32 +09:00
|
|
|
async cleanup(manuallyTriggered = false) {
|
|
|
|
return this.catalog
|
|
|
|
? this.catalog.cleanup(manuallyTriggered)
|
|
|
|
: clearPrimitiveCaches();
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
2020-10-01 03:58:45 +09:00
|
|
|
|
2020-10-16 19:57:01 +09:00
|
|
|
/**
|
|
|
|
* @private
|
|
|
|
*/
|
2020-10-01 03:58:45 +09:00
|
|
|
_collectFieldObjects(name, fieldRef, promises) {
|
|
|
|
const field = this.xref.fetchIfRef(fieldRef);
|
|
|
|
if (field.has("T")) {
|
|
|
|
const partName = stringToPDFString(field.get("T"));
|
|
|
|
if (name === "") {
|
|
|
|
name = partName;
|
|
|
|
} else {
|
|
|
|
name = `${name}.${partName}`;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-12-17 00:00:12 +09:00
|
|
|
if (!promises.has(name)) {
|
2020-10-01 03:58:45 +09:00
|
|
|
promises.set(name, []);
|
|
|
|
}
|
|
|
|
promises.get(name).push(
|
|
|
|
AnnotationFactory.create(
|
|
|
|
this.xref,
|
|
|
|
fieldRef,
|
|
|
|
this.pdfManager,
|
2021-03-31 00:50:35 +09:00
|
|
|
this._localIdFactory,
|
|
|
|
/* collectFields */ true
|
2020-10-01 03:58:45 +09:00
|
|
|
)
|
2020-10-22 20:24:43 +09:00
|
|
|
.then(annotation => annotation && annotation.getFieldObject())
|
2020-10-01 03:58:45 +09:00
|
|
|
.catch(function (reason) {
|
|
|
|
warn(`_collectFieldObjects: "${reason}".`);
|
|
|
|
return null;
|
|
|
|
})
|
|
|
|
);
|
|
|
|
|
|
|
|
if (field.has("Kids")) {
|
|
|
|
const kids = field.get("Kids");
|
|
|
|
for (const kid of kids) {
|
|
|
|
this._collectFieldObjects(name, kid, promises);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
get fieldObjects() {
|
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*
- Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.
- Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.
- Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.
- Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.
- Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 19:20:44 +09:00
|
|
|
if (!this.formInfo.hasFields) {
|
2020-10-01 03:58:45 +09:00
|
|
|
return shadow(this, "fieldObjects", Promise.resolve(null));
|
|
|
|
}
|
|
|
|
|
|
|
|
const allFields = Object.create(null);
|
|
|
|
const fieldPromises = new Map();
|
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*
- Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.
- Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.
- Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.
- Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.
- Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 19:20:44 +09:00
|
|
|
for (const fieldRef of this.catalog.acroForm.get("Fields")) {
|
2020-10-01 03:58:45 +09:00
|
|
|
this._collectFieldObjects("", fieldRef, fieldPromises);
|
|
|
|
}
|
|
|
|
|
|
|
|
const allPromises = [];
|
Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*
- Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.
- Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.
- Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.
- Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.
- Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 19:20:44 +09:00
|
|
|
for (const [name, promises] of fieldPromises) {
|
2020-10-01 03:58:45 +09:00
|
|
|
allPromises.push(
|
|
|
|
Promise.all(promises).then(fields => {
|
2020-10-22 20:24:43 +09:00
|
|
|
fields = fields.filter(field => !!field);
|
2020-10-01 03:58:45 +09:00
|
|
|
if (fields.length > 0) {
|
|
|
|
allFields[name] = fields;
|
|
|
|
}
|
|
|
|
})
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
|
|
|
return shadow(
|
|
|
|
this,
|
|
|
|
"fieldObjects",
|
|
|
|
Promise.all(allPromises).then(() => allFields)
|
|
|
|
);
|
|
|
|
}
|
2020-10-17 00:15:58 +09:00
|
|
|
|
2020-10-29 03:16:56 +09:00
|
|
|
get hasJSActions() {
|
A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 17:13:42 +09:00
|
|
|
const promise = this.pdfManager.ensureDoc("_parseHasJSActions");
|
2021-04-13 19:30:20 +09:00
|
|
|
return shadow(this, "hasJSActions", promise);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* @private
|
|
|
|
*/
|
|
|
|
async _parseHasJSActions() {
|
|
|
|
const [catalogJsActions, fieldObjects] = await Promise.all([
|
|
|
|
this.pdfManager.ensureCatalog("jsActions"),
|
A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 17:13:42 +09:00
|
|
|
this.pdfManager.ensureDoc("fieldObjects"),
|
2021-04-13 19:30:20 +09:00
|
|
|
]);
|
|
|
|
|
|
|
|
if (catalogJsActions) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
if (fieldObjects) {
|
|
|
|
return Object.values(fieldObjects).some(fieldObject =>
|
|
|
|
fieldObject.some(object => object.actions !== null)
|
|
|
|
);
|
|
|
|
}
|
|
|
|
return false;
|
2020-10-29 03:16:56 +09:00
|
|
|
}
|
|
|
|
|
2020-10-17 00:15:58 +09:00
|
|
|
get calculationOrderIds() {
|
|
|
|
const acroForm = this.catalog.acroForm;
|
|
|
|
if (!acroForm || !acroForm.has("CO")) {
|
|
|
|
return shadow(this, "calculationOrderIds", null);
|
|
|
|
}
|
|
|
|
|
|
|
|
const calculationOrder = acroForm.get("CO");
|
|
|
|
if (!Array.isArray(calculationOrder) || calculationOrder.length === 0) {
|
|
|
|
return shadow(this, "calculationOrderIds", null);
|
|
|
|
}
|
|
|
|
|
|
|
|
const ids = calculationOrder.filter(isRef).map(ref => ref.toString());
|
2020-12-11 00:02:11 +09:00
|
|
|
if (ids.length === 0) {
|
|
|
|
return shadow(this, "calculationOrderIds", null);
|
|
|
|
}
|
2020-10-17 00:15:58 +09:00
|
|
|
return shadow(this, "calculationOrderIds", ids);
|
|
|
|
}
|
2018-12-30 00:18:36 +09:00
|
|
|
}
|
2015-11-22 01:32:47 +09:00
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
export { Page, PDFDocument };
|