/* Copyright 2012 Mozilla Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import {
AnnotationEditorPrefix,
assert,
FormatError,
info,
InvalidPDFException,
isArrayEqual,
PageActionEventType,
RenderingIntentFlag,
shadow,
stringToBytes,
stringToPDFString,
stringToUTF8String,
unreachable,
Util,
warn,
} from "../shared/util.js";
import { AnnotationFactory, PopupAnnotation } from "./annotation.js";
import {
collectActions,
getInheritableProperty,
getNewAnnotationsMap,
isWhiteSpace,
MissingDataException,
PDF_VERSION_REGEXP,
validateCSSFont,
XRefEntryException,
XRefParseException,
} from "./core_utils.js";
import { Dict, isName, isRefsEqual, Name, Ref, RefSet } from "./primitives.js";
import { getXfaFontDict, getXfaFontName } from "./xfa_fonts.js";
import { BaseStream } from "./base_stream.js";
import { calculateMD5 } from "./crypto.js";
import { Catalog } from "./catalog.js";
import { clearGlobalCaches } from "./cleanup_helper.js";
import { DatasetReader } from "./dataset_reader.js";
import { Linearization } from "./parser.js";
import { NullStream } from "./stream.js";
import { ObjectLoader } from "./object_loader.js";
import { OperatorList } from "./operator_list.js";
import { PartialEvaluator } from "./evaluator.js";
import { StreamsSequenceStream } from "./decode_stream.js";
import { StructTreePage } from "./struct_tree.js";
import { writeObject } from "./writer.js";
import { XFAFactory } from "./xfa/factory.js";
import { XRef } from "./xref.js";
// The PDF default user unit is 1.0, corresponding to 1/72 inch.
const DEFAULT_USER_UNIT = 1.0;
// US Letter (8.5 x 11 inches), expressed in default user units (1/72 inch).
const LETTER_SIZE_MEDIABOX = [0, 0, 612, 792];
class Page {
constructor({
pdfManager,
xref,
pageIndex,
pageDict,
ref,
globalIdFactory,
fontCache,
builtInCMapCache,
standardFontDataCache,
globalImageCache,
systemFontCache,
nonBlendModesSet,
xfaFactory,
}) {
this.pdfManager = pdfManager;
this.pageIndex = pageIndex;
this.pageDict = pageDict;
this.xref = xref;
this.ref = ref;
this.fontCache = fontCache;
this.builtInCMapCache = builtInCMapCache;
this.standardFontDataCache = standardFontDataCache;
this.globalImageCache = globalImageCache;
this.systemFontCache = systemFontCache;
this.nonBlendModesSet = nonBlendModesSet;
|
2016-03-03 09:48:21 +09:00
|
|
|
this.evaluatorOptions = pdfManager.evaluatorOptions;
|
2013-06-05 09:57:52 +09:00
|
|
|
this.resourcesPromise = null;
|
2021-03-19 18:11:40 +09:00
|
|
|
this.xfaFactory = xfaFactory;
|
2017-01-09 00:51:30 +09:00
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
const idCounters = {
|
2017-01-09 00:51:30 +09:00
|
|
|
obj: 0,
|
|
|
|
};
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
this._localIdFactory = class extends globalIdFactory {
|
|
|
|
static createObjId() {
|
2019-04-20 19:36:49 +09:00
|
|
|
return `p${pageIndex}_${++idCounters.obj}`;
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 23:00:05 +09:00
|
|
|
}
|
2021-04-01 07:07:02 +09:00
|
|
|
|
|
|
|
static getPageObjId() {
|
2023-05-19 05:23:42 +09:00
|
|
|
return `p${ref.toString()}`;
|
2021-04-01 07:07:02 +09:00
|
|
|
}
|
2017-01-09 00:51:30 +09:00
|
|
|
};
|
2011-10-25 10:13:12 +09:00
|
|
|
}
|
|
|
|
|
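The `_localIdFactory` above produces page-scoped object ids by combining the page index with a per-page counter, so ids are deterministic as long as pages are parsed in the same order. A minimal standalone sketch of that id scheme (the factory shape here is illustrative, not the actual pdf.js class):

```javascript
// Illustrative sketch of the per-page id factory: each page keeps its own
// counter, so repeated parsing of the same document yields the same ids.
function makeLocalIdFactory(pageIndex) {
  const idCounters = { obj: 0 };
  return {
    createObjId() {
      return `p${pageIndex}_${++idCounters.obj}`;
    },
  };
}

const factory = makeLocalIdFactory(0);
console.log(factory.createObjId()); // "p0_1"
console.log(factory.createObjId()); // "p0_2"
```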
  /**
   * @private
   */
  _getInheritableProperty(key, getArray = false) {
    const value = getInheritableProperty({
      dict: this.pageDict,
      key,
      getArray,
      stopWhenFound: false,
    });
    if (!Array.isArray(value)) {
      return value;
    }
    if (value.length === 1 || !(value[0] instanceof Dict)) {
      return value[0];
    }
    return Dict.merge({ xref: this.xref, dictArray: value });
  }

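Inheritable page properties (such as /Rotate or /MediaBox) may live on an ancestor /Pages node rather than on the page dictionary itself, and with `stopWhenFound: false` every value along the ancestor chain is collected. A hedged sketch of that lookup, using plain objects in place of pdf.js's `Dict` instances:

```javascript
// Illustrative sketch of inheritable page properties: a key is looked up
// on the page and then on each ancestor /Pages node, collecting all hits.
function getInheritableProperty(dict, key) {
  const values = [];
  for (let node = dict; node; node = node.parent) {
    if (key in node.props) {
      values.push(node.props[key]);
    }
  }
  return values.length ? values : undefined;
}

const pagesRoot = { props: { Rotate: 90, MediaBox: [0, 0, 612, 792] }, parent: null };
const page = { props: { Resources: {} }, parent: pagesRoot };

console.log(getInheritableProperty(page, "Rotate")); // [90]
console.log(getInheritableProperty(page, "Missing")); // undefined
```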
  get content() {
    return this.pageDict.getArray("Contents");
  }

  get resources() {
    // For robustness: The spec states that a /Resources entry has to be
    // present, but can be empty. Some documents still omit it; in this case
    // we return an empty dictionary.
    const resources = this._getInheritableProperty("Resources");

    return shadow(
      this,
      "resources",
      resources instanceof Dict ? resources : Dict.empty
    );
  }

  _getBoundingBox(name) {
    if (this.xfaData) {
      return this.xfaData.bbox;
    }
    let box = this._getInheritableProperty(name, /* getArray = */ true);

    if (Array.isArray(box) && box.length === 4) {
      box = Util.normalizeRect(box);
      if (box[2] - box[0] > 0 && box[3] - box[1] > 0) {
        return box;
      }
      warn(`Empty, or invalid, /${name} entry.`);
    }
    return null;
  }

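The validation above first normalizes the rectangle so that the coordinate pairs are ordered, then rejects boxes with zero width or height. A standalone sketch of that check, where `normalizeRect` is written out to mirror the behavior of pdf.js's `Util.normalizeRect`:

```javascript
// Sketch of bounding-box validation: reorder [x1, y1, x2, y2] so that
// x1 <= x2 and y1 <= y2, then reject degenerate (empty) boxes.
function normalizeRect(rect) {
  const r = rect.slice();
  if (r[0] > r[2]) {
    [r[0], r[2]] = [r[2], r[0]];
  }
  if (r[1] > r[3]) {
    [r[1], r[3]] = [r[3], r[1]];
  }
  return r;
}

function getValidBox(rect) {
  if (!Array.isArray(rect) || rect.length !== 4) {
    return null;
  }
  const box = normalizeRect(rect);
  return box[2] - box[0] > 0 && box[3] - box[1] > 0 ? box : null;
}

console.log(getValidBox([612, 792, 0, 0])); // [0, 0, 612, 792]
console.log(getValidBox([0, 0, 0, 0])); // null, since the box is empty
```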
  get mediaBox() {
    // Reset invalid media box to letter size.
    return shadow(
      this,
      "mediaBox",
      this._getBoundingBox("MediaBox") || LETTER_SIZE_MEDIABOX
    );
  }

  get cropBox() {
    // Reset invalid crop box to media box.
    return shadow(
      this,
      "cropBox",
      this._getBoundingBox("CropBox") || this.mediaBox
    );
  }

  get userUnit() {
    let obj = this.pageDict.get("UserUnit");
    if (typeof obj !== "number" || obj <= 0) {
      obj = DEFAULT_USER_UNIT;
    }
    return shadow(this, "userUnit", obj);
  }

  get view() {
    // From the spec, 6th ed., p.963:
    // "The crop, bleed, trim, and art boxes should not ordinarily
    // extend beyond the boundaries of the media box. If they do, they are
    // effectively reduced to their intersection with the media box."
    const { cropBox, mediaBox } = this;

    if (cropBox !== mediaBox && !isArrayEqual(cropBox, mediaBox)) {
      const box = Util.intersect(cropBox, mediaBox);
      if (box && box[2] - box[0] > 0 && box[3] - box[1] > 0) {
        return shadow(this, "view", box);
      }
      warn("Empty /CropBox and /MediaBox intersection.");
    }
    return shadow(this, "view", mediaBox);
  }

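The `view` getter reduces the crop box to its intersection with the media box, falling back to the media box when the intersection is empty. A hedged sketch of that logic, with a plain axis-aligned intersection standing in for pdf.js's `Util.intersect`:

```javascript
// Intersect two [x1, y1, x2, y2] rectangles; null when they don't overlap.
function intersect(a, b) {
  const x0 = Math.max(a[0], b[0]),
    y0 = Math.max(a[1], b[1]),
    x2 = Math.min(a[2], b[2]),
    y2 = Math.min(a[3], b[3]);
  return x0 > x2 || y0 > y2 ? null : [x0, y0, x2, y2];
}

function computeView(cropBox, mediaBox) {
  const box = intersect(cropBox, mediaBox);
  if (box && box[2] - box[0] > 0 && box[3] - box[1] > 0) {
    return box;
  }
  // Empty or missing intersection: fall back to the media box.
  return mediaBox;
}

// A crop box hanging over the media-box edge is clipped to it:
console.log(computeView([-100, 0, 300, 500], [0, 0, 612, 792])); // [0, 0, 300, 500]
```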
  get rotate() {
    let rotate = this._getInheritableProperty("Rotate") || 0;

    // Normalize rotation so it's a multiple of 90 and between 0 and 270.
    if (rotate % 90 !== 0) {
      rotate = 0;
    } else if (rotate >= 360) {
      rotate %= 360;
    } else if (rotate < 0) {
      // The spec doesn't cover negatives. Assume it's counterclockwise
      // rotation. The following is the other implementation of modulo.
      rotate = ((rotate % 360) + 360) % 360;
    }
    return shadow(this, "rotate", rotate);
  }

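The normalization above can be exercised on its own: only multiples of 90 are honored, and the result is always one of 0, 90, 180, 270. A standalone sketch (the function name is illustrative):

```javascript
// Standalone sketch of the /Rotate normalization: invalid values fall back
// to 0, and negatives are treated as counterclockwise rotation.
function normalizeRotation(rotate) {
  if (rotate % 90 !== 0) {
    return 0;
  }
  if (rotate >= 360) {
    return rotate % 360;
  }
  if (rotate < 0) {
    // JavaScript's % keeps the sign of the dividend, hence the double modulo.
    return ((rotate % 360) + 360) % 360;
  }
  return rotate;
}

console.log(normalizeRotation(450)); // 90
console.log(normalizeRotation(-90)); // 270
console.log(normalizeRotation(45)); // 0
```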
  /**
   * @private
   */
  _onSubStreamError(reason, objId) {
    if (this.evaluatorOptions.ignoreErrors) {
      warn(`getContentStream - ignoring sub-stream (${objId}): "${reason}".`);
      return;
    }
    throw reason;
  }

  /**
   * @returns {Promise<BaseStream>}
   */
  getContentStream() {
    return this.pdfManager.ensure(this, "content").then(content => {
      if (content instanceof BaseStream) {
        return content;
      }
      if (Array.isArray(content)) {
        return new StreamsSequenceStream(
          content,
          this._onSubStreamError.bind(this)
        );
      }
      // Replace non-existent page content with empty content.
      return new NullStream();
    });
  }

2021-03-19 18:11:40 +09:00
|
|
|
get xfaData() {
|
2021-10-16 19:16:40 +09:00
|
|
|
return shadow(
|
|
|
|
this,
|
|
|
|
"xfaData",
|
|
|
|
this.xfaFactory
|
|
|
|
? { bbox: this.xfaFactory.getBoundingBox(this.pageIndex) }
|
|
|
|
: null
|
|
|
|
);
|
2021-03-19 18:11:40 +09:00
|
|
|
}
|
|
|
|
|
2023-06-16 03:43:57 +09:00
|
|
|
#replaceIdByRef(annotations, deletedAnnotations, existingAnnotations) {
|
2023-06-05 22:07:28 +09:00
|
|
|
for (const annotation of annotations) {
|
|
|
|
if (annotation.id) {
|
|
|
|
const ref = Ref.fromString(annotation.id);
|
|
|
|
if (!ref) {
|
|
|
|
warn(`A non-linked annotation cannot be modified: ${annotation.id}`);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
if (annotation.deleted) {
|
|
|
|
deletedAnnotations.put(ref);
|
|
|
|
continue;
|
|
|
|
}
|
2023-06-16 03:43:57 +09:00
|
|
|
existingAnnotations?.put(ref);
|
2023-06-05 22:07:28 +09:00
|
|
|
annotation.ref = ref;
|
|
|
|
delete annotation.id;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2023-06-23 02:48:40 +09:00
|
|
|
async saveNewAnnotations(handler, task, annotations, imagePromises) {
|
2022-06-01 22:42:46 +09:00
|
|
|
if (this.xfaFactory) {
|
|
|
|
throw new Error("XFA: Cannot save new annotations.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const partialEvaluator = new PartialEvaluator({
|
|
|
|
xref: this.xref,
|
|
|
|
handler,
|
|
|
|
pageIndex: this.pageIndex,
|
|
|
|
idFactory: this._localIdFactory,
|
|
|
|
fontCache: this.fontCache,
|
|
|
|
builtInCMapCache: this.builtInCMapCache,
|
|
|
|
standardFontDataCache: this.standardFontDataCache,
|
|
|
|
globalImageCache: this.globalImageCache,
|
2023-05-10 22:31:07 +09:00
|
|
|
systemFontCache: this.systemFontCache,
|
2022-06-01 22:42:46 +09:00
|
|
|
options: this.evaluatorOptions,
|
|
|
|
});
|
|
|
|
|
2023-06-05 22:07:28 +09:00
|
|
|
const deletedAnnotations = new RefSet();
|
2023-06-16 03:43:57 +09:00
|
|
|
const existingAnnotations = new RefSet();
|
|
|
|
this.#replaceIdByRef(annotations, deletedAnnotations, existingAnnotations);
|
2023-06-05 22:07:28 +09:00
|
|
|
|
2022-06-01 22:42:46 +09:00
|
|
|
const pageDict = this.pageDict;
|
2023-06-05 22:07:28 +09:00
|
|
|
const annotationsArray = this.annotations.filter(
|
|
|
|
a => !(a instanceof Ref && deletedAnnotations.has(a))
|
|
|
|
);
|
2022-06-01 22:42:46 +09:00
|
|
|
const newData = await AnnotationFactory.saveNewAnnotations(
|
|
|
|
partialEvaluator,
|
|
|
|
task,
|
2023-06-23 02:48:40 +09:00
|
|
|
annotations,
|
|
|
|
imagePromises
|
2022-06-01 22:42:46 +09:00
|
|
|
);
|
|
|
|
|
|
|
|
for (const { ref } of newData.annotations) {
|
2023-06-16 03:43:57 +09:00
|
|
|
// Don't add an existing annotation ref to the annotations array.
|
|
|
|
if (ref instanceof Ref && !existingAnnotations.has(ref)) {
|
|
|
|
annotationsArray.push(ref);
|
|
|
|
}
|
2022-06-01 22:42:46 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
const savedDict = pageDict.get("Annots");
|
|
|
|
pageDict.set("Annots", annotationsArray);
|
|
|
|
const buffer = [];
|
|
|
|
|
|
|
|
let transform = null;
|
|
|
|
if (this.xref.encrypt) {
|
|
|
|
transform = this.xref.encrypt.createCipherTransform(
|
|
|
|
this.ref.num,
|
|
|
|
this.ref.gen
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
2023-04-28 04:50:27 +09:00
|
|
|
await writeObject(this.ref, pageDict, buffer, transform);
|
2022-06-01 22:42:46 +09:00
|
|
|
if (savedDict) {
|
|
|
|
pageDict.set("Annots", savedDict);
|
|
|
|
}
|
|
|
|
|
|
|
|
const objects = newData.dependencies;
|
|
|
|
objects.push(
|
|
|
|
{ ref: this.ref, data: buffer.join("") },
|
|
|
|
...newData.annotations
|
|
|
|
);
|
2022-06-15 23:57:33 +09:00
|
|
|
|
2022-06-01 22:42:46 +09:00
|
|
|
return objects;
|
|
|
|
}
|
|
|
|
|
2020-08-04 02:44:04 +09:00
|
|
|
save(handler, task, annotationStorage) {
|
|
|
|
const partialEvaluator = new PartialEvaluator({
|
|
|
|
xref: this.xref,
|
|
|
|
handler,
|
|
|
|
pageIndex: this.pageIndex,
|
|
|
|
idFactory: this._localIdFactory,
|
|
|
|
fontCache: this.fontCache,
|
|
|
|
builtInCMapCache: this.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: this.standardFontDataCache,
|
2020-08-04 02:44:04 +09:00
|
|
|
globalImageCache: this.globalImageCache,
|
2023-05-10 22:31:07 +09:00
|
|
|
systemFontCache: this.systemFontCache,
|
2020-08-04 02:44:04 +09:00
|
|
|
options: this.evaluatorOptions,
|
|
|
|
});
|
|
|
|
|
|
|
|
// Fetch the page's annotations and save the content
|
|
|
|
// in case of interactive form fields.
|
|
|
|
return this._parsedAnnotations.then(function (annotations) {
|
|
|
|
const newRefsPromises = [];
|
|
|
|
for (const annotation of annotations) {
|
2021-05-04 01:03:16 +09:00
|
|
|
if (!annotation.mustBePrinted(annotationStorage)) {
|
2020-08-04 02:44:04 +09:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
newRefsPromises.push(
|
|
|
|
annotation
|
|
|
|
.save(partialEvaluator, task, annotationStorage)
|
|
|
|
.catch(function (reason) {
|
|
|
|
warn(
|
|
|
|
"save - ignoring annotation data during " +
|
|
|
|
`"${task.name}" task: "${reason}".`
|
|
|
|
);
|
|
|
|
return null;
|
|
|
|
})
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
2022-06-20 00:46:59 +09:00
|
|
|
return Promise.all(newRefsPromises).then(function (newRefs) {
|
|
|
|
return newRefs.filter(newRef => !!newRef);
|
|
|
|
});
|
2020-08-04 02:44:04 +09:00
|
|
|
});
|
|
|
|
}
|
|
|
|
|
2018-12-29 23:46:22 +09:00
|
|
|
loadResources(keys) {
|
|
|
|
if (!this.resourcesPromise) {
|
|
|
|
// TODO: add async `_getInheritableProperty` and remove this.
|
2019-12-25 23:59:37 +09:00
|
|
|
this.resourcesPromise = this.pdfManager.ensure(this, "resources");
|
2018-12-29 23:46:22 +09:00
|
|
|
}
|
|
|
|
return this.resourcesPromise.then(() => {
|
|
|
|
const objectLoader = new ObjectLoader(this.resources, keys, this.xref);
|
|
|
|
return objectLoader.load();
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
[Regression] Re-factor the *internal* `includeAnnotationStorage` handling, since it's currently subtly wrong
*This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867.*
As far as I can tell, this *subtle* bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago).
The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus *wrongly* return a cached operatorList.
In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API.
However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in *subtle* (and difficult to understand) rendering bugs.
In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead.
Given that operatorList caching in the API doesn't have to be perfect[1], but only has to avoid *false* cache-hits, we can simplify things significantly by only keeping track of the last time that the `AnnotationStorage`-data was modified.
*Please note:* While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2]
Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3]
---
[1] Note how we're already not caching operatorLists for pages with *huge* images, in order to save memory, hence there's no guarantee that operatorLists will always be cached.
[2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation.
[3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.
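The last-modified tracking described above can be sketched as follows; note that this is a minimal illustration, not the actual PDF.js API — the class shape, the counter, and `isCacheValid` are simplified stand-ins:

```javascript
// Minimal sketch: record when an AnnotationStorage instance was last
// modified, so a cached operatorList can be invalidated after *any*
// form-data change (avoiding false cache-hits).
class AnnotationStorage {
  #modified = 0;
  #values = new Map();

  setValue(key, value) {
    this.#values.set(key, value);
    this.#modified++; // a real implementation could use a timestamp
  }

  get lastModified() {
    return this.#modified.toString();
  }
}

// Only reuse a cached operatorList if the storage is unchanged since
// the cache entry was created.
function isCacheValid(cacheEntry, storage) {
  return cacheEntry.annotationStorageModified === storage.lastModified;
}
```

Note that the check only has to detect *that* something changed, not *what* changed, which is why a simple modification marker suffices instead of hashing the storage contents.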
2021-08-16 02:57:42 +09:00
|
|
|
getOperatorList({
|
|
|
|
handler,
|
|
|
|
sink,
|
|
|
|
task,
|
|
|
|
intent,
|
|
|
|
cacheKey,
|
|
|
|
annotationStorage = null,
|
|
|
|
}) {
|
2023-03-07 17:42:04 +09:00
|
|
|
const contentStreamPromise = this.getContentStream();
|
2018-12-29 23:46:22 +09:00
|
|
|
const resourcesPromise = this.loadResources([
|
2019-12-25 23:59:37 +09:00
|
|
|
"ColorSpace",
|
2021-04-21 02:51:01 +09:00
|
|
|
"ExtGState",
|
|
|
|
"Font",
|
2019-12-25 23:59:37 +09:00
|
|
|
"Pattern",
|
2021-04-21 02:51:01 +09:00
|
|
|
"Properties",
|
2019-12-25 23:59:37 +09:00
|
|
|
"Shading",
|
|
|
|
"XObject",
|
2018-12-29 23:46:22 +09:00
|
|
|
]);
|
|
|
|
|
|
|
|
const partialEvaluator = new PartialEvaluator({
|
|
|
|
xref: this.xref,
|
|
|
|
handler,
|
|
|
|
pageIndex: this.pageIndex,
|
Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open, parse/render, and then close even the *same* PDF document, the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
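The per-instance counters described above can be sketched roughly like this; the `p0_1`/`f1` id formats and the factory shape are illustrative, not necessarily the exact PDF.js output:

```javascript
// Sketch of a per-document id factory: the counters live on the
// factory instance, so re-opening the same document restarts the
// numbering and yields *consistent* ids across sessions.
function createIdFactory(pageIndex) {
  let objIdCounter = 0;
  let fontIdCounter = 0;
  return {
    createObjId: () => `p${pageIndex}_${++objIdCounter}`,
    createFontId: () => `f${++fontIdCounter}`,
  };
}
```

Contrast this with a module-level global counter, which would keep incrementing across documents and thus produce different `fontID`s for identical parse/render orders.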
2020-07-07 23:00:05 +09:00
|
|
|
idFactory: this._localIdFactory,
|
2018-12-29 23:46:22 +09:00
|
|
|
fontCache: this.fontCache,
|
|
|
|
builtInCMapCache: this.builtInCMapCache,
|
2021-06-08 20:58:52 +09:00
|
|
|
standardFontDataCache: this.standardFontDataCache,
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there are PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page, which is obviously slow and wastes both CPU and memory resources that would be better used elsewhere.[1]
Obviously we cannot simply treat all image resources as if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefits both CPU and memory usage in this case, for *very* large image data this patch *may* increase persistent main-thread memory usage a tiny bit. Thus, to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fall back to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
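The page-count promotion idea described above can be sketched as follows; the threshold values, method names, and data shapes here are simplified assumptions, not the actual `GlobalImageCache` implementation:

```javascript
// Rough sketch of document-level image caching: an image ref is only
// promoted to the global cache once it has been seen on enough
// different pages, and the cache holds a bounded number of entries.
const NUM_PAGES_THRESHOLD = 2; // hypothetical promotion threshold
const MAX_IMAGES_TO_CACHE = 10; // hypothetical cap on global entries

class GlobalImageCache {
  #refsToPages = new Map(); // ref string -> Set of page indexes
  #imageData = new Map(); // ref string -> decoded image data

  shouldCache(ref, pageIndex) {
    let pages = this.#refsToPages.get(ref);
    if (!pages) {
      pages = new Set();
      this.#refsToPages.set(ref, pages);
    }
    pages.add(pageIndex);
    if (pages.size < NUM_PAGES_THRESHOLD) {
      return false; // keep using page-specific caching for now
    }
    // Once the global cache is full, fall back to the default
    // page-specific behaviour for refs not already cached.
    return (
      this.#imageData.size < MAX_IMAGES_TO_CACHE || this.#imageData.has(ref)
    );
  }

  setData(ref, data) {
    this.#imageData.set(ref, data);
  }

  getData(ref) {
    return this.#imageData.get(ref) ?? null;
  }
}
```

The bounded size is what keeps the trade-off safe: repeated images get decoded once per document, while rare or huge one-off images never occupy persistent main-thread memory.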
2020-05-18 21:17:56 +09:00
|
|
|
globalImageCache: this.globalImageCache,
|
2023-05-10 22:31:07 +09:00
|
|
|
systemFontCache: this.systemFontCache,
|
2018-12-29 23:46:22 +09:00
|
|
|
options: this.evaluatorOptions,
|
|
|
|
});
|
|
|
|
|
2022-06-15 23:57:33 +09:00
|
|
|
const newAnnotationsByPage = !this.xfaFactory
|
|
|
|
? getNewAnnotationsMap(annotationStorage)
|
|
|
|
: null;
|
2023-06-05 22:07:28 +09:00
|
|
|
let deletedAnnotations = null;
|
2022-06-15 23:57:33 +09:00
|
|
|
|
|
|
|
let newAnnotationsPromise = Promise.resolve(null);
|
|
|
|
if (newAnnotationsByPage) {
|
2023-06-23 02:48:40 +09:00
|
|
|
let imagePromises;
|
2022-06-15 23:57:33 +09:00
|
|
|
const newAnnotations = newAnnotationsByPage.get(this.pageIndex);
|
|
|
|
if (newAnnotations) {
|
2023-06-23 02:48:40 +09:00
|
|
|
// An annotation can contain a reference to a bitmap, but that bitmap
|
|
|
|
// may be defined in another annotation. So we need to find that
|
|
|
|
// annotation and generate the bitmap.
|
|
|
|
const missingBitmaps = new Set();
|
|
|
|
for (const { bitmapId, bitmap } of newAnnotations) {
|
|
|
|
if (bitmapId && !bitmap && !missingBitmaps.has(bitmapId)) {
|
|
|
|
missingBitmaps.add(bitmapId);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
const { isOffscreenCanvasSupported } = this.evaluatorOptions;
|
|
|
|
if (missingBitmaps.size > 0) {
|
2023-07-18 22:39:18 +09:00
|
|
|
const annotationWithBitmaps = newAnnotations.slice();
|
2023-06-23 02:48:40 +09:00
|
|
|
for (const [key, annotation] of annotationStorage) {
|
|
|
|
if (!key.startsWith(AnnotationEditorPrefix)) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
if (annotation.bitmap && missingBitmaps.has(annotation.bitmapId)) {
|
|
|
|
annotationWithBitmaps.push(annotation);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
// The annotationWithBitmaps array cannot be empty: the check above
|
|
|
|
// ensures there is at least one annotation containing the bitmap.
|
|
|
|
imagePromises = AnnotationFactory.generateImages(
|
|
|
|
annotationWithBitmaps,
|
|
|
|
this.xref,
|
|
|
|
isOffscreenCanvasSupported
|
|
|
|
);
|
|
|
|
} else {
|
|
|
|
imagePromises = AnnotationFactory.generateImages(
|
|
|
|
newAnnotations,
|
|
|
|
this.xref,
|
|
|
|
isOffscreenCanvasSupported
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
2023-06-05 22:07:28 +09:00
|
|
|
deletedAnnotations = new RefSet();
|
2023-06-16 03:43:57 +09:00
|
|
|
this.#replaceIdByRef(newAnnotations, deletedAnnotations, null);
|
2022-06-15 23:57:33 +09:00
|
|
|
newAnnotationsPromise = AnnotationFactory.printNewAnnotations(
|
|
|
|
partialEvaluator,
|
|
|
|
task,
|
2023-06-23 02:48:40 +09:00
|
|
|
newAnnotations,
|
|
|
|
imagePromises
|
2022-06-15 23:57:33 +09:00
|
|
|
);
|
|
|
|
}
|
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
const dataPromises = Promise.all([contentStreamPromise, resourcesPromise]);
|
|
|
|
const pageListPromise = dataPromises.then(([contentStream]) => {
|
2020-06-11 23:05:38 +09:00
|
|
|
const opList = new OperatorList(intent, sink);
|
2018-12-29 23:46:22 +09:00
|
|
|
|
2019-12-25 23:59:37 +09:00
|
|
|
handler.send("StartRenderPage", {
|
Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`
The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires *synchronous* parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message).
In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`.
To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain *any* blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient.
This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
{ "id": "issue6961",
"file": "../web/pdfs/issue6961.pdf",
"md5": "a80e4357a8fda758d96c2c76f2980b03",
"rounds": 100,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0 | Overall | 100 | 1034 | 555 | -480 | -46.39 | faster
firefox | 0 | Page Request | 100 | 489 | 7 | -482 | -98.67 | faster
firefox | 0 | Rendering | 100 | 545 | 548 | 2 | 0.45 |
firefox | 1 | Overall | 100 | 912 | 428 | -484 | -53.06 | faster
firefox | 1 | Page Request | 100 | 487 | 1 | -486 | -99.77 | faster
firefox | 1 | Rendering | 100 | 425 | 427 | 2 | 0.51 |
```
---
[1] In the case where blend modes *are* found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.
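A minimal sketch of the negative-result caching described above; the `RefSet` shape mirrors the idea while the real `hasBlendModes` synchronously walks nested /Resources dictionaries, here reduced to a `walk` callback:

```javascript
// Sketch: remember /Resources entries that are known to contain *no*
// blend modes, so repeated hasBlendModes() calls can skip re-parsing
// entries shared between pages via the XRef table.
class RefSet {
  #set = new Set();

  put(ref) {
    this.#set.add(ref);
  }

  has(ref) {
    return this.#set.has(ref);
  }
}

function hasBlendModes(resources, nonBlendModesSet, walk) {
  if (resources.ref && nonBlendModesSet.has(resources.ref)) {
    return false; // previously checked: known to be blend-mode free
  }
  const found = walk(resources); // expensive synchronous traversal
  if (!found && resources.ref) {
    // Only the *negative* result is cached: once a blend mode is
    // found, it is much harder to know which entries are safe to skip.
    nonBlendModesSet.put(resources.ref);
  }
  return found;
}
```

Since most documents contain no blend modes at all, caching only the blend-mode-free case covers the common path while staying trivially correct.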
2020-11-05 21:35:33 +09:00
|
|
|
transparency: partialEvaluator.hasBlendModes(
|
|
|
|
this.resources,
|
|
|
|
this.nonBlendModesSet
|
|
|
|
),
|
2018-12-29 23:46:22 +09:00
|
|
|
pageIndex: this.pageIndex,
|
2021-08-16 02:57:42 +09:00
|
|
|
cacheKey,
|
2018-12-29 23:46:22 +09:00
|
|
|
});
|
|
|
|
|
2019-12-25 23:59:37 +09:00
|
|
|
return partialEvaluator
|
|
|
|
.getOperatorList({
|
|
|
|
stream: contentStream,
|
|
|
|
task,
|
|
|
|
resources: this.resources,
|
|
|
|
operatorList: opList,
|
|
|
|
})
|
2020-04-14 19:28:14 +09:00
|
|
|
.then(function () {
|
2019-12-25 23:59:37 +09:00
|
|
|
return opList;
|
|
|
|
});
|
2018-12-29 23:46:22 +09:00
|
|
|
});
|
|
|
|
|
|
|
|
// Fetch the page's annotations and add their operator lists to the
|
|
|
|
// page's operator list to render them.
|
2022-06-15 23:57:33 +09:00
|
|
|
return Promise.all([
|
|
|
|
pageListPromise,
|
|
|
|
this._parsedAnnotations,
|
|
|
|
newAnnotationsPromise,
|
|
|
|
]).then(function ([pageOpList, annotations, newAnnotations]) {
|
|
|
|
      if (newAnnotations) {
        // Some annotations may already exist on the page (if they have the
        // refToReplace property). In that case, replace the existing
        // annotation with the new one.
        annotations = annotations.filter(
          a => !(a.ref && deletedAnnotations.has(a.ref))
        );
        for (let i = 0, ii = newAnnotations.length; i < ii; i++) {
          const newAnnotation = newAnnotations[i];
          if (newAnnotation.refToReplace) {
            const j = annotations.findIndex(
              a => a.ref && isRefsEqual(a.ref, newAnnotation.refToReplace)
            );
            if (j >= 0) {
              annotations.splice(j, 1, newAnnotation);
              newAnnotations.splice(i--, 1);
              ii--;
            }
          }
        }
        annotations = annotations.concat(newAnnotations);
      }
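The replace-or-append merge above can be exercised in isolation. A minimal sketch, assuming plain annotation objects and a hypothetical `isRefsEqual` stand-in (the real helper lives in `core/primitives.js`):

```javascript
// Hypothetical stand-in for the isRefsEqual helper from core/primitives.js.
const isRefsEqual = (a, b) => a.num === b.num && a.gen === b.gen;

// Merge newAnnotations into annotations: entries carrying refToReplace
// overwrite the matching existing annotation in place, the rest are appended.
function mergeAnnotations(annotations, newAnnotations) {
  annotations = annotations.slice();
  newAnnotations = newAnnotations.slice();
  for (let i = 0, ii = newAnnotations.length; i < ii; i++) {
    const newAnnotation = newAnnotations[i];
    if (newAnnotation.refToReplace) {
      const j = annotations.findIndex(
        a => a.ref && isRefsEqual(a.ref, newAnnotation.refToReplace)
      );
      if (j >= 0) {
        annotations.splice(j, 1, newAnnotation);
        newAnnotations.splice(i--, 1);
        ii--;
      }
    }
  }
  return annotations.concat(newAnnotations);
}
```

Note the `newAnnotations.splice(i--, 1)` dance: removing the consumed entry while decrementing both the index and the cached length keeps the loop correct as the array shrinks.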
      if (
        annotations.length === 0 ||
        intent & RenderingIntentFlag.ANNOTATIONS_DISABLE
      ) {
        pageOpList.flush(/* lastChunk = */ true);
        return { length: pageOpList.totalLength };
      }
      const renderForms = !!(intent & RenderingIntentFlag.ANNOTATIONS_FORMS),
        intentAny = !!(intent & RenderingIntentFlag.ANY),
        intentDisplay = !!(intent & RenderingIntentFlag.DISPLAY),
        intentPrint = !!(intent & RenderingIntentFlag.PRINT);

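The intent bit-field decodes into independent booleans with single-bit masks. A self-contained sketch of the pattern (the flag values here are illustrative; the real `RenderingIntentFlag` constants live in `shared/util.js`):

```javascript
// Illustrative single-bit flags, mirroring the RenderingIntentFlag style.
const RenderingIntentFlag = {
  ANY: 0x01,
  DISPLAY: 0x02,
  PRINT: 0x04,
  ANNOTATIONS_FORMS: 0x10,
  ANNOTATIONS_DISABLE: 0x40,
};

// Decode a combined intent value into independent booleans; !! collapses
// the masked integer (0 or the flag value) to false/true.
function decodeIntent(intent) {
  return {
    renderForms: !!(intent & RenderingIntentFlag.ANNOTATIONS_FORMS),
    intentAny: !!(intent & RenderingIntentFlag.ANY),
    intentDisplay: !!(intent & RenderingIntentFlag.DISPLAY),
    intentPrint: !!(intent & RenderingIntentFlag.PRINT),
  };
}
```

Because each flag occupies a distinct bit, callers can OR flags together and each check stays independent of the others.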
      // Collect the operator list promises for the annotations. Each promise
      // is resolved with the complete operator list for a single annotation.
      const opListPromises = [];
      for (const annotation of annotations) {
        if (
          intentAny ||
          (intentDisplay &&
            annotation.mustBeViewed(annotationStorage, renderForms)) ||
          (intentPrint && annotation.mustBePrinted(annotationStorage))
        ) {
          opListPromises.push(
            annotation
              .getOperatorList(
                partialEvaluator,
                task,
                intent,
                renderForms,
                annotationStorage
              )
              .catch(function (reason) {
                warn(
                  "getOperatorList - ignoring annotation data during " +
                    `"${task.name}" task: "${reason}".`
                );
                return {
                  opList: null,
                  separateForm: false,
                  separateCanvas: false,
                };
              })
          );
        }
      }

      return Promise.all(opListPromises).then(function (opLists) {
        let form = false,
          canvas = false;

        for (const { opList, separateForm, separateCanvas } of opLists) {
          pageOpList.addOpList(opList);

          form ||= separateForm;
          canvas ||= separateCanvas;
        }
        pageOpList.flush(
          /* lastChunk = */ true,
          /* separateAnnots = */ { form, canvas }
        );
        return { length: pageOpList.totalLength };
      });
    });
  }

  extractTextContent({
    handler,
    task,
    includeMarkedContent,
    disableNormalization,
    sink,
  }) {
    const contentStreamPromise = this.getContentStream();
    const resourcesPromise = this.loadResources([
      "ExtGState",
      "Font",
      "Properties",
      "XObject",
    ]);

    const dataPromises = Promise.all([contentStreamPromise, resourcesPromise]);
    return dataPromises.then(([contentStream]) => {
      const partialEvaluator = new PartialEvaluator({
        xref: this.xref,
        handler,
        pageIndex: this.pageIndex,
        idFactory: this._localIdFactory,
        fontCache: this.fontCache,
        builtInCMapCache: this.builtInCMapCache,
        standardFontDataCache: this.standardFontDataCache,
        globalImageCache: this.globalImageCache,
        systemFontCache: this.systemFontCache,
        options: this.evaluatorOptions,
      });

      return partialEvaluator.getTextContent({
        stream: contentStream,
        task,
        resources: this.resources,
        includeMarkedContent,
        disableNormalization,
        sink,
        viewBox: this.view,
      });
    });
  }

  async getStructTree() {
    const structTreeRoot = await this.pdfManager.ensureCatalog(
      "structTreeRoot"
    );
    if (!structTreeRoot) {
      return null;
    }
    const structTree = await this.pdfManager.ensure(this, "_parseStructTree", [
      structTreeRoot,
    ]);
    return structTree.serializable;
  }

  /**
   * @private
   */
  _parseStructTree(structTreeRoot) {
    const tree = new StructTreePage(structTreeRoot, this.pageDict);
    tree.parse();
    return tree;
  }

  async getAnnotationsData(handler, task, intent) {
    const annotations = await this._parsedAnnotations;
    if (annotations.length === 0) {
      return [];
    }

    const annotationsData = [],
      textContentPromises = [];
    let partialEvaluator;

    const intentAny = !!(intent & RenderingIntentFlag.ANY),
      intentDisplay = !!(intent & RenderingIntentFlag.DISPLAY),
      intentPrint = !!(intent & RenderingIntentFlag.PRINT);

    for (const annotation of annotations) {
      // Collect the annotation even when it's hidden, because JavaScript
      // actions can change its visibility.
      const isVisible = intentAny || (intentDisplay && annotation.viewable);
      if (isVisible || (intentPrint && annotation.printable)) {
        annotationsData.push(annotation.data);
      }

      if (annotation.hasTextContent && isVisible) {
        partialEvaluator ||= new PartialEvaluator({
          xref: this.xref,
          handler,
          pageIndex: this.pageIndex,
          idFactory: this._localIdFactory,
          fontCache: this.fontCache,
          builtInCMapCache: this.builtInCMapCache,
          standardFontDataCache: this.standardFontDataCache,
          globalImageCache: this.globalImageCache,
          systemFontCache: this.systemFontCache,
          options: this.evaluatorOptions,
        });

        textContentPromises.push(
          annotation
            .extractTextContent(partialEvaluator, task, [
              -Infinity,
              -Infinity,
              Infinity,
              Infinity,
            ])
            .catch(function (reason) {
              warn(
                `getAnnotationsData - ignoring textContent during "${task.name}" task: "${reason}".`
              );
            })
        );
      }
    }

    await Promise.all(textContentPromises);
    return annotationsData;
  }

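The `partialEvaluator ||= new PartialEvaluator({...})` line lazily builds the evaluator only once the first annotation actually needs text extraction. The pattern in isolation, with hypothetical `Helper`/`process` names:

```javascript
// Counts how many times the shared helper is constructed.
let created = 0;

class Helper {
  constructor() {
    created++;
  }
}

// Lazily create a shared helper only when at least one item needs it;
// ||= assigns only if the left-hand side is still undefined (falsy).
function process(items) {
  let helper;
  const used = [];
  for (const item of items) {
    if (item.needsHelper) {
      helper ||= new Helper(); // Constructed at most once per call.
      used.push(item.name);
    }
  }
  return { helper, used };
}
```

If no item needs the helper, the constructor never runs at all, which is exactly why the method avoids creating a `PartialEvaluator` up front.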
  get annotations() {
    const annots = this._getInheritableProperty("Annots");
    return shadow(this, "annotations", Array.isArray(annots) ? annots : []);
  }

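The `shadow` helper memoizes a getter by replacing the prototype accessor with a plain data property on the instance, so the getter body runs at most once. A minimal sketch of the pattern (the real helper in `shared/util.js` additionally supports a non-serializable option):

```javascript
// Minimal shadow(): define an own, read-only data property that takes
// precedence over the prototype getter on all later accesses.
function shadow(obj, prop, value) {
  Object.defineProperty(obj, prop, {
    value,
    enumerable: true,
    configurable: true,
    writable: false,
  });
  return value;
}

class Counter {
  constructor() {
    this.computations = 0;
  }

  get expensive() {
    this.computations++; // Only runs on the first access.
    return shadow(this, "expensive", 42);
  }
}
```

This is why `get annotations()` above can perform the inheritable-property lookup without worrying about repeated cost: the second read never reaches the getter.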
  get _parsedAnnotations() {
    const parsedAnnotations = this.pdfManager
      .ensure(this, "annotations")
      .then(() => {
        const annotationPromises = [];
        for (const annotationRef of this.annotations) {
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
annotationPromises.push(
|
|
|
|
AnnotationFactory.create(
|
|
|
|
this.xref,
|
2020-05-09 18:58:09 +09:00
|
|
|
annotationRef,
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
this.pdfManager,
|
2021-03-31 00:50:35 +09:00
|
|
|
this._localIdFactory,
|
|
|
|
/* collectFields */ false
|
2020-05-09 18:58:09 +09:00
|
|
|
).catch(function (reason) {
|
|
|
|
warn(`_parsedAnnotations: "${reason}".`);
|
|
|
|
return null;
|
|
|
|
})
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
);
|
2015-11-22 21:56:52 +09:00
|
|
|
}
|
2018-12-29 23:46:22 +09:00
|
|
|
|
2020-05-09 18:58:09 +09:00
|
|
|
return Promise.all(annotationPromises).then(function (annotations) {
|
2022-08-06 18:36:04 +09:00
|
|
|
if (annotations.length === 0) {
|
|
|
|
return annotations;
|
|
|
|
}
|
|
|
|
|
|
|
|
const sortedAnnotations = [];
|
|
|
|
let popupAnnotations;
|
|
|
|
// Ensure that PopupAnnotations are handled last, since they depend on
|
|
|
|
// their parent Annotation in the display layer; fixes issue 11362.
|
|
|
|
for (const annotation of annotations) {
|
|
|
|
if (!annotation) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
if (annotation instanceof PopupAnnotation) {
|
2022-07-17 18:24:05 +09:00
|
|
|
(popupAnnotations ||= []).push(annotation);
|
2022-08-06 18:36:04 +09:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
sortedAnnotations.push(annotation);
|
|
|
|
}
|
|
|
|
if (popupAnnotations) {
|
|
|
|
sortedAnnotations.push(...popupAnnotations);
|
|
|
|
}
|
|
|
|
|
|
|
|
return sortedAnnotations;
|
2020-05-09 18:58:09 +09:00
|
|
|
});
|
2018-12-29 23:46:22 +09:00
|
|
|
});
|
2018-03-21 09:43:40 +09:00
|
|
|
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
return shadow(this, "_parsedAnnotations", parsedAnnotations);
|
2018-12-29 23:46:22 +09:00
|
|
|
}

  get jsActions() {
    const actions = collectActions(
      this.xref,
      this.pageDict,
      PageActionEventType
    );
    return shadow(this, "jsActions", actions);
  }
}

const PDF_HEADER_SIGNATURE = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]);
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string

During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- It's no longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands is usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
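The raw-byte approach can be illustrated with a minimal sketch. The constant and helper names here are invented for illustration; the sketch is a simplified, forwards-only version in the spirit of the `find` helper below (the real helper also supports backwards search and operates on a stream):

```javascript
// The signature constants are simply the ASCII bytes of the raw PDF commands,
// so they can be matched byte-for-byte without decoding the data to a string.
const STARTXREF_BYTES = new Uint8Array([
  0x73, 0x74, 0x61, 0x72, 0x74, 0x78, 0x72, 0x65, 0x66,
]);

// Minimal forwards byte-scan: returns the offset of `signature` in `bytes`,
// or -1 when it's not present.
function indexOfSignature(bytes, signature) {
  for (let pos = 0; pos <= bytes.length - signature.length; pos++) {
    let j = 0;
    while (j < signature.length && bytes[pos + j] === signature[j]) {
      j++;
    }
    if (j === signature.length) {
      return pos; // Every byte of the signature matched at this offset.
    }
  }
  return -1;
}

const data = new TextEncoder().encode("...trailer\nstartxref\n116\n%%EOF");
console.log(String.fromCharCode(...STARTXREF_BYTES)); // "startxref"
console.log(indexOfSignature(data, STARTXREF_BYTES)); // 11
```

No intermediate string is ever built from the scanned data, which is exactly what saves the temporary allocations mentioned above.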

const STARTXREF_SIGNATURE = new Uint8Array([
  0x73, 0x74, 0x61, 0x72, 0x74, 0x78, 0x72, 0x65, 0x66,
]);
const ENDOBJ_SIGNATURE = new Uint8Array([0x65, 0x6e, 0x64, 0x6f, 0x62, 0x6a]);

const FINGERPRINT_FIRST_BYTES = 1024;
const EMPTY_FINGERPRINT =
  "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
|
2018-12-30 00:18:36 +09:00
|
|
|
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
function find(stream, signature, limit = 1024, backwards = false) {
|
2023-03-18 20:09:25 +09:00
|
|
|
if (typeof PDFJSDev === "undefined" || PDFJSDev.test("TESTING")) {
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
assert(limit > 0, 'The "limit" must be a positive integer.');
|
|
|
|
}
|
|
|
|
const signatureLength = signature.length;
|
2018-12-30 00:18:36 +09:00
|
|
|
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
const scanBytes = stream.peekBytes(limit);
|
|
|
|
const scanLength = scanBytes.length - signatureLength;
|
2018-12-30 00:18:36 +09:00
|
|
|
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
if (scanLength <= 0) {
|
2018-12-30 00:18:36 +09:00
|
|
|
return false;
|
|
|
|
}
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
if (backwards) {
|
|
|
|
const signatureEnd = signatureLength - 1;
|
|
|
|
|
|
|
|
let pos = scanBytes.length - 1;
|
|
|
|
while (pos >= signatureEnd) {
|
|
|
|
let j = 0;
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
while (
|
|
|
|
j < signatureLength &&
|
|
|
|
scanBytes[pos - j] === signature[signatureEnd - j]
|
|
|
|
) {
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
j++;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
if (j >= signatureLength) {
|
|
|
|
// `signature` found.
|
|
|
|
stream.pos += pos - signatureEnd;
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.
The main benefits here are:
- No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
- In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-13 21:42:07 +09:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
pos--;
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
} else {
|
|
|
|
// forwards
|
Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
    let pos = 0;
    while (pos <= scanLength) {
      let j = 0;
      while (j < signatureLength && scanBytes[pos + j] === signature[j]) {
        j++;
      }
      if (j >= signatureLength) {
        // `signature` found.
        stream.pos += pos;
        return true;
      }
      pos++;
    }
  }
  return false;
}
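The forward branch of `find` above is a plain byte-by-byte signature scan. A minimal standalone sketch of the same loop, operating directly on `Uint8Array`s instead of a pdf.js stream (the helper name `findSignature` and the offset-returning contract are illustrative, not part of the module):

```javascript
// Scan `scanBytes` forwards for `signature`; return the offset of the first
// match, or -1 when the signature does not occur.
function findSignature(scanBytes, signature) {
  const scanLength = scanBytes.length - signature.length;
  let pos = 0;
  while (pos <= scanLength) {
    let j = 0;
    // Compare the candidate window byte-by-byte against the signature.
    while (j < signature.length && scanBytes[pos + j] === signature[j]) {
      j++;
    }
    if (j >= signature.length) {
      return pos; // Full signature matched at `pos`.
    }
    pos++;
  }
  return -1;
}
```

For well-formed PDF documents the signature sits near the start of the scanned window, so the outer loop typically runs only a handful of iterations.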

/**
 * The `PDFDocument` class holds all the (worker-thread) data of the PDF file.
 */
class PDFDocument {
  constructor(pdfManager, stream) {
    if (typeof PDFJSDev === "undefined" || PDFJSDev.test("TESTING")) {
      assert(
        stream instanceof BaseStream,
        'PDFDocument: Invalid "stream" argument.'
      );
    }
    if (stream.length <= 0) {
      throw new InvalidPDFException(
        "The PDF file is empty, i.e. its size is zero bytes."
      );
    }

    this.pdfManager = pdfManager;
    this.stream = stream;
    this.xref = new XRef(stream, pdfManager);
    this._pagePromises = new Map();
    this._version = null;

    const idCounters = {
      font: 0,
    };
    this._globalIdFactory = class {
      static getDocId() {
        return `g_${pdfManager.docId}`;
      }

      static createFontId() {
        return `f${++idCounters.font}`;
      }

      static createObjId() {
        unreachable("Abstract method `createObjId` called.");
      }

      static getPageObjId() {
        unreachable("Abstract method `getPageObjId` called.");
      }
    };
  }

  parse(recoveryMode) {
    this.xref.parse(recoveryMode);
    this.catalog = new Catalog(this.pdfManager, this.xref);
  }

  get linearization() {
    let linearization = null;
    try {
      linearization = Linearization.create(this.stream);
    } catch (err) {
      if (err instanceof MissingDataException) {
        throw err;
      }
      info(err);
    }
    return shadow(this, "linearization", linearization);
  }

  get startXRef() {
    const stream = this.stream;
    let startXRef = 0;

    if (this.linearization) {
      // Find the end of the first object.
      stream.reset();
      if (find(stream, ENDOBJ_SIGNATURE)) {
        startXRef = stream.pos + 6 - stream.start;
      }
    } else {
      // Find `startxref` by checking backwards from the end of the file.
      const step = 1024;
      const startXRefLength = STARTXREF_SIGNATURE.length;
      let found = false,
        pos = stream.end;

      while (!found && pos > 0) {
        pos -= step - startXRefLength;
        if (pos < 0) {
          pos = 0;
        }
        stream.pos = pos;
        found = find(stream, STARTXREF_SIGNATURE, step, true);
      }

      if (found) {
        stream.skip(9);
        let ch;
        do {
          ch = stream.getByte();
        } while (isWhiteSpace(ch));
        let str = "";
        while (ch >= /* Space = */ 0x20 && ch <= /* '9' = */ 0x39) {
          str += String.fromCharCode(ch);
          ch = stream.getByte();
        }
        startXRef = parseInt(str, 10);
        if (isNaN(startXRef)) {
          startXRef = 0;
        }
      }
    }
    return shadow(this, "startXRef", startXRef);
  }
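The non-linearized branch of `startXRef` boils down to: locate the final `startxref` keyword near the end of the file, skip whitespace, and parse the decimal byte offset that follows. A simplified standalone sketch of that parse over a complete in-memory buffer (the helper `readStartXRef` is hypothetical; the real getter scans backwards in 1024-byte steps rather than decoding the whole buffer):

```javascript
// Find the last `startxref` keyword in `bytes` and return the decimal
// offset written after it, or 0 when the keyword or number is missing.
function readStartXRef(bytes) {
  const text = String.fromCharCode(...bytes);
  const idx = text.lastIndexOf("startxref");
  if (idx === -1) {
    return 0;
  }
  // Skip the 9-byte keyword, then any whitespace, then read the digits.
  const match = /^\s*(\d+)/.exec(text.slice(idx + 9));
  return match ? parseInt(match[1], 10) : 0;
}
```

The `stream.skip(9)` call in the getter corresponds to the keyword length here, and the `isNaN` fallback to the `0` returned on a failed match.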

  // Find the header, get the PDF format version and setup the
  // stream to start from the header.
  checkHeader() {
    const stream = this.stream;
    stream.reset();

    if (!find(stream, PDF_HEADER_SIGNATURE)) {
      // May not be a PDF file, but don't throw an error and let
      // parsing continue.
      return;
    }
    stream.moveStart();

    // Skip over the "%PDF-" prefix, since it was found above.
    stream.skip(PDF_HEADER_SIGNATURE.length);
    // Read the PDF format version.
    let version = "",
      ch;
    while (
      (ch = stream.getByte()) > /* Space = */ 0x20 &&
      version.length < /* MAX_PDF_VERSION_LENGTH = */ 7
    ) {
      version += String.fromCharCode(ch);
    }

    if (PDF_VERSION_REGEXP.test(version)) {
      this._version = version;
    } else {
      warn(`Invalid PDF header version: ${version}`);
    }
  }
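`checkHeader` reads bytes after the `%PDF-` prefix until whitespace (capped at 7 characters) and validates the result against a version pattern. A standalone sketch of that extraction (the helper `parseHeaderVersion` and the inline `/^[1-9]\.\d$/` pattern are assumptions standing in for the module's `PDF_VERSION_REGEXP`):

```javascript
// Extract and validate the version from a PDF header string such as
// "%PDF-1.7\n...". Returns the version, or null when absent or malformed.
const PDF_VERSION_RE = /^[1-9]\.\d$/; // e.g. "1.4", "2.0"
function parseHeaderVersion(header) {
  if (!header.startsWith("%PDF-")) {
    return null;
  }
  let version = "";
  // Read past the 5-byte prefix until whitespace/control, at most 7 chars.
  for (let i = 5; i < header.length && version.length < 7; i++) {
    if (header.charCodeAt(i) <= 0x20) {
      break;
    }
    version += header[i];
  }
  return PDF_VERSION_RE.test(version) ? version : null;
}
```

The 7-character cap mirrors the `MAX_PDF_VERSION_LENGTH` guard in the method, which prevents a corrupt header from producing an unbounded version string.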

  parseStartXRef() {
    this.xref.setStartXRef(this.startXRef);
  }

  get numPages() {
    let num = 0;
    if (this.catalog.hasActualNumPages) {
      num = this.catalog.numPages;
    } else if (this.xfaFactory) {
      // num is a Promise.
      num = this.xfaFactory.getNumPages();
    } else if (this.linearization) {
      num = this.linearization.numPages;
    } else {
      num = this.catalog.numPages;
    }
    return shadow(this, "numPages", num);
  }

  /**
   * @private
   */
  _hasOnlyDocumentSignatures(fields, recursionDepth = 0) {
    const RECURSION_LIMIT = 10;

    if (!Array.isArray(fields)) {
      return false;
    }
    return fields.every(field => {
      field = this.xref.fetchIfRef(field);
      if (!(field instanceof Dict)) {
        return false;
      }
      if (field.has("Kids")) {
        if (++recursionDepth > RECURSION_LIMIT) {
          warn("_hasOnlyDocumentSignatures: maximum recursion depth reached");
          return false;
        }
        return this._hasOnlyDocumentSignatures(
          field.get("Kids"),
          recursionDepth
        );
      }
      const isSignature = isName(field.get("FT"), "Sig");
      const rectangle = field.get("Rect");
      const isInvisible =
        Array.isArray(rectangle) && rectangle.every(value => value === 0);
      return isSignature && isInvisible;
    });
  }
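The leaf-field test above reduces to: the field type (/FT) is `Sig` and its /Rect is an all-zero array, i.e. the signature widget has no visible area. A standalone sketch of that predicate on plain objects (the helper name and the plain-object field shape are illustrative; the real code reads entries from a `Dict` via `field.get(...)`):

```javascript
// True when a form field describes an invisible document signature:
// field type "Sig" with a degenerate (all-zero) rectangle.
function isInvisibleSignature(field) {
  const isSignature = field.FT === "Sig";
  const rect = field.Rect;
  const isInvisible = Array.isArray(rect) && rect.every(v => v === 0);
  return isSignature && isInvisible;
}
```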

  get _xfaStreams() {
    const acroForm = this.catalog.acroForm;
    if (!acroForm) {
      return null;
    }

    const xfa = acroForm.get("XFA");
    const entries = {
      "xdp:xdp": "",
      template: "",
      datasets: "",
      config: "",
      connectionSet: "",
      localeSet: "",
      stylesheet: "",
      "/xdp:xdp": "",
    };
    if (xfa instanceof BaseStream && !xfa.isEmpty) {
      entries["xdp:xdp"] = xfa;
      return entries;
    }

    if (!Array.isArray(xfa) || xfa.length === 0) {
      return null;
    }

    for (let i = 0, ii = xfa.length; i < ii; i += 2) {
      let name;
      if (i === 0) {
        name = "xdp:xdp";
      } else if (i === ii - 2) {
        name = "/xdp:xdp";
      } else {
        name = xfa[i];
      }

      if (!entries.hasOwnProperty(name)) {
        continue;
      }
      const data = this.xref.fetchIfRef(xfa[i + 1]);
      if (!(data instanceof BaseStream) || data.isEmpty) {
        continue;
      }
      entries[name] = data;
    }
    return entries;
  }
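When /XFA is an array it is a flat list of `[name, stream, name, stream, ...]` pairs, and the getter treats the first and last packets as `"xdp:xdp"` / `"/xdp:xdp"` regardless of their stored names. A standalone sketch of just that pairing rule (the helper `pairXfaEntries` is hypothetical and omits the known-name filtering and stream-validity checks the getter also performs):

```javascript
// Pair up a flat XFA array into a name -> packet map, forcing the first
// and last entries to the "xdp:xdp" / "/xdp:xdp" wrapper names.
function pairXfaEntries(xfa) {
  const out = {};
  for (let i = 0, ii = xfa.length; i < ii; i += 2) {
    const name = i === 0 ? "xdp:xdp" : i === ii - 2 ? "/xdp:xdp" : xfa[i];
    out[name] = xfa[i + 1];
  }
  return out;
}
```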

  get xfaDatasets() {
    const streams = this._xfaStreams;
    if (!streams) {
      return shadow(this, "xfaDatasets", null);
    }
    for (const key of ["datasets", "xdp:xdp"]) {
      const stream = streams[key];
      if (!stream) {
        continue;
      }
      try {
        const str = stringToUTF8String(stream.getString());
        const data = { [key]: str };
        return shadow(this, "xfaDatasets", new DatasetReader(data));
      } catch {
        warn("XFA - Invalid utf-8 string.");
        break;
      }
    }
    return shadow(this, "xfaDatasets", null);
  }

  get xfaData() {
    const streams = this._xfaStreams;
    if (!streams) {
      return null;
    }
    const data = Object.create(null);
    for (const [key, stream] of Object.entries(streams)) {
      if (!stream) {
        continue;
      }
      try {
        data[key] = stringToUTF8String(stream.getString());
      } catch {
        warn("XFA - Invalid utf-8 string.");
        return null;
      }
    }
    return data;
  }

  get xfaFactory() {
    let data;
    if (
      this.pdfManager.enableXfa &&
      this.catalog.needsRendering &&
      this.formInfo.hasXfa &&
      !this.formInfo.hasAcroForm
    ) {
      data = this.xfaData;
    }
    return shadow(this, "xfaFactory", data ? new XFAFactory(data) : null);
  }

  get isPureXfa() {
    return this.xfaFactory ? this.xfaFactory.isValid() : false;
  }

  get htmlForXfa() {
    return this.xfaFactory ? this.xfaFactory.getPages() : null;
  }

  async loadXfaImages() {
    const xfaImagesDict = await this.pdfManager.ensureCatalog("xfaImages");
    if (!xfaImagesDict) {
      return;
    }

    const keys = xfaImagesDict.getKeys();
    const objectLoader = new ObjectLoader(xfaImagesDict, keys, this.xref);
    await objectLoader.load();

    const xfaImages = new Map();
    for (const key of keys) {
      const stream = xfaImagesDict.get(key);
      if (stream instanceof BaseStream) {
        xfaImages.set(key, stream.getBytes());
      }
    }

    this.xfaFactory.setImages(xfaImages);
  }

  async loadXfaFonts(handler, task) {
    const acroForm = await this.pdfManager.ensureCatalog("acroForm");
    if (!acroForm) {
      return;
    }
    const resources = await acroForm.getAsync("DR");
    if (!(resources instanceof Dict)) {
      return;
    }
    const objectLoader = new ObjectLoader(resources, ["Font"], this.xref);
    await objectLoader.load();

    const fontRes = resources.get("Font");
    if (!(fontRes instanceof Dict)) {
      return;
    }

    const options = Object.assign(
      Object.create(null),
      this.pdfManager.evaluatorOptions
    );
    options.useSystemFonts = false;

    const partialEvaluator = new PartialEvaluator({
      xref: this.xref,
      handler,
      pageIndex: -1,
      idFactory: this._globalIdFactory,
      fontCache: this.catalog.fontCache,
      builtInCMapCache: this.catalog.builtInCMapCache,
      standardFontDataCache: this.catalog.standardFontDataCache,
      options,
    });
    const operatorList = new OperatorList();
    const pdfFonts = [];
    const initialState = {
      get font() {
        return pdfFonts.at(-1);
      },
      set font(font) {
        pdfFonts.push(font);
      },
      clone() {
        return this;
      },
    };

    const fonts = new Map();
    fontRes.forEach((fontName, font) => {
      fonts.set(fontName, font);
    });
    const promises = [];

    for (const [fontName, font] of fonts) {
      const descriptor = font.get("FontDescriptor");
      if (!(descriptor instanceof Dict)) {
        continue;
      }
      let fontFamily = descriptor.get("FontFamily");
      // For example, "Wingdings 3" is not a valid font name in the css specs.
      fontFamily = fontFamily.replaceAll(/[ ]+(\d)/g, "$1");
      const fontWeight = descriptor.get("FontWeight");

      // Angle is expressed in degrees counterclockwise in PDF
      // when it's clockwise in CSS
      // (see https://drafts.csswg.org/css-fonts-4/#valdef-font-style-oblique-angle)
      const italicAngle = -descriptor.get("ItalicAngle");
      const cssFontInfo = { fontFamily, fontWeight, italicAngle };

      if (!validateCSSFont(cssFontInfo)) {
        continue;
      }
      promises.push(
        partialEvaluator
          .handleSetFont(
            resources,
            [Name.get(fontName), 1],
            /* fontRef = */ null,
            operatorList,
            task,
            initialState,
            /* fallbackFontDict = */ null,
            /* cssFontInfo = */ cssFontInfo
          )
          .catch(function (reason) {
            warn(`loadXfaFonts: "${reason}".`);
            return null;
          })
      );
    }

    await Promise.all(promises);
    const missingFonts = this.xfaFactory.setFonts(pdfFonts);

    if (!missingFonts) {
      return;
    }

    options.ignoreErrors = true;
    promises.length = 0;
    pdfFonts.length = 0;

    const reallyMissingFonts = new Set();
    for (const missing of missingFonts) {
      if (!getXfaFontName(`${missing}-Regular`)) {
        // No substitution available: we'll fallback on Myriad.
        reallyMissingFonts.add(missing);
      }
    }

    if (reallyMissingFonts.size) {
      missingFonts.push("PdfJS-Fallback");
    }

    for (const missing of missingFonts) {
      if (reallyMissingFonts.has(missing)) {
        continue;
      }
      for (const fontInfo of [
        { name: "Regular", fontWeight: 400, italicAngle: 0 },
        { name: "Bold", fontWeight: 700, italicAngle: 0 },
        { name: "Italic", fontWeight: 400, italicAngle: 12 },
        { name: "BoldItalic", fontWeight: 700, italicAngle: 12 },
      ]) {
        const name = `${missing}-${fontInfo.name}`;
        const dict = getXfaFontDict(name);

        promises.push(
          partialEvaluator
            .handleSetFont(
              resources,
              [Name.get(name), 1],
              /* fontRef = */ null,
              operatorList,
              task,
              initialState,
              /* fallbackFontDict = */ dict,
              /* cssFontInfo = */ {
                fontFamily: missing,
                fontWeight: fontInfo.fontWeight,
                italicAngle: fontInfo.italicAngle,
              }
            )
            .catch(function (reason) {
              warn(`loadXfaFonts: "${reason}".`);
              return null;
            })
        );
      }
    }

    await Promise.all(promises);
    this.xfaFactory.appendFonts(pdfFonts, reallyMissingFonts);
  }

  async serializeXfaData(annotationStorage) {
    return this.xfaFactory
      ? this.xfaFactory.serializeData(annotationStorage)
      : null;
  }

  /**
   * The specification states in section 7.5.2 that the version from
   * the catalog, if present, should overwrite the version from the header.
   */
  get version() {
    return this.catalog.version || this._version;
  }

  get formInfo() {
    const formInfo = {
      hasFields: false,
      hasAcroForm: false,
      hasXfa: false,
      hasSignatures: false,
    };
    const acroForm = this.catalog.acroForm;
    if (!acroForm) {
      return shadow(this, "formInfo", formInfo);
    }

    try {
      const fields = acroForm.get("Fields");
      const hasFields = Array.isArray(fields) && fields.length > 0;
      formInfo.hasFields = hasFields; // Used by the `fieldObjects` getter.

      // The document contains XFA data if the `XFA` entry is a non-empty
      // array or stream.
      const xfa = acroForm.get("XFA");
      formInfo.hasXfa =
        (Array.isArray(xfa) && xfa.length > 0) ||
        (xfa instanceof BaseStream && !xfa.isEmpty);

      // The document contains AcroForm data if the `Fields` entry is a
      // non-empty array and it doesn't consist of only document signatures.
      // This second check is required for files that don't actually contain
      // AcroForm data (only XFA data), but that use the `Fields` entry to
      // store (invisible) document signatures. This can be detected using
      // the first bit of the `SigFlags` integer (see Table 219 in the
      // specification).
      const sigFlags = acroForm.get("SigFlags");
      const hasSignatures = !!(sigFlags & 0x1);
      const hasOnlyDocumentSignatures =
        hasSignatures && this._hasOnlyDocumentSignatures(fields);
      formInfo.hasAcroForm = hasFields && !hasOnlyDocumentSignatures;
      formInfo.hasSignatures = hasSignatures;
    } catch (ex) {
      if (ex instanceof MissingDataException) {
        throw ex;
      }
      warn(`Cannot fetch form information: "${ex}".`);
    }
    return shadow(this, "formInfo", formInfo);
  }

  get documentInfo() {
    const docInfo = {
      PDFFormatVersion: this.version,
      Language: this.catalog.lang,
      EncryptFilterName: this.xref.encrypt
        ? this.xref.encrypt.filterName
        : null,
      IsLinearized: !!this.linearization,
      IsAcroFormPresent: this.formInfo.hasAcroForm,
      IsXFAPresent: this.formInfo.hasXfa,
      IsCollectionPresent: !!this.catalog.collection,
      IsSignaturesPresent: this.formInfo.hasSignatures,
    };

    let infoDict;
    try {
      infoDict = this.xref.trailer.get("Info");
    } catch (err) {
      if (err instanceof MissingDataException) {
        throw err;
      }
      info("The document information dictionary is invalid.");
    }
    if (!(infoDict instanceof Dict)) {
      return shadow(this, "documentInfo", docInfo);
    }

    for (const key of infoDict.getKeys()) {
      const value = infoDict.get(key);

      switch (key) {
        case "Title":
        case "Author":
        case "Subject":
        case "Keywords":
        case "Creator":
        case "Producer":
        case "CreationDate":
        case "ModDate":
          if (typeof value === "string") {
            docInfo[key] = stringToPDFString(value);
            continue;
          }
          break;
        case "Trapped":
          if (value instanceof Name) {
            docInfo[key] = value;
            continue;
          }
          break;
        default:
          // For custom values, only accept white-listed types to prevent
          // errors that would occur when trying to send non-serializable
          // objects to the main-thread (for example `Dict` or `Stream`).
          let customValue;
          switch (typeof value) {
            case "string":
              customValue = stringToPDFString(value);
              break;
            case "number":
            case "boolean":
              customValue = value;
              break;
            default:
              if (value instanceof Name) {
                customValue = value;
              }
              break;
          }

          if (customValue === undefined) {
            warn(`Bad value, for custom key "${key}", in Info: ${value}.`);
            continue;
          }
          if (!docInfo.Custom) {
            docInfo.Custom = Object.create(null);
          }
          docInfo.Custom[key] = customValue;
          continue;
      }
      warn(`Bad value, for key "${key}", in Info: ${value}.`);
    }
    return shadow(this, "documentInfo", docInfo);
  }

  get fingerprints() {
    function validate(data) {
      return (
        typeof data === "string" &&
        data.length > 0 &&
        data !== EMPTY_FINGERPRINT
      );
    }

    function hexString(hash) {
      const buf = [];
      for (const num of hash) {
        const hex = num.toString(16);
        buf.push(hex.padStart(2, "0"));
      }
      return buf.join("");
    }

    const idArray = this.xref.trailer.get("ID");
    let hashOriginal, hashModified;
    if (Array.isArray(idArray) && validate(idArray[0])) {
      hashOriginal = stringToBytes(idArray[0]);

      if (idArray[1] !== idArray[0] && validate(idArray[1])) {
        hashModified = stringToBytes(idArray[1]);
      }
    } else {
      hashOriginal = calculateMD5(
        this.stream.getByteRange(0, FINGERPRINT_FIRST_BYTES),
        0,
        FINGERPRINT_FIRST_BYTES
      );
    }

    return shadow(this, "fingerprints", [
      hexString(hashOriginal),
      hashModified ? hexString(hashModified) : null,
    ]);
  }
|
|
|
|
|
  async _getLinearizationPage(pageIndex) {
    const { catalog, linearization, xref } = this;
    if (typeof PDFJSDev === "undefined" || PDFJSDev.test("TESTING")) {
      assert(
        linearization?.pageFirst === pageIndex,
        "_getLinearizationPage - invalid pageIndex argument."
      );
    }

    const ref = Ref.get(linearization.objectNumberFirst, 0);
    try {
      const obj = await xref.fetchAsync(ref);
      // Ensure that the object that was found is actually a Page dictionary.
      if (obj instanceof Dict) {
        let type = obj.getRaw("Type");
        if (type instanceof Ref) {
          type = await xref.fetchAsync(type);
        }
        if (isName(type, "Page") || (!obj.has("Type") && !obj.has("Kids"))) {
          if (!catalog.pageKidsCountCache.has(ref)) {
            catalog.pageKidsCountCache.put(ref, 1); // Cache the Page reference.
          }
          // Help improve performance of the `Catalog.getPageIndex` method.
          if (!catalog.pageIndexCache.has(ref)) {
            catalog.pageIndexCache.put(ref, 0);
          }

          return [obj, ref];
        }
      }
      throw new FormatError(
        "The Linearization dictionary doesn't point to a valid Page dictionary."
      );
    } catch (reason) {
      warn(`_getLinearizationPage: "${reason.message}".`);
      return catalog.getPageDict(pageIndex);
    }
  }

  getPage(pageIndex) {
    const cachedPromise = this._pagePromises.get(pageIndex);
    if (cachedPromise) {
      return cachedPromise;
    }
    const { catalog, linearization, xfaFactory } = this;

    let promise;
    if (xfaFactory) {
      promise = Promise.resolve([Dict.empty, null]);
    } else if (linearization?.pageFirst === pageIndex) {
      promise = this._getLinearizationPage(pageIndex);
    } else {
      promise = catalog.getPageDict(pageIndex);
    }
    promise = promise.then(([pageDict, ref]) => {
      return new Page({
        pdfManager: this.pdfManager,
        xref: this.xref,
        pageIndex,
        pageDict,
        ref,
        globalIdFactory: this._globalIdFactory,
        fontCache: catalog.fontCache,
        builtInCMapCache: catalog.builtInCMapCache,
        standardFontDataCache: catalog.standardFontDataCache,
        globalImageCache: catalog.globalImageCache,
        systemFontCache: catalog.systemFontCache,
        nonBlendModesSet: catalog.nonBlendModesSet,
        xfaFactory,
      });
    });

    this._pagePromises.set(pageIndex, promise);
    return promise;
  }

  async checkFirstPage(recoveryMode = false) {
    if (recoveryMode) {
      return;
    }
    try {
      await this.getPage(0);
    } catch (reason) {
      if (reason instanceof XRefEntryException) {
        // Clear out the various caches to ensure that we haven't stored any
        // inconsistent and/or incorrect state, since that could easily break
        // subsequent `this.getPage` calls.
        this._pagePromises.delete(0);
        await this.cleanup();

        throw new XRefParseException();
      }
    }
  }
|
|
|
|
|
|
|
|
async checkLastPage(recoveryMode = false) {
|
2021-12-02 09:40:52 +09:00
|
|
|
const { catalog, pdfManager } = this;
|
|
|
|
|
|
|
|
catalog.setActualNumPages(); // Ensure that it's always reset.
|
[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*
Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus which causes general parsing to breaking down elsewhere in the worker-thread (and hanging the browser).
Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially counting the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
Unfortunately these changes will have a number of *somewhat* negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug.
- This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
- For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents,
- This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.
As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).
Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
2021-11-26 02:34:11 +09:00
|
|
|
let numPages;
|
|
|
|
|
|
|
|
try {
|
|
|
|
await Promise.all([
|
2021-12-02 09:40:52 +09:00
|
|
|
pdfManager.ensureDoc("xfaFactory"),
|
|
|
|
pdfManager.ensureDoc("linearization"),
|
|
|
|
pdfManager.ensureCatalog("numPages"),
|
[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*
Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus which causes general parsing to breaking down elsewhere in the worker-thread (and hanging the browser).
Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially counting the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
Unfortunately these changes will have a number of *somewhat* negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug.
- This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
- For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents,
- This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.
As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).
Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
2021-11-26 02:34:11 +09:00
|
|
|
]);
      if (this.xfaFactory) {
        return; // The Page count is always calculated for XFA-documents.
      } else if (this.linearization) {
        numPages = this.linearization.numPages;
      } else {
        numPages = catalog.numPages;
      }

      if (!Number.isInteger(numPages)) {
        throw new FormatError("Page count is not an integer.");
      } else if (numPages <= 1) {
        return;
      }
      await this.getPage(numPages - 1);
    } catch (reason) {
      // Clear out the various caches to ensure that we haven't stored any
      // inconsistent and/or incorrect state, since that could easily break
      // subsequent `this.getPage` calls.
      this._pagePromises.delete(numPages - 1);
      await this.cleanup();

      if (reason instanceof XRefEntryException && !recoveryMode) {
        throw new XRefParseException();
      }
      warn(`checkLastPage - invalid /Pages tree /Count: ${numPages}.`);

      let pagesTree;
      try {
        pagesTree = await catalog.getAllPageDicts(recoveryMode);
      } catch (reasonAll) {
        if (reasonAll instanceof XRefEntryException && !recoveryMode) {
          throw new XRefParseException();
        }
        catalog.setActualNumPages(1);
        return;
      }

      for (const [pageIndex, [pageDict, ref]] of pagesTree) {
        let promise;
        if (pageDict instanceof Error) {
          promise = Promise.reject(pageDict);

          // Prevent "uncaught exception: Object"-messages in the console.
          promise.catch(() => {});
        } else {
          promise = Promise.resolve(
            new Page({
              pdfManager,
              xref: this.xref,
              pageIndex,
              pageDict,
              ref,
              globalIdFactory: this._globalIdFactory,
              fontCache: catalog.fontCache,
              builtInCMapCache: catalog.builtInCMapCache,
              standardFontDataCache: catalog.standardFontDataCache,
              globalImageCache: catalog.globalImageCache,
              systemFontCache: catalog.systemFontCache,
              nonBlendModesSet: catalog.nonBlendModesSet,
              xfaFactory: null,
            })
          );
        }

        this._pagePromises.set(pageIndex, promise);
      }
      catalog.setActualNumPages(pagesTree.size);
    }
  }
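The recovery strategy above can be restated as a stand-alone sketch: trust the declared /Count only if the page it points at is actually reachable, otherwise determine the count by walking the tree. Everything here (`resolvePageCount`, the plain-object tree shape) is a hypothetical stand-in for the real `Catalog`/`Page` machinery, not code from this module.

```javascript
// A leaf is { page: n }; an intermediate node is { kids: [...] }.
function resolvePageCount(tree, declaredCount) {
  const countLeaves = node =>
    node.kids ? node.kids.reduce((n, kid) => n + countLeaves(kid), 0) : 1;

  // The real code only fetches the single page at index declaredCount - 1;
  // this sketch simulates that reachability check with a full count.
  const actual = countLeaves(tree);
  if (
    Number.isInteger(declaredCount) &&
    declaredCount >= 1 &&
    declaredCount <= actual
  ) {
    return declaredCount; // The declared /Count checks out; trust it.
  }
  return actual; // Bogus /Count: fall back to the full traversal.
}

const tree = { kids: [{ page: 1 }, { kids: [{ page: 2 }, { page: 3 }] }] };
console.log(resolvePageCount(tree, 3)); // 3 -- declared count is valid
console.log(resolvePageCount(tree, 7)); // 3 -- bogus /Count, recounted
```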
fontFallback(id, handler) {
    return this.catalog.fontFallback(id, handler);
  }

  async cleanup(manuallyTriggered = false) {
    return this.catalog
      ? this.catalog.cleanup(manuallyTriggered)
      : clearGlobalCaches();
  }

  /**
   * @private
   */
_collectFieldObjects(name, fieldRef, promises) {
    const field = this.xref.fetchIfRef(fieldRef);
    if (field.has("T")) {
      const partName = stringToPDFString(field.get("T"));
      name = name === "" ? partName : `${name}.${partName}`;
    }

    if (!field.has("Kids") && field.has("T") && /\[\d+\]$/.test(name)) {
      // We've a terminal node: strip the index.
      name = name.substring(0, name.lastIndexOf("["));
    }

    if (!promises.has(name)) {
      promises.set(name, []);
    }
    promises.get(name).push(
      AnnotationFactory.create(
        this.xref,
        fieldRef,
        this.pdfManager,
        this._localIdFactory,
        /* collectFields */ true
      )
        .then(annotation => annotation?.getFieldObject())
        .catch(function (reason) {
          warn(`_collectFieldObjects: "${reason}".`);
          return null;
        })
    );

    if (field.has("Kids")) {
      const kids = field.get("Kids");
      for (const kid of kids) {
        this._collectFieldObjects(name, kid, promises);
      }
    }
  }
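The fully-qualified-name logic in `_collectFieldObjects` can be isolated into a small sketch: each node's /T partial name is joined with a ".", and a trailing "[n]" index on a terminal node is stripped so that, e.g., radio-button kids share one name. The `collectNames` helper and the plain-object field shape below are hypothetical stand-ins, not the real PDF dictionaries.

```javascript
function collectNames(node, prefix, out) {
  let name = prefix;
  if (node.T) {
    name = prefix === "" ? node.T : `${prefix}.${node.T}`;
  }
  if (!node.Kids && node.T && /\[\d+\]$/.test(name)) {
    // Terminal node: strip the "[n]" index.
    name = name.substring(0, name.lastIndexOf("["));
  }
  out.push(name);
  for (const kid of node.Kids ?? []) {
    collectNames(kid, name, out);
  }
  return out;
}

const form = {
  T: "address",
  Kids: [{ T: "street[0]" }, { T: "street[1]" }, { T: "city" }],
};
collectNames(form, "", []);
// → ["address", "address.street", "address.street", "address.city"]
```

Note how both `street[0]` and `street[1]` collapse to the same `address.street` key, which is why the real code accumulates an *array* of promises per name.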
get fieldObjects() {
    if (!this.formInfo.hasFields) {
      return shadow(this, "fieldObjects", Promise.resolve(null));
    }

    const allFields = Object.create(null);
    const fieldPromises = new Map();
    for (const fieldRef of this.catalog.acroForm.get("Fields")) {
      this._collectFieldObjects("", fieldRef, fieldPromises);
    }

    const allPromises = [];
    for (const [name, promises] of fieldPromises) {
      allPromises.push(
        Promise.all(promises).then(fields => {
          fields = fields.filter(field => !!field);
          if (fields.length > 0) {
            allFields[name] = fields;
          }
        })
      );
    }

    return shadow(
      this,
      "fieldObjects",
      Promise.all(allPromises).then(() => allFields)
    );
  }
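The gather step in the `fieldObjects` getter follows a common pattern: a Map of name → promise[] is flattened into one plain object, dropping entries whose promises all resolved to null (e.g. annotations that failed to parse). A minimal stand-alone sketch, with `gatherFields` as a hypothetical name:

```javascript
async function gatherFields(fieldPromises) {
  const allFields = Object.create(null);
  await Promise.all(
    [...fieldPromises].map(async ([name, promises]) => {
      // A rejected or null-resolving promise simply drops out of the result.
      const fields = (await Promise.all(promises)).filter(f => !!f);
      if (fields.length > 0) {
        allFields[name] = fields;
      }
    })
  );
  return allFields;
}

// Usage with hypothetical resolved values:
gatherFields(
  new Map([
    ["name", [Promise.resolve({ value: "Jane" })]],
    ["broken", [Promise.resolve(null)]],
  ])
).then(fields => console.log(Object.keys(fields))); // [ 'name' ]
```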
get hasJSActions() {
    const promise = this.pdfManager.ensureDoc("_parseHasJSActions");
    return shadow(this, "hasJSActions", promise);
  }

  /**
   * @private
   */
  async _parseHasJSActions() {
    const [catalogJsActions, fieldObjects] = await Promise.all([
      this.pdfManager.ensureCatalog("jsActions"),
      this.pdfManager.ensureDoc("fieldObjects"),
    ]);

    if (catalogJsActions) {
      return true;
    }
    if (fieldObjects) {
      return Object.values(fieldObjects).some(fieldObject =>
        fieldObject.some(object => object.actions !== null)
      );
    }
    return false;
  }
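The decision in `_parseHasJSActions` reduces to a pure function over the two inputs: catalog-level actions short-circuit to true; otherwise at least one field object must carry non-null actions. A stand-alone restatement (the function name and data shapes are illustrative):

```javascript
function hasJSActions(catalogJsActions, fieldObjects) {
  if (catalogJsActions) {
    return true; // Document-level JavaScript actions exist.
  }
  if (fieldObjects) {
    // fieldObjects maps name -> array of field objects, as built above.
    return Object.values(fieldObjects).some(fieldObject =>
      fieldObject.some(object => object.actions !== null)
    );
  }
  return false;
}

hasJSActions(null, { btn: [{ actions: { MouseDown: [] } }] }); // true
hasJSActions(null, { btn: [{ actions: null }] }); // false
```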
get calculationOrderIds() {
    const acroForm = this.catalog.acroForm;
    if (!acroForm?.has("CO")) {
      return shadow(this, "calculationOrderIds", null);
    }

    const calculationOrder = acroForm.get("CO");
    if (!Array.isArray(calculationOrder) || calculationOrder.length === 0) {
      return shadow(this, "calculationOrderIds", null);
    }

    const ids = [];
    for (const id of calculationOrder) {
      if (id instanceof Ref) {
        ids.push(id.toString());
      }
    }
    if (ids.length === 0) {
      return shadow(this, "calculationOrderIds", null);
    }
    return shadow(this, "calculationOrderIds", ids);
  }
}
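The /CO filtering in `calculationOrderIds` keeps only indirect references and normalizes empty results to null. A stand-alone sketch, with a minimal stand-in `Ref` class (the real one lives in the primitives module and may serialize differently):

```javascript
class Ref {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }
  toString() {
    // Stand-in serialization: "10R" for generation 0, "10R2" otherwise.
    return this.gen === 0 ? `${this.num}R` : `${this.num}R${this.gen}`;
  }
}

function filterCalculationOrder(calculationOrder) {
  if (!Array.isArray(calculationOrder) || calculationOrder.length === 0) {
    return null;
  }
  const ids = [];
  for (const id of calculationOrder) {
    if (id instanceof Ref) {
      ids.push(id.toString());
    }
  }
  return ids.length === 0 ? null : ids;
}

filterCalculationOrder([new Ref(10, 0), "bogus", new Ref(12, 0)]);
// → ["10R", "12R"]
```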
export { Page, PDFDocument };