2019-02-24 00:14:31 +09:00
|
|
|
/* Copyright 2019 Mozilla Foundation
|
|
|
|
*
|
|
|
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
|
* you may not use this file except in compliance with the License.
|
|
|
|
* You may obtain a copy of the License at
|
|
|
|
*
|
|
|
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
*
|
|
|
|
* Unless required by applicable law or agreed to in writing, software
|
|
|
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
|
* See the License for the specific language governing permissions and
|
|
|
|
* limitations under the License.
|
|
|
|
*/
|
|
|
|
|
2020-12-08 03:22:14 +09:00
|
|
|
import {
|
|
|
|
assert,
|
|
|
|
BaseException,
|
[api-minor] Replace `PDFDocumentProxy.getStats` with a synchronous `PDFDocumentProxy.stats` getter
*Please note:* These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents.
The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for *every rendered* page.
This patch proposes replacing that method with a *synchronous* `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and *only send* them to the main-thread *the first time* that a type is encountered.
Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1]
This re-factoring will obviously benefit longer documents the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), even a one-page document would start to benefit from these changes.
Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return *the same identical* object.
This is something that we can easily take advantage of in the default viewer, by now *only* reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents).
---
[1] Furthermore, the maximum number of `StreamType`/`FontType` values is `10` and `12` respectively, which means that regardless of the complexity and page count of a PDF document there'll never be more than twenty-two "DocStats" messages sent; see https://github.com/mozilla/pdf.js/blob/41ac3f0c07128bf34baccdcc067a108c712fd6ef/src/shared/util.js#L206-L232
[2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only results in a total of seven "DocStats" messages being sent from the worker-thread.
[3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549
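The "only send the first time a type is encountered" behaviour described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself: `RecordingHandler` is a hypothetical stand-in for the worker's message handler, the type values are plain strings rather than the real `StreamType`/`FontType` enums, and the payload uses arrays where the real code builds null-prototype maps.

```javascript
// Hypothetical stand-in for the worker message handler; it only records
// what would have been posted to the main thread.
class RecordingHandler {
  constructor() {
    this.sent = [];
  }

  send(name, data) {
    this.sent.push({ name, data });
  }
}

// Simplified tracker: a message is sent only when a type is seen for the
// first time, so repeated types on later pages cost nothing.
class DocStatsSketch {
  constructor(handler) {
    this._handler = handler;
    this._streamTypes = new Set();
    this._fontTypes = new Set();
  }

  _add(set, type) {
    if (set.has(type)) {
      return; // Already reported, nothing new to send.
    }
    set.add(type);
    this._handler.send("DocStats", {
      streamTypes: [...this._streamTypes],
      fontTypes: [...this._fontTypes],
    });
  }

  addStreamType(type) {
    this._add(this._streamTypes, type);
  }

  addFontType(type) {
    this._add(this._fontTypes, type);
  }
}

const handler = new RecordingHandler();
const docStats = new DocStatsSketch(handler);
// Simulate 100 pages that all use the same stream type and font type.
for (let page = 0; page < 100; page++) {
  docStats.addStreamType("FLATE");
  docStats.addFontType("TRUETYPE");
}
console.log(handler.sent.length); // 2
```

In this sketch the message count depends on the number of distinct types rather than on the number of pages, which is the property the commit message relies on.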
2021-11-12 02:14:26 +09:00
|
|
|
FontType,
|
2020-12-08 03:22:14 +09:00
|
|
|
objectSize,
|
2021-11-12 02:14:26 +09:00
|
|
|
StreamType,
|
2020-12-08 03:22:14 +09:00
|
|
|
stringToPDFString,
|
2021-03-26 17:28:18 +09:00
|
|
|
warn,
|
2020-12-08 03:22:14 +09:00
|
|
|
} from "../shared/util.js";
|
2022-02-18 20:11:45 +09:00
|
|
|
import { Dict, isName, Ref, RefSet } from "./primitives.js";
|
2022-02-16 21:43:42 +09:00
|
|
|
import { BaseStream } from "./base_stream.js";
|
2019-02-24 00:14:31 +09:00
|
|
|
|
|
|
|
function getLookupTableFactory(initializer) {
|
|
|
|
let lookup;
|
2020-04-14 19:28:14 +09:00
|
|
|
return function () {
|
2019-02-24 00:14:31 +09:00
|
|
|
if (initializer) {
|
|
|
|
lookup = Object.create(null);
|
|
|
|
initializer(lookup);
|
|
|
|
initializer = null;
|
|
|
|
}
|
|
|
|
return lookup;
|
|
|
|
};
|
|
|
|
}
|
|
|
|
|
2020-10-25 21:06:27 +09:00
|
|
|
function getArrayLookupTableFactory(initializer) {
|
|
|
|
let lookup;
|
|
|
|
return function () {
|
|
|
|
if (initializer) {
|
|
|
|
let arr = initializer();
|
|
|
|
initializer = null;
|
|
|
|
lookup = Object.create(null);
|
|
|
|
for (let i = 0, ii = arr.length; i < ii; i += 2) {
|
|
|
|
lookup[arr[i]] = arr[i + 1];
|
|
|
|
}
|
|
|
|
arr = null;
|
|
|
|
}
|
|
|
|
return lookup;
|
|
|
|
};
|
|
|
|
}
|
|
|
|
|
2019-09-29 08:18:48 +09:00
|
|
|
class MissingDataException extends BaseException {
|
|
|
|
constructor(begin, end) {
|
2021-08-09 19:02:49 +09:00
|
|
|
super(`Missing data [${begin}, ${end})`, "MissingDataException");
|
2019-02-24 00:14:31 +09:00
|
|
|
this.begin = begin;
|
|
|
|
this.end = end;
|
|
|
|
}
|
2019-09-29 08:18:48 +09:00
|
|
|
}
|
2019-02-24 00:14:31 +09:00
|
|
|
|
2021-08-09 19:02:49 +09:00
|
|
|
class ParserEOFException extends BaseException {
|
|
|
|
constructor(msg) {
|
|
|
|
super(msg, "ParserEOFException");
|
|
|
|
}
|
|
|
|
}
|
2021-07-24 00:37:55 +09:00
|
|
|
|
2021-08-09 19:02:49 +09:00
|
|
|
class XRefEntryException extends BaseException {
|
|
|
|
constructor(msg) {
|
|
|
|
super(msg, "XRefEntryException");
|
|
|
|
}
|
|
|
|
}
|
2019-02-24 00:14:31 +09:00
|
|
|
|
2021-08-09 19:02:49 +09:00
|
|
|
class XRefParseException extends BaseException {
|
|
|
|
constructor(msg) {
|
|
|
|
super(msg, "XRefParseException");
|
|
|
|
}
|
|
|
|
}
|
2019-02-24 00:14:31 +09:00
|
|
|
|
2021-11-12 02:14:26 +09:00
|
|
|
class DocStats {
|
|
|
|
constructor(handler) {
|
|
|
|
this._handler = handler;
|
|
|
|
|
|
|
|
this._streamTypes = new Set();
|
|
|
|
this._fontTypes = new Set();
|
|
|
|
}
|
|
|
|
|
|
|
|
_send() {
|
|
|
|
const streamTypes = Object.create(null),
|
|
|
|
fontTypes = Object.create(null);
|
|
|
|
for (const type of this._streamTypes) {
|
|
|
|
streamTypes[type] = true;
|
|
|
|
}
|
|
|
|
for (const type of this._fontTypes) {
|
|
|
|
fontTypes[type] = true;
|
|
|
|
}
|
|
|
|
this._handler.send("DocStats", { streamTypes, fontTypes });
|
|
|
|
}
|
|
|
|
|
|
|
|
addStreamType(type) {
|
|
|
|
if (
|
|
|
|
typeof PDFJSDev === "undefined" ||
|
|
|
|
PDFJSDev.test("!PRODUCTION || TESTING")
|
|
|
|
) {
|
|
|
|
assert(StreamType[type] === type, 'addStreamType: Invalid "type" value.');
|
|
|
|
}
|
|
|
|
if (this._streamTypes.has(type)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
this._streamTypes.add(type);
|
|
|
|
this._send();
|
|
|
|
}
|
|
|
|
|
|
|
|
addFontType(type) {
|
|
|
|
if (
|
|
|
|
typeof PDFJSDev === "undefined" ||
|
|
|
|
PDFJSDev.test("!PRODUCTION || TESTING")
|
|
|
|
) {
|
|
|
|
assert(FontType[type] === type, 'addFontType: Invalid "type" value.');
|
|
|
|
}
|
|
|
|
if (this._fontTypes.has(type)) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
this._fontTypes.add(type);
|
|
|
|
this._send();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-02-24 00:14:31 +09:00
|
|
|
/**
|
|
|
|
* Get the value of an inheritable property.
|
|
|
|
*
|
|
|
|
* If the PDF specification explicitly lists a property in a dictionary as
|
|
|
|
* inheritable, then the value of the property may be present in the dictionary
|
|
|
|
* itself or in one or more parents of the dictionary.
|
|
|
|
*
|
|
|
|
* If the key is not found in the tree, `undefined` is returned. Otherwise,
|
|
|
|
* the value for the key is returned or, if `stopWhenFound` is `false`, a list
|
2021-02-06 20:23:35 +09:00
|
|
|
* of values is returned.
|
2019-02-24 00:14:31 +09:00
|
|
|
*
|
|
|
|
* @param {Dict} dict - Dictionary from where to start the traversal.
|
|
|
|
* @param {string} key - The key of the property to find the value for.
|
|
|
|
* @param {boolean} getArray - Whether or not the value should be fetched as an
|
|
|
|
* array. The default value is `false`.
|
|
|
|
* @param {boolean} stopWhenFound - Whether or not to stop the traversal when
|
|
|
|
* the key is found. If set to `false`, we always walk up the entire parent
|
|
|
|
* chain, for example to be able to find `/Resources` placed on multiple
|
|
|
|
* levels of the tree. The default value is `true`.
|
|
|
|
*/
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
function getInheritableProperty({
|
|
|
|
dict,
|
|
|
|
key,
|
|
|
|
getArray = false,
|
|
|
|
stopWhenFound = true,
|
|
|
|
}) {
|
2019-02-24 00:14:31 +09:00
|
|
|
let values;
|
2021-02-06 20:23:35 +09:00
|
|
|
const visited = new RefSet();
|
2019-02-24 00:14:31 +09:00
|
|
|
|
2021-02-06 20:23:35 +09:00
|
|
|
while (dict instanceof Dict && !(dict.objId && visited.has(dict.objId))) {
|
|
|
|
if (dict.objId) {
|
|
|
|
visited.put(dict.objId);
|
|
|
|
}
|
2019-02-24 00:14:31 +09:00
|
|
|
const value = getArray ? dict.getArray(key) : dict.get(key);
|
|
|
|
if (value !== undefined) {
|
|
|
|
if (stopWhenFound) {
|
|
|
|
return value;
|
|
|
|
}
|
|
|
|
if (!values) {
|
|
|
|
values = [];
|
|
|
|
}
|
|
|
|
values.push(value);
|
|
|
|
}
|
2019-12-25 23:59:37 +09:00
|
|
|
dict = dict.get("Parent");
|
2019-02-24 00:14:31 +09:00
|
|
|
}
|
|
|
|
return values;
|
|
|
|
}
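The traversal documented above can be illustrated with a minimal sketch. It is a simplified stand-in and not the function itself: nodes are plain objects with hypothetical `entries` and `parent` fields, whereas the real code walks `Dict` instances, follows the `Parent` key, and tracks visited `objId`s in a `RefSet`.

```javascript
// Simplified stand-in for the parent-chain walk; `node`, `entries` and
// `parent` are hypothetical names used only for this illustration.
function getInheritableSketch({ node, key, stopWhenFound = true }) {
  let values;
  const visited = new Set();

  while (node && !visited.has(node)) {
    visited.add(node); // Guards against cyclic parent chains.
    const value = node.entries[key];
    if (value !== undefined) {
      if (stopWhenFound) {
        return value; // Nearest definition wins.
      }
      if (!values) {
        values = [];
      }
      values.push(value); // Collected from the bottom of the tree upwards.
    }
    node = node.parent;
  }
  return values;
}

const root = { entries: { Rotate: 90, Resources: "root-res" }, parent: null };
const pages = { entries: { Resources: "pages-res" }, parent: root };
const page = { entries: {}, parent: pages };

const rotate = getInheritableSketch({ node: page, key: "Rotate" });
const resources = getInheritableSketch({
  node: page,
  key: "Resources",
  stopWhenFound: false,
});
const missing = getInheritableSketch({ node: page, key: "MediaBox" });

console.log(rotate); // 90
console.log(resources); // [ 'pages-res', 'root-res' ]
console.log(missing); // undefined
```

With `stopWhenFound: false` the values are ordered from the nearest ancestor to the most distant one, which is why `collectActions` iterates its result in reverse.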
|
|
|
|
|
2019-12-25 23:54:34 +09:00
|
|
|
// prettier-ignore
|
2019-02-24 00:14:31 +09:00
|
|
|
const ROMAN_NUMBER_MAP = [
|
2019-12-25 23:59:37 +09:00
|
|
|
"", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM",
|
|
|
|
"", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC",
|
|
|
|
"", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"
|
2019-02-24 00:14:31 +09:00
|
|
|
];
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Converts positive integers to (upper case) Roman numerals.
|
2019-10-12 22:54:17 +09:00
|
|
|
* @param {number} number - The number that should be converted.
|
2019-02-24 00:14:31 +09:00
|
|
|
* @param {boolean} lowerCase - Indicates if the result should be converted
|
|
|
|
* to lower case letters. The default value is `false`.
|
2019-10-13 01:14:29 +09:00
|
|
|
* @returns {string} The resulting Roman numeral.
|
2019-02-24 00:14:31 +09:00
|
|
|
*/
|
|
|
|
function toRomanNumerals(number, lowerCase = false) {
|
2019-12-25 23:59:37 +09:00
|
|
|
assert(
|
|
|
|
Number.isInteger(number) && number > 0,
|
|
|
|
"The number should be a positive integer."
|
|
|
|
);
|
2020-01-24 21:21:16 +09:00
|
|
|
const romanBuf = [];
|
|
|
|
let pos;
|
2019-02-24 00:14:31 +09:00
|
|
|
// Thousands
|
|
|
|
while (number >= 1000) {
|
|
|
|
number -= 1000;
|
2019-12-25 23:59:37 +09:00
|
|
|
romanBuf.push("M");
|
2019-02-24 00:14:31 +09:00
|
|
|
}
|
|
|
|
// Hundreds
|
|
|
|
pos = (number / 100) | 0;
|
|
|
|
number %= 100;
|
|
|
|
romanBuf.push(ROMAN_NUMBER_MAP[pos]);
|
|
|
|
// Tens
|
|
|
|
pos = (number / 10) | 0;
|
|
|
|
number %= 10;
|
|
|
|
romanBuf.push(ROMAN_NUMBER_MAP[10 + pos]);
|
|
|
|
// Ones
|
2021-05-24 20:20:19 +09:00
|
|
|
romanBuf.push(ROMAN_NUMBER_MAP[20 + number]); // eslint-disable-line unicorn/no-array-push-push
|
2019-02-24 00:14:31 +09:00
|
|
|
|
2019-12-25 23:59:37 +09:00
|
|
|
const romanStr = romanBuf.join("");
|
|
|
|
return lowerCase ? romanStr.toLowerCase() : romanStr;
|
2019-02-24 00:14:31 +09:00
|
|
|
}
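The table lookup used above (hundreds at offset 0, tens at offset 10, ones at offset 20) can be tried in isolation with the sketch below. It restates the same idea in a self-contained form; the names `ROMAN` and `toRoman` are hypothetical, and a plain `Error` replaces the shared `assert` helper so that the block runs on its own.

```javascript
// Same layout as the lookup table above: three rows of ten entries for the
// hundreds, tens and ones digits.
const ROMAN = [
  "", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM",
  "", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC",
  "", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX",
];

function toRoman(number, lowerCase = false) {
  if (!Number.isInteger(number) || number <= 0) {
    throw new Error("The number should be a positive integer.");
  }
  // Thousands are simply repeated "M"s; there is no table row for them.
  const thousands = "M".repeat(Math.floor(number / 1000));
  number %= 1000;
  const hundreds = ROMAN[Math.floor(number / 100)];
  number %= 100;
  const tens = ROMAN[10 + Math.floor(number / 10)];
  const ones = ROMAN[20 + (number % 10)];

  const result = thousands + hundreds + tens + ones;
  return lowerCase ? result.toLowerCase() : result;
}

console.log(toRoman(1994)); // "MCMXCIV"
console.log(toRoman(14, true)); // "xiv"
```

For 1994 the lookups are `"M"`, `ROMAN[9]`, `ROMAN[19]` and `ROMAN[24]`, i.e. `"M" + "CM" + "XC" + "IV"`.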
|
|
|
|
|
2020-01-08 03:59:16 +09:00
|
|
|
// Calculate the base 2 logarithm of the number `x`. This differs from the
|
|
|
|
// native function in the sense that it returns the ceiling value and that it
|
|
|
|
// returns 0 instead of `-Infinity`/`NaN` for `x` values smaller than/equal to 0.
|
|
|
|
function log2(x) {
|
|
|
|
if (x <= 0) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
return Math.ceil(Math.log2(x));
|
|
|
|
}
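The difference from the native function is easiest to see side by side. The sketch below uses a hypothetical `log2Sketch` helper with the same behaviour as the function above, so that it runs without the surrounding module.

```javascript
// Ceiling of the base 2 logarithm, clamped to 0 for non-positive input.
function log2Sketch(x) {
  return x <= 0 ? 0 : Math.ceil(Math.log2(x));
}

console.log(Math.log2(0)); // -Infinity
console.log(Math.log2(-4)); // NaN
console.log(log2Sketch(0)); // 0
console.log(log2Sketch(-4)); // 0
console.log(log2Sketch(8)); // 3
console.log(log2Sketch(9)); // 4
```

One way to read the result is as the number of bits needed to index `x` distinct values: 8 values fit in 3 bits, 9 values need 4.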
|
|
|
|
|
|
|
|
function readInt8(data, offset) {
|
|
|
|
return (data[offset] << 24) >> 24;
|
|
|
|
}
|
|
|
|
|
|
|
|
function readUint16(data, offset) {
|
|
|
|
return (data[offset] << 8) | data[offset + 1];
|
|
|
|
}
|
|
|
|
|
|
|
|
function readUint32(data, offset) {
|
|
|
|
return (
|
|
|
|
((data[offset] << 24) |
|
|
|
|
(data[offset + 1] << 16) |
|
|
|
|
(data[offset + 2] << 8) |
|
|
|
|
data[offset + 3]) >>>
|
|
|
|
0
|
|
|
|
);
|
|
|
|
}
|
|
|
|
|
|
|
|
// Checks if ch is one of the following characters: SPACE, TAB, CR or LF.
|
2020-02-10 17:38:57 +09:00
|
|
|
function isWhiteSpace(ch) {
|
2020-01-08 03:59:16 +09:00
|
|
|
return ch === 0x20 || ch === 0x09 || ch === 0x0d || ch === 0x0a;
|
|
|
|
}
|
|
|
|
|
2020-09-09 18:46:02 +09:00
|
|
|
/**
|
|
|
|
* AcroForm field names use an array like notation to refer to
|
|
|
|
* repeated XFA elements e.g. foo.bar[nnn].
|
|
|
|
* see: XFA Spec Chapter 3 - Repeated Elements
|
|
|
|
*
|
|
|
|
* @param {string} path - XFA path name.
|
|
|
|
* @returns {Array} - Array of Objects with the name and pos of
|
|
|
|
* each part of the path.
|
|
|
|
*/
|
|
|
|
function parseXFAPath(path) {
|
2021-09-02 18:41:20 +09:00
|
|
|
const positionPattern = /(.+)\[(\d+)\]$/;
|
2020-09-09 18:46:02 +09:00
|
|
|
return path.split(".").map(component => {
|
|
|
|
const m = component.match(positionPattern);
|
|
|
|
if (m) {
|
|
|
|
return { name: m[1], pos: parseInt(m[2], 10) };
|
|
|
|
}
|
|
|
|
return { name: component, pos: 0 };
|
|
|
|
});
|
|
|
|
}
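As an illustration of the array-like notation described in the comment above, the sketch below applies the same pattern to a made-up path. The helper name `parsePath` and the path `form.row[2].cell` are assumptions chosen for the example.

```javascript
// Splits on "." and extracts a trailing "[n]" index from each component;
// components without an index default to position 0.
function parsePath(path) {
  const positionPattern = /(.+)\[(\d+)\]$/;
  return path.split(".").map(component => {
    const m = component.match(positionPattern);
    return m
      ? { name: m[1], pos: parseInt(m[2], 10) }
      : { name: component, pos: 0 };
  });
}

const parts = parsePath("form.row[2].cell");
console.log(JSON.stringify(parts));
// [{"name":"form","pos":0},{"name":"row","pos":2},{"name":"cell","pos":0}]
```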
|
|
|
|
|
2020-09-10 01:39:14 +09:00
|
|
|
function escapePDFName(str) {
|
|
|
|
const buffer = [];
|
|
|
|
let start = 0;
|
|
|
|
for (let i = 0, ii = str.length; i < ii; i++) {
|
|
|
|
const char = str.charCodeAt(i);
|
2020-11-25 02:25:26 +09:00
|
|
|
// Whitespace or delimiters aren't regular chars, so escape them.
|
|
|
|
if (
|
|
|
|
char < 0x21 ||
|
|
|
|
char > 0x7e ||
|
|
|
|
char === 0x23 /* # */ ||
|
|
|
|
char === 0x28 /* ( */ ||
|
|
|
|
char === 0x29 /* ) */ ||
|
|
|
|
char === 0x3c /* < */ ||
|
|
|
|
char === 0x3e /* > */ ||
|
|
|
|
char === 0x5b /* [ */ ||
|
|
|
|
char === 0x5d /* ] */ ||
|
|
|
|
char === 0x7b /* { */ ||
|
|
|
|
char === 0x7d /* } */ ||
|
|
|
|
char === 0x2f /* / */ ||
|
|
|
|
char === 0x25 /* % */
|
|
|
|
) {
|
2020-09-10 01:39:14 +09:00
|
|
|
if (start < i) {
|
|
|
|
buffer.push(str.substring(start, i));
|
|
|
|
}
|
|
|
|
buffer.push(`#${char.toString(16)}`);
|
|
|
|
start = i + 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (buffer.length === 0) {
|
|
|
|
return str;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (start < str.length) {
|
|
|
|
buffer.push(str.substring(start, str.length));
|
|
|
|
}
|
|
|
|
|
|
|
|
return buffer.join("");
|
|
|
|
}
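The escaping rule above (anything outside `0x21`-`0x7e`, plus the delimiter characters, becomes `#` followed by the hexadecimal character code) can be tried with the compact sketch below. `escapeName` and `DELIMITERS` are hypothetical names; the sketch builds the string directly instead of using the buffer/substring approach, but applies the same test to each character.

```javascript
// The delimiter characters that are escaped in addition to whitespace and
// non-printable/non-ASCII characters.
const DELIMITERS = new Set("#()<>[]{}/%");

function escapeName(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code < 0x21 || code > 0x7e || DELIMITERS.has(str[i])) {
      out += `#${code.toString(16)}`; // Lower-case hex, as in the code above.
    } else {
      out += str[i];
    }
  }
  return out;
}

console.log(escapeName("Plain")); // "Plain"
console.log(escapeName("A B")); // "A#20B"
console.log(escapeName("a/b")); // "a#2fb"
```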
|
|
|
|
|
2020-12-08 03:22:14 +09:00
|
|
|
function _collectJS(entry, xref, list, parents) {
|
|
|
|
if (!entry) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
let parent = null;
|
2022-02-18 20:11:45 +09:00
|
|
|
if (entry instanceof Ref) {
|
2020-12-08 03:22:14 +09:00
|
|
|
if (parents.has(entry)) {
|
|
|
|
// If we've already found entry then we've a cycle.
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
parent = entry;
|
|
|
|
parents.put(parent);
|
|
|
|
entry = xref.fetch(entry);
|
|
|
|
}
|
|
|
|
if (Array.isArray(entry)) {
|
|
|
|
for (const element of entry) {
|
|
|
|
_collectJS(element, xref, list, parents);
|
|
|
|
}
|
|
|
|
} else if (entry instanceof Dict) {
|
2022-02-16 21:43:42 +09:00
|
|
|
if (isName(entry.get("S"), "JavaScript")) {
|
2020-12-08 03:22:14 +09:00
|
|
|
const js = entry.get("JS");
|
|
|
|
let code;
|
2022-02-16 21:43:42 +09:00
|
|
|
if (js instanceof BaseStream) {
|
2021-05-01 19:11:09 +09:00
|
|
|
code = js.getString();
|
2022-02-16 21:43:42 +09:00
|
|
|
} else if (typeof js === "string") {
|
2020-12-08 03:22:14 +09:00
|
|
|
code = js;
|
|
|
|
}
|
2022-02-16 21:43:42 +09:00
|
|
|
code = code && stringToPDFString(code);
|
2020-12-08 03:22:14 +09:00
|
|
|
if (code) {
|
|
|
|
list.push(code);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
_collectJS(entry.getRaw("Next"), xref, list, parents);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (parent) {
|
|
|
|
parents.remove(parent);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
function collectActions(xref, dict, eventType) {
  const actions = Object.create(null);
  const additionalActionsDicts = getInheritableProperty({
    dict,
    key: "AA",
    stopWhenFound: false,
  });
  if (additionalActionsDicts) {
    // additionalActionsDicts contains dicts from ancestors
    // as they're found in the tree from bottom to top.
    // So the dicts are visited in reverse order to guarantee
    // that actions from elder ancestors will be overwritten
    // by ones from younger ancestors.
    for (let i = additionalActionsDicts.length - 1; i >= 0; i--) {
      const additionalActions = additionalActionsDicts[i];
      if (!(additionalActions instanceof Dict)) {
        continue;
      }
      for (const key of additionalActions.getKeys()) {
        const action = eventType[key];
        if (!action) {
          continue;
        }
        const actionDict = additionalActions.getRaw(key);
        const parents = new RefSet();
        const list = [];
        _collectJS(actionDict, xref, list, parents);
        if (list.length > 0) {
          actions[action] = list;
        }
      }
    }
  }
  // Collect the Action if any (we may have one on pushbutton).
  if (dict.has("A")) {
    const actionDict = dict.get("A");
    const parents = new RefSet();
    const list = [];
    _collectJS(actionDict, xref, list, parents);
    if (list.length > 0) {
      actions.Action = list;
    }
  }
  return objectSize(actions) > 0 ? actions : null;
}

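// The reverse-order loop in collectActions can be illustrated with plain
// objects. This is only a sketch under stated assumptions:
// `mergeYoungestWins` and the sample data are hypothetical stand-ins that are
// not part of this file, and plain objects take the place of Dict instances.

```javascript
// Index 0 is the node's own dict; higher indexes are older ancestors,
// mirroring the bottom-to-top order described in the comment above.
const dictsBottomToTop = [
  { Fo: "child focus" },
  { Fo: "parent focus", Bl: "parent blur" },
];

// Hypothetical helper: walk from the eldest ancestor down to the node itself,
// so that later (younger) assignments overwrite earlier (elder) ones.
function mergeYoungestWins(dicts) {
  const merged = Object.create(null);
  for (let i = dicts.length - 1; i >= 0; i--) {
    for (const [key, value] of Object.entries(dicts[i])) {
      merged[key] = value;
    }
  }
  return merged;
}

const merged = mergeYoungestWins(dictsBottomToTop);
console.log(merged.Fo); // child focus
console.log(merged.Bl); // parent blur
```

// Keys defined only on an ancestor are still inherited, while keys defined at
// several levels resolve to the value closest to the node.
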
const XMLEntities = {
  /* < */ 0x3c: "&lt;",
  /* > */ 0x3e: "&gt;",
  /* & */ 0x26: "&amp;",
  /* " */ 0x22: "&quot;",
  /* ' */ 0x27: "&apos;",
};

function encodeToXmlString(str) {
  const buffer = [];
  let start = 0;
  for (let i = 0, ii = str.length; i < ii; i++) {
    const char = str.codePointAt(i);
    if (0x20 <= char && char <= 0x7e) {
      // ascii
      const entity = XMLEntities[char];
      if (entity) {
        if (start < i) {
          buffer.push(str.substring(start, i));
        }
        buffer.push(entity);
        start = i + 1;
      }
    } else {
      if (start < i) {
        buffer.push(str.substring(start, i));
      }
      buffer.push(`&#x${char.toString(16).toUpperCase()};`);
      if (char > 0xd7ff && (char < 0xe000 || char > 0xfffd)) {
        // char is represented by two u16
        i++;
      }
      start = i + 1;
    }
  }

  if (buffer.length === 0) {
    return str;
  }
  if (start < str.length) {
    buffer.push(str.substring(start, str.length));
  }
  return buffer.join("");
}

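// The "two u16" step in encodeToXmlString may be easier to follow in
// isolation. The sketch below is a hypothetical, simplified helper
// (`toNumericRefs` is not part of this file): it turns every code point into
// a numeric character reference and uses a plain `> 0xffff` test to detect
// two-unit characters, which is an assumption of this sketch rather than the
// exact condition used in the function above.

```javascript
function toNumericRefs(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    // codePointAt reads a full code point, even when it spans two UTF-16 units.
    const cp = str.codePointAt(i);
    out.push(`&#x${cp.toString(16).toUpperCase()};`);
    if (cp > 0xffff) {
      i++; // Skip the second UTF-16 unit of this code point.
    }
  }
  return out.join("");
}

console.log(toNumericRefs("\u00e9")); // &#xE9;
console.log(toNumericRefs("\u{1F600}")); // &#x1F600;
```

// Without the extra increment, the second half of the pair would be visited
// again and emitted as a separate, invalid reference.
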
function validateCSSFont(cssFontInfo) {
  // See https://developer.mozilla.org/en-US/docs/Web/CSS/font-style.
  const DEFAULT_CSS_FONT_OBLIQUE = "14";
  // See https://developer.mozilla.org/en-US/docs/Web/CSS/font-weight.
  const DEFAULT_CSS_FONT_WEIGHT = "400";
  const CSS_FONT_WEIGHT_VALUES = new Set([
    "100",
    "200",
    "300",
    "400",
    "500",
    "600",
    "700",
    "800",
    "900",
    "1000",
    "normal",
    "bold",
    "bolder",
    "lighter",
  ]);

  const { fontFamily, fontWeight, italicAngle } = cssFontInfo;

  // See https://developer.mozilla.org/en-US/docs/Web/CSS/string.
  if (/^".*"$/.test(fontFamily)) {
    if (/[^\\]"/.test(fontFamily.slice(1, fontFamily.length - 1))) {
      warn(`XFA - FontFamily contains some unescaped ": ${fontFamily}.`);
      return false;
    }
  } else if (/^'.*'$/.test(fontFamily)) {
    if (/[^\\]'/.test(fontFamily.slice(1, fontFamily.length - 1))) {
      warn(`XFA - FontFamily contains some unescaped ': ${fontFamily}.`);
      return false;
    }
  } else {
    // See https://developer.mozilla.org/en-US/docs/Web/CSS/custom-ident.
    for (const ident of fontFamily.split(/[ \t]+/)) {
      if (/^(\d|(-(\d|-)))/.test(ident) || !/^[\w-\\]+$/.test(ident)) {
        warn(
          `XFA - FontFamily contains some invalid <custom-ident>: ${fontFamily}.`
        );
        return false;
      }
    }
  }

  const weight = fontWeight ? fontWeight.toString() : "";
  cssFontInfo.fontWeight = CSS_FONT_WEIGHT_VALUES.has(weight)
    ? weight
    : DEFAULT_CSS_FONT_WEIGHT;

  const angle = parseFloat(italicAngle);
  cssFontInfo.italicAngle =
    isNaN(angle) || angle < -90 || angle > 90
      ? DEFAULT_CSS_FONT_OBLIQUE
      : italicAngle.toString();

  return true;
}

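// The unquoted-family branch of validateCSSFont applies two regular
// expressions to each whitespace-separated identifier. A minimal sketch of
// that check on its own follows; `looksLikeCustomIdent` is a hypothetical name
// introduced here for illustration and is not exported by this file.

```javascript
function looksLikeCustomIdent(ident) {
  // Rejected starts: a digit, a hyphen followed by a digit, or two hyphens.
  const badStart = /^(\d|(-(\d|-)))/.test(ident);
  // Allowed characters: word characters, hyphens and backslashes.
  const onlyAllowedChars = /^[\w-\\]+$/.test(ident);
  return !badStart && onlyAllowedChars;
}

for (const ident of ["Helvetica", "my-font", "3D", "-2x", "--x", "font!"]) {
  console.log(ident, looksLikeCustomIdent(ident));
}
```

// Only "Helvetica" and "my-font" pass; the other samples fail on their first
// characters or on the disallowed "!".
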
function recoverJsURL(str) {
  // Attempt to recover valid URLs from `JS` entries with certain
  // white-listed formats:
  //  - window.open('http://example.com')
  //  - app.launchURL('http://example.com', true)
  //  - xfa.host.gotoURL('http://example.com')
  const URL_OPEN_METHODS = ["app.launchURL", "window.open", "xfa.host.gotoURL"];
  const regex = new RegExp(
    "^\\s*(" +
      URL_OPEN_METHODS.join("|").split(".").join("\\.") +
      ")\\((?:'|\")([^'\"]*)(?:'|\")(?:,\\s*(\\w+)\\)|\\))",
    "i"
  );

  const jsUrl = regex.exec(str);
  if (jsUrl && jsUrl[2]) {
    const url = jsUrl[2];
    let newWindow = false;

    if (jsUrl[3] === "true" && jsUrl[1] === "app.launchURL") {
      newWindow = true;
    }
    return { url, newWindow };
  }

  return null;
}

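// To see what the white-listed formats produce, the sketch below repeats the
// same regular expression in a stand-alone helper so that it runs outside
// this module. `recoverJsURLSketch` is a hypothetical copy made for
// illustration; the input strings are the samples from the comment above plus
// one non-matching call.

```javascript
function recoverJsURLSketch(str) {
  const methods = ["app.launchURL", "window.open", "xfa.host.gotoURL"];
  const regex = new RegExp(
    "^\\s*(" +
      methods.join("|").split(".").join("\\.") +
      ")\\((?:'|\")([^'\"]*)(?:'|\")(?:,\\s*(\\w+)\\)|\\))",
    "i"
  );
  const match = regex.exec(str);
  if (match && match[2]) {
    // Only app.launchURL with a literal `true` second argument opens a new window.
    const newWindow = match[3] === "true" && match[1] === "app.launchURL";
    return { url: match[2], newWindow };
  }
  return null;
}

console.log(recoverJsURLSketch("app.launchURL('http://example.com', true)"));
// { url: 'http://example.com', newWindow: true }
console.log(recoverJsURLSketch("window.open('http://example.com')"));
// { url: 'http://example.com', newWindow: false }
console.log(recoverJsURLSketch("alert('http://example.com')"));
// null
```

// Anything that does not start with one of the three listed calls yields
// null, so callers need to handle the no-match case.
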
export {
  collectActions,
  DocStats,
  encodeToXmlString,
  escapePDFName,
  getArrayLookupTableFactory,
  getInheritableProperty,
  getLookupTableFactory,
  isWhiteSpace,
  log2,
  MissingDataException,
  ParserEOFException,
  parseXFAPath,
  readInt8,
  readUint16,
  readUint32,
  recoverJsURL,
  toRomanNumerals,
  validateCSSFont,
  XRefEntryException,
  XRefParseException,
};