2017-02-17 21:44:49 +09:00
/* Copyright 2017 Mozilla Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.
For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.
In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
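The idea described above can be illustrated with a minimal sketch. The helper names `makeGlobalIdFactory` and `makeLocalIdFactory`, as well as the exact id formats, are assumptions made for illustration only; they are not taken from the actual `core/` code. The point is simply that the font counter lives in a per-document closure rather than in module-global state.

```javascript
// Hypothetical sketch: the helper names and id formats are assumptions.
function makeGlobalIdFactory(docId) {
  let fontCount = 0; // Scoped to one document instance, not to the module.
  return {
    getDocId() {
      return `g_${docId}`;
    },
    createFontId() {
      return `f${++fontCount}`;
    },
  };
}

// A page-level factory re-uses the document-level methods and adds its own
// object counter, so object ids are tied to the page index.
function makeLocalIdFactory(globalIdFactory, pageIndex) {
  let objCount = 0;
  return {
    ...globalIdFactory,
    createObjId() {
      return `p${pageIndex}_${++objCount}`;
    },
  };
}

const firstOpen = makeGlobalIdFactory("d0");
console.log(firstOpen.createFontId()); // "f1"
console.log(firstOpen.createFontId()); // "f2"

// Re-opening the same document creates a fresh factory, so ids restart.
const secondOpen = makeGlobalIdFactory("d0");
console.log(secondOpen.createFontId()); // "f1"
```

With a module-global counter, the last call would instead have continued from the previous value, which is the inconsistency the commit message describes.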
2020-07-07 23:00:05 +09:00
import { Page, PDFDocument } from "../../src/core/document.js";
2020-06-29 20:18:51 +09:00
import { assert } from "../../src/shared/util.js";
[api-minor] Replace `PDFDocumentProxy.getStats` with a synchronous `PDFDocumentProxy.stats` getter
*Please note:* These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents.
The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for *every rendered* page.
This patch proposes replacing that method with a *synchronous* `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and *only send* them to the main-thread *the first time* that a type is encountered.
Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1]
This re-factoring will obviously benefit longer documents the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), even a one-page document would start to benefit from these changes.
Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return *the same identical* object.
This is something that we can easily take advantage of in the default viewer, by now *only* reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents).
---
[1] Furthermore, the maximum number of `StreamType`/`FontType` values are `10` and `12` respectively, which means that regardless of the complexity and page count of a PDF document there'll never be more than twenty-two "DocStats" messages sent; see https://github.com/mozilla/pdf.js/blob/41ac3f0c07128bf34baccdcc067a108c712fd6ef/src/shared/util.js#L206-L232
[2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only results in a total of seven "DocStats" messages being sent from the worker-thread.
[3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549
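The "only send the first time a type is encountered" behaviour can be sketched roughly as follows. This is a simplified illustration, not the actual `DocStats` implementation: the class name `DocStatsSketch`, the method names, the message payload shape, and the string type names are all assumptions. Only the constructor argument (an object with a `send` method) mirrors how `DocStats` is instantiated in this file.

```javascript
// Hypothetical sketch: class/method names and payload shape are assumptions.
class DocStatsSketch {
  constructor(handler) {
    this._handler = handler;
    this._streamTypes = new Set();
    this._fontTypes = new Set();
  }

  // Send the complete, current set of types to the other thread.
  _send() {
    const streamTypes = Object.create(null);
    const fontTypes = Object.create(null);
    for (const type of this._streamTypes) {
      streamTypes[type] = true;
    }
    for (const type of this._fontTypes) {
      fontTypes[type] = true;
    }
    this._handler.send("DocStats", { streamTypes, fontTypes });
  }

  addStreamType(type) {
    if (this._streamTypes.has(type)) {
      return; // Already reported, so no message is needed.
    }
    this._streamTypes.add(type);
    this._send();
  }

  addFontType(type) {
    if (this._fontTypes.has(type)) {
      return; // Already reported, so no message is needed.
    }
    this._fontTypes.add(type);
    this._send();
  }
}

// Usage: three calls, but only two messages, since one type repeats.
const sentMessages = [];
const stats = new DocStatsSketch({
  send: (name, data) => sentMessages.push({ name, data }),
});
stats.addStreamType("FLATE"); // Placeholder type name.
stats.addStreamType("FLATE");
stats.addFontType("TRUETYPE"); // Placeholder type name.
console.log(sentMessages.length); // 2
```

Because the number of distinct types is bounded, the number of messages is bounded as well, regardless of how many pages are rendered.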
2021-11-12 02:14:26 +09:00
import { DocStats } from "../../src/core/core_utils.js";
2020-01-02 20:00:16 +09:00
import { isNodeJS } from "../../src/shared/is_node.js";
2022-02-18 20:11:45 +09:00
import { Ref } from "../../src/core/primitives.js";
2020-07-07 23:00:05 +09:00
import { StringStream } from "../../src/core/stream.js";
2017-02-17 21:44:49 +09:00

2021-01-09 01:12:58 +09:00
const TEST_PDFS_PATH = isNodeJS ? "./test/pdfs/" : "../pdfs/";

const CMAP_PARAMS = {
  cMapUrl: isNodeJS ? "./external/bcmaps/" : "../../external/bcmaps/",
  cMapPacked: true,
};

2020-12-11 10:32:18 +09:00
const STANDARD_FONT_DATA_URL = isNodeJS
  ? "./external/standard_fonts/"
  : "../../external/standard_fonts/";

2019-02-17 20:34:37 +09:00
class DOMFileReaderFactory {
  static async fetch(params) {
    const response = await fetch(params.path);
    if (!response.ok) {
      throw new Error(response.statusText);
    }
    return new Uint8Array(await response.arrayBuffer());
  }
}

2017-04-09 00:09:54 +09:00
class NodeFileReaderFactory {
2019-02-17 20:34:37 +09:00
  static async fetch(params) {
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla-central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
    const fs = require("fs");
2019-02-17 20:34:37 +09:00

    return new Promise((resolve, reject) => {
      fs.readFile(params.path, (error, data) => {
        if (error || !data) {
          reject(error || new Error(`Empty file for: ${params.path}`));
          return;
        }
        resolve(new Uint8Array(data));
      });
    });
2017-04-09 00:09:54 +09:00
  }
}

2021-01-09 01:12:58 +09:00
const DefaultFileReaderFactory = isNodeJS
  ? NodeFileReaderFactory
  : DOMFileReaderFactory;
2017-05-16 20:01:03 +09:00

function buildGetDocumentParams(filename, options) {
2020-01-24 17:48:21 +09:00
  const params = Object.create(null);
2021-01-09 01:12:58 +09:00
  params.url = isNodeJS
    ? TEST_PDFS_PATH + filename
    : new URL(TEST_PDFS_PATH + filename, window.location).href;
2020-12-11 10:32:18 +09:00
  params.standardFontDataUrl = STANDARD_FONT_DATA_URL;
2021-01-09 01:12:58 +09:00

2020-01-24 17:48:21 +09:00
  for (const option in options) {
2017-05-16 20:01:03 +09:00
    params[option] = options[option];
  }
  return params;
}

2017-07-29 07:35:10 +09:00
class XRefMock {
  constructor(array) {
    this._map = Object.create(null);
2021-11-12 02:14:26 +09:00
    this.stats = new DocStats({ send: () => {} });
2020-08-04 02:44:04 +09:00
    this._newRefNum = null;
2017-07-29 07:35:10 +09:00

2020-01-24 17:48:21 +09:00
    for (const key in array) {
      const obj = array[key];
2017-07-29 07:35:10 +09:00
      this._map[obj.ref.toString()] = obj.data;
    }
  }

2020-08-04 02:44:04 +09:00
  getNewRef() {
    if (this._newRefNum === null) {
2022-06-01 22:42:46 +09:00
      this._newRefNum = Object.keys(this._map).length || 1;
2020-08-04 02:44:04 +09:00
    }
    return Ref.get(this._newRefNum++, 0);
  }

  resetNewRef() {
    // Reset the counter that `getNewRef` actually reads (`_newRefNum`).
    this._newRefNum = null;
  }

2017-07-29 07:35:10 +09:00
  fetch(ref) {
    return this._map[ref.toString()];
  }

2020-10-15 20:20:27 +09:00
  async fetchAsync(ref) {
    return this.fetch(ref);
2017-07-29 07:35:10 +09:00
  }

  fetchIfRef(obj) {
2022-02-18 20:11:45 +09:00
    if (obj instanceof Ref) {
      return this.fetch(obj);
2017-07-29 07:35:10 +09:00
    }
2022-02-18 20:11:45 +09:00
    return obj;
2017-07-29 07:35:10 +09:00
  }

2020-10-15 20:20:27 +09:00
  async fetchIfRefAsync(obj) {
    return this.fetchIfRef(obj);
2017-07-29 07:35:10 +09:00
  }
}

2019-04-20 19:36:49 +09:00
function createIdFactory(pageIndex) {
2020-07-07 23:00:05 +09:00
  const pdfManager = {
    get docId() {
      return "d0";
2019-04-20 19:36:49 +09:00
    },
2020-07-07 23:00:05 +09:00
  };
  const stream = new StringStream("Dummy_PDF_data");
  const pdfDocument = new PDFDocument(pdfManager, stream);

  const page = new Page({
    pdfManager: pdfDocument.pdfManager,
    xref: pdfDocument.xref,
2019-04-20 19:36:49 +09:00
    pageIndex,
2020-07-07 23:00:05 +09:00
    globalIdFactory: pdfDocument._globalIdFactory,
2019-04-20 19:36:49 +09:00
  });
2020-07-07 23:00:05 +09:00
  return page._localIdFactory;
2019-04-20 19:36:49 +09:00
}

2020-06-09 23:48:03 +09:00
function isEmptyObj(obj) {
  assert(
    typeof obj === "object" && obj !== null,
    "isEmptyObj - invalid argument."
  );
  return Object.keys(obj).length === 0;
}

2017-04-17 05:30:27 +09:00
export {
2017-05-16 20:01:03 +09:00
  buildGetDocumentParams,
2021-01-09 01:12:58 +09:00
  CMAP_PARAMS,
2019-04-20 19:36:49 +09:00
  createIdFactory,
2021-01-09 23:37:44 +09:00
  DefaultFileReaderFactory,
2020-06-09 23:48:03 +09:00
  isEmptyObj,
2020-12-11 10:32:18 +09:00
  STANDARD_FONT_DATA_URL,
2021-01-09 23:37:44 +09:00
  TEST_PDFS_PATH,
  XRefMock,
2017-04-17 05:30:27 +09:00
};