/* Copyright 2017 Mozilla Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import {
  AnnotationEditorType,
  AnnotationMode,
  AnnotationType,
  ImageKind,
  InvalidPDFException,
  isNodeJS,
  MissingPDFException,
  objectSize,
  OPS,
  PasswordException,
  PasswordResponses,
  PermissionFlag,
  PromiseCapability,
  UnknownErrorException,
} from "../../src/shared/util.js";
import {
  buildGetDocumentParams,
  CMAP_URL,
  DefaultFileReaderFactory,
  TEST_PDFS_PATH,
} from "./test_utils.js";
import {
  DefaultCanvasFactory,
  getDocument,
  PDFDataRangeTransport,
  PDFDocumentLoadingTask,
  PDFDocumentProxy,
  PDFPageProxy,
  PDFWorker,
  PDFWorkerUtil,
  RenderTask,
} from "../../src/display/api.js";
import {
  PageViewport,
  RenderingCancelledException,
  StatTimer,
} from "../../src/display/display_utils.js";
import { AutoPrintRegExp } from "../../web/ui_utils.js";
import { GlobalImageCache } from "../../src/core/image_utils.js";
import { GlobalWorkerOptions } from "../../src/display/worker_options.js";
import { Metadata } from "../../src/display/metadata.js";

const WORKER_SRC = "../../build/generic/build/pdf.worker.mjs";

describe("api", function () {
  const basicApiFileName = "basicapi.pdf";
  const basicApiFileLength = 105779; // bytes
  const basicApiGetDocumentParams = buildGetDocumentParams(basicApiFileName);
  const tracemonkeyFileName = "tracemonkey.pdf";
  const tracemonkeyGetDocumentParams =
    buildGetDocumentParams(tracemonkeyFileName);

  let CanvasFactory;

  beforeAll(function () {
    CanvasFactory = new DefaultCanvasFactory();
  });

  afterAll(function () {
    CanvasFactory = null;
  });

  function waitSome(callback) {
    const WAIT_TIMEOUT = 10;
    setTimeout(function () {
      callback();
    }, WAIT_TIMEOUT);
  }

  function mergeText(items) {
    return items
      .map(chunk => (chunk.str ?? "") + (chunk.hasEOL ? "\n" : ""))
      .join("");
  }

  function getNamedNodeInXML(node, path) {
    for (const component of path.split(".")) {
      if (!node.childNodes) {
        break;
      }
      for (const child of node.childNodes) {
        if (child.nodeName === component) {
          node = child;
          break;
        }
      }
    }
    return node;
  }

  describe("getDocument", function () {
    it("creates pdf doc from URL-string", async function () {
      const urlStr = TEST_PDFS_PATH + basicApiFileName;
      const loadingTask = getDocument(urlStr);
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);
      const pdfDocument = await loadingTask.promise;

      expect(typeof urlStr).toEqual("string");
      expect(pdfDocument instanceof PDFDocumentProxy).toEqual(true);
      expect(pdfDocument.numPages).toEqual(3);

      await loadingTask.destroy();
    });

    it("creates pdf doc from URL-object", async function () {
      if (isNodeJS) {
        pending("window.location is not supported in Node.js.");
      }
      const urlObj = new URL(
        TEST_PDFS_PATH + basicApiFileName,
        window.location
      );
      const loadingTask = getDocument(urlObj);
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);
      const pdfDocument = await loadingTask.promise;

      expect(urlObj instanceof URL).toEqual(true);
      expect(pdfDocument instanceof PDFDocumentProxy).toEqual(true);
      expect(pdfDocument.numPages).toEqual(3);

      await loadingTask.destroy();
    });

    it("creates pdf doc from URL", async function () {
      const loadingTask = getDocument(basicApiGetDocumentParams);
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const progressReportedCapability = new PromiseCapability();
      // Attach the callback that is used to report loading progress;
      // similarly to how viewer.js works.
      loadingTask.onProgress = function (progressData) {
        if (!progressReportedCapability.settled) {
          progressReportedCapability.resolve(progressData);
        }
      };

      const data = await Promise.all([
        progressReportedCapability.promise,
        loadingTask.promise,
      ]);

      expect(data[0].loaded / data[0].total >= 0).toEqual(true);
      expect(data[1] instanceof PDFDocumentProxy).toEqual(true);
      expect(loadingTask).toEqual(data[1].loadingTask);

      await loadingTask.destroy();
    });

    it("creates pdf doc from URL and aborts before worker initialized", async function () {
      const loadingTask = getDocument(basicApiGetDocumentParams);
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);
      const destroyed = loadingTask.destroy();

      try {
        await loadingTask.promise;

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch {
        expect(true).toEqual(true);
        await destroyed;
      }
    });

    it("creates pdf doc from URL and aborts loading after worker initialized", async function () {
      const loadingTask = getDocument(basicApiGetDocumentParams);
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);
      // The timing here is somewhat random: we cannot guarantee whether the
      // 'Terminate' message reaches the worker before or after the pdfManager
      // has been set up.
      const destroyed = loadingTask._worker.promise.then(function () {
        return loadingTask.destroy();
      });

      await destroyed;
      expect(true).toEqual(true);
    });
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2022-08-10 21:13:01 +09:00
|
|
|
    it("creates pdf doc from TypedArray", async function () {
      const typedArrayPdf = await DefaultFileReaderFactory.fetch({
        path: TEST_PDFS_PATH + basicApiFileName,
      });

      // Sanity check to make sure that we fetched the entire PDF file.
      expect(typedArrayPdf instanceof Uint8Array).toEqual(true);
      expect(typedArrayPdf.length).toEqual(basicApiFileLength);

      const loadingTask = getDocument(typedArrayPdf);
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const progressReportedCapability = new PromiseCapability();
      loadingTask.onProgress = function (data) {
        progressReportedCapability.resolve(data);
      };

      const data = await Promise.all([
        loadingTask.promise,
        progressReportedCapability.promise,
      ]);
      expect(data[0] instanceof PDFDocumentProxy).toEqual(true);
      expect(data[1].loaded / data[1].total).toEqual(1);

      // Check that the TypedArray was transferred.
      expect(typedArrayPdf.length).toEqual(0);

      await loadingTask.destroy();
    });

    it("creates pdf doc from ArrayBuffer", async function () {
      const { buffer: arrayBufferPdf } = await DefaultFileReaderFactory.fetch({
        path: TEST_PDFS_PATH + basicApiFileName,
      });

      // Sanity check to make sure that we fetched the entire PDF file.
      expect(arrayBufferPdf instanceof ArrayBuffer).toEqual(true);
      expect(arrayBufferPdf.byteLength).toEqual(basicApiFileLength);

      const loadingTask = getDocument(arrayBufferPdf);
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const progressReportedCapability = new PromiseCapability();
      loadingTask.onProgress = function (data) {
        progressReportedCapability.resolve(data);
      };

      const data = await Promise.all([
        loadingTask.promise,
        progressReportedCapability.promise,
      ]);
      expect(data[0] instanceof PDFDocumentProxy).toEqual(true);
      expect(data[1].loaded / data[1].total).toEqual(1);

      // Check that the ArrayBuffer was transferred.
      expect(arrayBufferPdf.byteLength).toEqual(0);

      await loadingTask.destroy();
    });

    it("creates pdf doc from invalid PDF file", async function () {
      // A severely corrupt PDF file (even Adobe Reader fails to open it).
      const loadingTask = getDocument(buildGetDocumentParams("bug1020226.pdf"));
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      try {
        await loadingTask.promise;

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof InvalidPDFException).toEqual(true);
        expect(reason.message).toEqual("Invalid PDF structure.");
      }

      await loadingTask.destroy();
    });

    it("creates pdf doc from non-existent URL", async function () {
      const loadingTask = getDocument(
        buildGetDocumentParams("non-existent.pdf")
      );
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      try {
        await loadingTask.promise;

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof MissingPDFException).toEqual(true);
      }

      await loadingTask.destroy();
    });

    it("creates pdf doc from PDF file protected with user and owner password", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("pr6531_1.pdf"));
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const passwordNeededCapability = new PromiseCapability();
      const passwordIncorrectCapability = new PromiseCapability();
      // Attach the callback that is used to request a password;
      // similarly to how the default viewer handles passwords.
      loadingTask.onPassword = function (updatePassword, reason) {
        if (
          reason === PasswordResponses.NEED_PASSWORD &&
          !passwordNeededCapability.settled
        ) {
          passwordNeededCapability.resolve();

          updatePassword("qwerty"); // Provide an incorrect password.
          return;
        }
        if (
          reason === PasswordResponses.INCORRECT_PASSWORD &&
          !passwordIncorrectCapability.settled
        ) {
          passwordIncorrectCapability.resolve();

          updatePassword("asdfasdf"); // Provide the correct password.
          return;
        }
        // Shouldn't get here.
        expect(false).toEqual(true);
      };

      const data = await Promise.all([
        passwordNeededCapability.promise,
        passwordIncorrectCapability.promise,
        loadingTask.promise,
      ]);
      expect(data[2] instanceof PDFDocumentProxy).toEqual(true);

      await loadingTask.destroy();
    });

    it("creates pdf doc from PDF file protected with only a user password", async function () {
      const filename = "pr6531_2.pdf";

      const passwordNeededLoadingTask = getDocument(
        buildGetDocumentParams(filename, {
          password: "",
        })
      );
      expect(
        passwordNeededLoadingTask instanceof PDFDocumentLoadingTask
      ).toEqual(true);

      const result1 = passwordNeededLoadingTask.promise.then(
        function () {
          // Shouldn't get here.
          expect(false).toEqual(true);
          throw new Error("loadingTask should be rejected");
        },
        function (data) {
          expect(data instanceof PasswordException).toEqual(true);
          expect(data.code).toEqual(PasswordResponses.NEED_PASSWORD);
          return passwordNeededLoadingTask.destroy();
        }
      );

      const passwordIncorrectLoadingTask = getDocument(
        buildGetDocumentParams(filename, {
          password: "qwerty",
        })
      );
      expect(
        passwordIncorrectLoadingTask instanceof PDFDocumentLoadingTask
      ).toEqual(true);

      const result2 = passwordIncorrectLoadingTask.promise.then(
        function () {
          // Shouldn't get here.
          expect(false).toEqual(true);
          throw new Error("loadingTask should be rejected");
        },
        function (data) {
          expect(data instanceof PasswordException).toEqual(true);
          expect(data.code).toEqual(PasswordResponses.INCORRECT_PASSWORD);
          return passwordIncorrectLoadingTask.destroy();
        }
      );

      const passwordAcceptedLoadingTask = getDocument(
        buildGetDocumentParams(filename, {
          password: "asdfasdf",
        })
      );
      expect(
        passwordAcceptedLoadingTask instanceof PDFDocumentLoadingTask
      ).toEqual(true);

      const result3 = passwordAcceptedLoadingTask.promise.then(function (data) {
        expect(data instanceof PDFDocumentProxy).toEqual(true);
        return passwordAcceptedLoadingTask.destroy();
      });

      await Promise.all([result1, result2, result3]);
    });

    it(
      "creates pdf doc from password protected PDF file and aborts/throws " +
        "in the onPassword callback (issue 7806)",
      async function () {
        const filename = "issue3371.pdf";

        const passwordNeededLoadingTask = getDocument(
          buildGetDocumentParams(filename)
        );
        expect(
          passwordNeededLoadingTask instanceof PDFDocumentLoadingTask
        ).toEqual(true);

        const passwordIncorrectLoadingTask = getDocument(
          buildGetDocumentParams(filename, {
            password: "qwerty",
          })
        );
        expect(
          passwordIncorrectLoadingTask instanceof PDFDocumentLoadingTask
        ).toEqual(true);

        let passwordNeededDestroyed;
        passwordNeededLoadingTask.onPassword = function (callback, reason) {
          if (reason === PasswordResponses.NEED_PASSWORD) {
            passwordNeededDestroyed = passwordNeededLoadingTask.destroy();
            return;
          }
          // Shouldn't get here.
          expect(false).toEqual(true);
        };
        const result1 = passwordNeededLoadingTask.promise.then(
          function () {
            // Shouldn't get here.
            expect(false).toEqual(true);
            throw new Error("loadingTask should be rejected");
          },
          function (reason) {
            expect(reason instanceof PasswordException).toEqual(true);
            expect(reason.code).toEqual(PasswordResponses.NEED_PASSWORD);
            return passwordNeededDestroyed;
          }
        );

        passwordIncorrectLoadingTask.onPassword = function (callback, reason) {
          if (reason === PasswordResponses.INCORRECT_PASSWORD) {
            throw new Error("Incorrect password");
          }
          // Shouldn't get here.
          expect(false).toEqual(true);
        };
        const result2 = passwordIncorrectLoadingTask.promise.then(
          function () {
            // Shouldn't get here.
            expect(false).toEqual(true);
            throw new Error("loadingTask should be rejected");
          },
          function (reason) {
            expect(reason instanceof PasswordException).toEqual(true);
            expect(reason.code).toEqual(PasswordResponses.INCORRECT_PASSWORD);
            return passwordIncorrectLoadingTask.destroy();
          }
        );

        await Promise.all([result1, result2]);
      }
    );

    it(
      "creates pdf doc from password protected PDF file and passes an Error " +
        "(asynchronously) to the onPassword callback (bug 1754421)",
      async function () {
        const loadingTask = getDocument(
          buildGetDocumentParams("issue3371.pdf")
        );
        expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

        // Attach the callback that is used to request a password;
        // similarly to how the default viewer handles passwords.
        loadingTask.onPassword = function (updatePassword, reason) {
          waitSome(() => {
            updatePassword(new Error("Should reject the loadingTask."));
          });
        };

        await loadingTask.promise.then(
          function () {
            // Shouldn't get here.
            expect(false).toEqual(true);
          },
          function (reason) {
            expect(reason instanceof PasswordException).toEqual(true);
            expect(reason.code).toEqual(PasswordResponses.NEED_PASSWORD);
          }
        );

        await loadingTask.destroy();
      }
    );

    it("creates pdf doc from empty TypedArray", async function () {
      const loadingTask = getDocument(new Uint8Array(0));
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      try {
        await loadingTask.promise;

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof InvalidPDFException).toEqual(true);
        expect(reason.message).toEqual(
          "The PDF file is empty, i.e. its size is zero bytes."
        );
      }

      await loadingTask.destroy();
    });

    it("checks the `startxref` position of a linearized pdf doc (issue 17665)", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("empty.pdf"));
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const pdfDocument = await loadingTask.promise;

      const startXRefPos = await pdfDocument.getStartXRefPos();
      expect(startXRefPos).toEqual(116);

      await loadingTask.destroy();
    });

    it("checks that `docId`s are unique and increasing", async function () {
      const loadingTask1 = getDocument(basicApiGetDocumentParams);
      expect(loadingTask1 instanceof PDFDocumentLoadingTask).toEqual(true);
      await loadingTask1.promise;
      const docId1 = loadingTask1.docId;

      const loadingTask2 = getDocument(basicApiGetDocumentParams);
      expect(loadingTask2 instanceof PDFDocumentLoadingTask).toEqual(true);
      await loadingTask2.promise;
      const docId2 = loadingTask2.docId;

      expect(docId1).not.toEqual(docId2);

      const docIdRegExp = /^d(\d+)$/,
        docNum1 = docIdRegExp.exec(docId1)?.[1],
        docNum2 = docIdRegExp.exec(docId2)?.[1];

      expect(+docNum1).toBeLessThan(+docNum2);

      await Promise.all([loadingTask1.destroy(), loadingTask2.destroy()]);
    });

it("creates pdf doc from PDF file with bad XRef entry", async function () {
|
2021-11-25 02:55:28 +09:00
|
|
|
// A corrupt PDF file, where the XRef table have (some) bogus entries.
|
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("PDFBOX-4352-0.pdf", {
|
|
|
|
rangeChunkSize: 100,
|
|
|
|
})
|
|
|
|
);
|
|
|
|
expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);
|
|
|
|
|
|
|
|
const pdfDocument = await loadingTask.promise;
|
|
|
|
expect(pdfDocument.numPages).toEqual(1);
|
|
|
|
|
2021-11-30 06:33:48 +09:00
|
|
|
const page = await pdfDocument.getPage(1);
|
|
|
|
expect(page instanceof PDFPageProxy).toEqual(true);
|
|
|
|
|
|
|
|
const opList = await page.getOperatorList();
|
|
|
|
expect(opList.fnArray.length).toEqual(0);
|
|
|
|
expect(opList.argsArray.length).toEqual(0);
|
|
|
|
expect(opList.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(opList.separateAnnots).toEqual(null);
|
2021-11-30 06:33:48 +09:00
|
|
|
|
2021-11-25 02:55:28 +09:00
|
|
|
await loadingTask.destroy();
|
|
|
|
});

    it("creates pdf doc from PDF file with bad XRef header", async function () {
      const loadingTask = getDocument(
        buildGetDocumentParams("GHOSTSCRIPT-698804-1-fuzzed.pdf")
      );
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const pdfDocument = await loadingTask.promise;
      expect(pdfDocument.numPages).toEqual(1);

      const page = await pdfDocument.getPage(1);
      expect(page instanceof PDFPageProxy).toEqual(true);

      const opList = await page.getOperatorList();
      expect(opList.fnArray.length).toEqual(0);
      expect(opList.argsArray.length).toEqual(0);
      expect(opList.lastChunk).toEqual(true);
      expect(opList.separateAnnots).toEqual(null);

      await loadingTask.destroy();
    });

    it("creates pdf doc from PDF file with bad XRef byteWidths", async function () {
      // A corrupt PDF file, where the XRef /W-array has (some) bogus entries.
      const loadingTask = getDocument(
        buildGetDocumentParams("REDHAT-1531897-0.pdf")
      );
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      try {
        await loadingTask.promise;

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof InvalidPDFException).toEqual(true);
        expect(reason.message).toEqual("Invalid PDF structure.");
      }

      await loadingTask.destroy();
    });
[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*
Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (and the browser to hang).
Rather than hoping that the /Count entry is correct, we obviously need to validate it, just like all other data found in PDF documents. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially count the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
Unfortunately these changes will have a number of *somewhat* negative side-effects (see the possibly incomplete list below); however, I cannot see a better way to address this bug.
- This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
- For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents.
- This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.
As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).
Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
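The short-cut and its fallback can be sketched as follows. This is a hypothetical, in-memory model rather than the actual `Catalog` code. A /Pages tree is a `Map` from reference strings to plain objects, and `makeFetcher`, `getPageByIndex`, `countPages` and `getNumPages` are illustrative names. The sketch assumes the tree is acyclic and, like the patch, trusts /Count whenever the page it implies as the last one can actually be reached.

```javascript
// Wraps an object table in a fetch function that counts lookups, so the
// extra work done during initialization can be observed.
function makeFetcher(objects) {
  const stats = { fetches: 0 };
  const fetchObject = ref => {
    stats.fetches++;
    if (!objects.has(ref)) {
      throw new Error(`Bad object reference: ${ref}`);
    }
    return objects.get(ref);
  };
  return { fetchObject, stats };
}

// Locate the page at `pageIndex` (zero-based), using the /Count of
// intermediate /Pages nodes to skip whole subtrees where possible.
function getPageByIndex(nodeRef, pageIndex, fetchObject) {
  let remaining = pageIndex;
  for (const kidRef of fetchObject(nodeRef).Kids ?? []) {
    const kid = fetchObject(kidRef);
    if (kid.Type === "Page") {
      if (remaining === 0) {
        return kid;
      }
      remaining--;
    } else if (Number.isInteger(kid.Count) && remaining >= kid.Count) {
      remaining -= kid.Count; // The target isn't in this subtree.
    } else {
      return getPageByIndex(kidRef, remaining, fetchObject);
    }
  }
  throw new Error(`Page index ${pageIndex} not found.`);
}

// Slow path: walk the *entire* tree and count the leaf /Page nodes.
function countPages(nodeRef, fetchObject) {
  const node = fetchObject(nodeRef);
  if (node.Type === "Page") {
    return 1;
  }
  let total = 0;
  for (const kidRef of node.Kids ?? []) {
    total += countPages(kidRef, fetchObject);
  }
  return total;
}

// Trust an integer /Count only if the last page it implies can be reached;
// otherwise, including for non-integer values, fall back to counting.
function getNumPages(rootRef, fetchObject) {
  const count = fetchObject(rootRef).Count;
  if (Number.isInteger(count) && count > 0) {
    try {
      getPageByIndex(rootRef, count - 1, fetchObject);
      return count;
    } catch {
      // The implied last page doesn't exist, hence /Count is bogus.
    }
  }
  return countPages(rootRef, fetchObject);
}

// Usage: /Count claims 5 pages, but the tree only contains 2.
const { fetchObject } = makeFetcher(
  new Map([
    ["1 0 R", { Type: "Pages", Count: 5, Kids: ["2 0 R", "3 0 R"] }],
    ["2 0 R", { Type: "Page" }],
    ["3 0 R", { Type: "Page" }],
  ])
);
console.log(getNumPages("1 0 R", fetchObject)); // prints: 2
```

An integer /Count that is too *small* still passes this check, since the page it implies as the last one does exist; the sketch shares that assumption with the short-cut. The `stats` counter makes the one-level-tree cost from the list above observable: for a valid tree of N pages stored directly under the root, reaching the last page fetches all N kids.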

    it("creates pdf doc from PDF file with inaccessible /Pages tree", async function () {
      const loadingTask = getDocument(
        buildGetDocumentParams("poppler-395-0-fuzzed.pdf")
      );
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      try {
        await loadingTask.promise;

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof InvalidPDFException).toEqual(true);
        expect(reason.message).toEqual("Invalid Root reference.");
      }

      await loadingTask.destroy();
    });

    it("creates pdf doc from PDF files, with bad /Pages tree /Count", async function () {
      const loadingTask1 = getDocument(
        buildGetDocumentParams("poppler-67295-0.pdf")
      );
      const loadingTask2 = getDocument(
        buildGetDocumentParams("poppler-85140-0.pdf")
      );
      const loadingTask3 = getDocument(
        buildGetDocumentParams("poppler-85140-0.pdf", { stopAtErrors: true })
      );

      expect(loadingTask1 instanceof PDFDocumentLoadingTask).toEqual(true);
      expect(loadingTask2 instanceof PDFDocumentLoadingTask).toEqual(true);
      expect(loadingTask3 instanceof PDFDocumentLoadingTask).toEqual(true);

      const pdfDocument1 = await loadingTask1.promise;
      const pdfDocument2 = await loadingTask2.promise;
      const pdfDocument3 = await loadingTask3.promise;

      expect(pdfDocument1.numPages).toEqual(1);
      expect(pdfDocument2.numPages).toEqual(1);
      expect(pdfDocument3.numPages).toEqual(1);

      const pageA = await pdfDocument1.getPage(1);
      expect(pageA instanceof PDFPageProxy).toEqual(true);

      const opListA = await pageA.getOperatorList();
      expect(opListA.fnArray.length).toBeGreaterThan(5);
      expect(opListA.argsArray.length).toBeGreaterThan(5);
      expect(opListA.lastChunk).toEqual(true);
      expect(opListA.separateAnnots).toEqual(null);

      const pageB = await pdfDocument2.getPage(1);
      expect(pageB instanceof PDFPageProxy).toEqual(true);

      const opListB = await pageB.getOperatorList();
      expect(opListB.fnArray.length).toBe(0);
      expect(opListB.argsArray.length).toBe(0);
      expect(opListB.lastChunk).toEqual(true);
      expect(opListB.separateAnnots).toEqual(null);

      try {
        await pdfDocument3.getPage(1);

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof UnknownErrorException).toEqual(true);
        expect(reason.message).toEqual("Bad (uncompressed) XRef entry: 3R");
      }

      await Promise.all([
        loadingTask1.destroy(),
        loadingTask2.destroy(),
        loadingTask3.destroy(),
      ]);
|
2021-11-26 02:34:11 +09:00
    });
Prevent circular references in XRef tables from hanging the worker-thread (issue 14303)
*Please note:* While this patch on its own is sufficient to prevent the worker-thread from hanging, it's only in combination with PR 14311 that these PDF documents will both load *and* render correctly.
Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table.
To avoid a solution that requires tracking references manually everywhere, the implementation settled on here instead handles this internally in the `XRef.fetch`-method. This should work, since that method *and* the `Parser`/`Lexer`-implementations are completely synchronous.
Note also that the existing `XRef`-caching, used for all data-types *except* Streams, should hopefully help to lessen the performance impact of these changes.
One *potential* problem with these changes could be certain *browser* exceptions, since those are generally not catchable in JavaScript code; however, those would most likely "stop" worker-thread parsing anyway (at least I hope so).
Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing of the rest of the document to continue, such that *one* bad reference doesn't prevent an entire document from loading.
Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.
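The commit states that fetching and parsing are fully synchronous. Because of that, the "currently being fetched" bookkeeping can be a plain `Set` that is filled before an object is resolved and emptied afterwards. The minimal sketch below assumes a hypothetical `ToyXRef` class: objects are already-parsed plain values, and references are `{ ref: n }` objects. The real `XRef` parses objects from the file, and its dummy value is not necessarily `null`.

```javascript
// Hypothetical toy cross-reference table (not the real PDF.js `XRef`):
// objects are plain values keyed by object number, and an indirect
// reference is represented as { ref: <object number> }.
class ToyXRef {
  constructor(objects) {
    this.objects = objects; // Map<number, value>
    this.cache = new Map(); // Previously resolved objects.
    this.inProgress = new Set(); // Object numbers currently being fetched.
  }

  fetch(num) {
    if (this.cache.has(num)) {
      return this.cache.get(num);
    }
    if (this.inProgress.has(num)) {
      // Circular reference: return dummy data instead of recursing forever,
      // so that one bad reference doesn't stop the rest of the document.
      return null;
    }
    this.inProgress.add(num);
    try {
      const value = this.resolve(this.objects.get(num));
      this.cache.set(num, value);
      return value;
    } finally {
      // A plain Set suffices because fetching is fully synchronous,
      // so no other fetch can interleave with this one.
      this.inProgress.delete(num);
    }
  }

  // Recursively replace references, including those nested in arrays.
  resolve(value) {
    if (Array.isArray(value)) {
      return value.map(item => this.resolve(item));
    }
    if (value !== null && typeof value === "object" && "ref" in value) {
      return this.fetch(value.ref);
    }
    return value;
  }
}

const xref = new ToyXRef(
  new Map([
    [1, ["a", { ref: 2 }]],
    [2, ["b", { ref: 1 }]], // Points back to object 1.
    [3, "plain value"],
  ])
);
console.log(JSON.stringify(xref.fetch(1))); // ["a",["b",null]]
console.log(xref.fetch(3)); // plain value
```

Handling the guard inside `fetch` itself, rather than at each call-site, is what lets callers stay unaware of the tracking. The `finally` block makes sure the guard is released even if resolving throws.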
2021-11-26 22:11:39 +09:00
    it("creates pdf doc from PDF files, with circular references", async function () {
      const loadingTask1 = getDocument(
        buildGetDocumentParams("poppler-91414-0-53.pdf")
      );
      const loadingTask2 = getDocument(
        buildGetDocumentParams("poppler-91414-0-54.pdf")
      );
      expect(loadingTask1 instanceof PDFDocumentLoadingTask).toEqual(true);
      expect(loadingTask2 instanceof PDFDocumentLoadingTask).toEqual(true);

      const pdfDocument1 = await loadingTask1.promise;
      const pdfDocument2 = await loadingTask2.promise;

      expect(pdfDocument1.numPages).toEqual(1);
      expect(pdfDocument2.numPages).toEqual(1);

      const pageA = await pdfDocument1.getPage(1);
      const pageB = await pdfDocument2.getPage(1);

      expect(pageA instanceof PDFPageProxy).toEqual(true);
      expect(pageB instanceof PDFPageProxy).toEqual(true);

      for (const opList of [
        await pageA.getOperatorList(),
        await pageB.getOperatorList(),
      ]) {
        expect(opList.fnArray.length).toBeGreaterThan(5);
        expect(opList.argsArray.length).toBeGreaterThan(5);
        expect(opList.lastChunk).toEqual(true);
2022-06-27 18:41:37 +09:00
        expect(opList.separateAnnots).toEqual(null);
2021-11-26 22:11:39 +09:00
      }

      await Promise.all([loadingTask1.destroy(), loadingTask2.destroy()]);
    });
2021-12-02 03:35:02 +09:00
    it("creates pdf doc from PDF files, with bad /Pages tree /Kids entries", async function () {
      const loadingTask1 = getDocument(
        buildGetDocumentParams("poppler-742-0-fuzzed.pdf")
      );
      const loadingTask2 = getDocument(
        buildGetDocumentParams("poppler-937-0-fuzzed.pdf")
      );
      expect(loadingTask1 instanceof PDFDocumentLoadingTask).toEqual(true);
      expect(loadingTask2 instanceof PDFDocumentLoadingTask).toEqual(true);

      const pdfDocument1 = await loadingTask1.promise;
      const pdfDocument2 = await loadingTask2.promise;

      expect(pdfDocument1.numPages).toEqual(1);
      expect(pdfDocument2.numPages).toEqual(1);

      try {
        await pdfDocument1.getPage(1);

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof UnknownErrorException).toEqual(true);
[api-minor] Convert `Catalog.getPageDict` to an asynchronous method
Besides converting `Catalog.getPageDict` to an `async` method, thus simplifying the code, this patch also allows us to pro-actively fix an existing issue.
Note how we're looking up References in such a way that `MissingDataException`s won't cause trouble; however, it's *technically possible* that the entries (i.e. /Count, /Kids, and /Type) in a /Pages Dictionary could actually be indirect objects as well. In the existing code this could lead to *some*, or even all, pages failing to load/render as intended.
In practice that doesn't *appear* to happen in real-world PDF documents, but given all the weird things that PDF software does I'd prefer to fix this pro-actively (rather than waiting for a bug report).
With `Catalog.getPageDict` being `async` this is now really simple to address; however, I didn't want to introduce a bunch more *unconditional* asynchronicity in this method if it could be avoided (since that could slow things down). Hence we'll *synchronously* look up the *raw* data in a /Pages Dictionary, and only fall back to asynchronous data lookup when a Reference was encountered.
In addition to the above, this patch also makes the following notable changes:
- Let `Catalog.getPageDict` *consistently* reject with the actual error, regardless of what data we're fetching. Previously we'd "swallow" the actual errors except when looking up Dictionary entries, which is inconsistent and thus seems unfortunate. As can be seen from the updated unit-tests this change is API-observable, hence why the patch is tagged `[api-minor]`.
- Improve the consistency of the Dictionary /Type-checks in both the `Catalog.getPageDict` and `Catalog.getAllPageDicts` methods.
In `Catalog.getPageDict` there's a fallback code-path where we're *incorrectly* checking the /Page Dictionary for a /Contents-entry, which is wrong since a /Page Dictionary doesn't need to have a /Contents-entry in order to be valid.
For consistency the `Catalog.getAllPageDicts` method is also updated to handle errors in the /Type-lookup correctly.
- Reduce the `PagesCountLimit.PAUSE_EAGER_PAGE_INIT` viewer constant, to further improve loading/rendering performance of the *second* page during initialization of very long documents; PR 14359 follow-up.
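The lookup described above can be sketched roughly as follows: read the raw entries synchronously, and only await anything when a Reference is encountered. This sketch rests on assumptions. `ToyRef`, the `Map`-based dictionary, and the `fetchAsync` callback are hypothetical stand-ins for PDF.js's `Ref`, `Dict`, and asynchronous XRef lookups. Only the control flow is meant to be illustrative.

```javascript
// Hypothetical stand-in for an indirect-object reference (not PDF.js's `Ref`).
class ToyRef {
  constructor(num) {
    this.num = num;
  }
}

// Read the /Type, /Count, and /Kids entries of a /Pages node. `dict` is a Map
// of raw (unresolved) values and `fetchAsync` resolves a ToyRef; both are
// assumptions made for this sketch.
async function readPagesNode(dict, fetchAsync) {
  // Synchronous lookup of the raw values first.
  let entries = [dict.get("Type"), dict.get("Count"), dict.get("Kids")];

  // Only fall back to asynchronous lookups when a reference was encountered.
  if (entries.some(value => value instanceof ToyRef)) {
    entries = await Promise.all(
      entries.map(value => (value instanceof ToyRef ? fetchAsync(value) : value))
    );
  }
  const [type, count, kids] = entries;

  // Consistently reject with the actual problem, rather than swallowing it.
  if (type !== "Pages") {
    throw new Error(`Invalid /Type entry: ${type}`);
  }
  if (!Number.isInteger(count) || !Array.isArray(kids)) {
    throw new Error("Invalid /Count or /Kids entry.");
  }
  return { count, kids };
}

const objects = new Map([[7, 2]]); // Indirect object 7 holds the /Count value.
const fetchAsync = async ref => objects.get(ref.num);

const direct = new Map([["Type", "Pages"], ["Count", 2], ["Kids", ["p1", "p2"]]]);
const indirect = new Map([
  ["Type", "Pages"],
  ["Count", new ToyRef(7)],
  ["Kids", ["p1", "p2"]],
]);

readPagesNode(direct, fetchAsync).then(node => console.log(node.count)); // 2
readPagesNode(indirect, fetchAsync).then(node => console.log(node.count)); // 2
```

The function itself is `async`, so it always returns a Promise. What the sketch avoids is any extra `await` round-trip in the common case where all three entries are direct objects.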
2021-12-24 21:46:35 +09:00
        expect(reason.message).toEqual("Illegal character: 41");
2021-12-02 03:35:02 +09:00
      }
      try {
        await pdfDocument2.getPage(1);

        // Shouldn't get here.
        expect(false).toEqual(true);
      } catch (reason) {
        expect(reason instanceof UnknownErrorException).toEqual(true);
2021-12-24 21:46:35 +09:00
        expect(reason.message).toEqual("End of file inside array.");
2021-12-02 03:35:02 +09:00
      }

      await Promise.all([loadingTask1.destroy(), loadingTask2.destroy()]);
    });
2022-07-08 19:06:25 +09:00
    it("creates pdf doc from PDF file with bad /Resources entry", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("issue15150.pdf"));
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const pdfDocument = await loadingTask.promise;
      expect(pdfDocument.numPages).toEqual(1);

      const page = await pdfDocument.getPage(1);
      expect(page instanceof PDFPageProxy).toEqual(true);

      const opList = await page.getOperatorList();
      expect(opList.fnArray).toEqual([
        OPS.setLineWidth,
        OPS.setStrokeRGBColor,
        OPS.constructPath,
        OPS.closeStroke,
      ]);
      expect(opList.argsArray).toEqual([
        [0.5],
        new Uint8ClampedArray([255, 0, 0]),
        [
          [OPS.moveTo, OPS.lineTo],
          [0, 9.75, 0.5, 9.75],
2024-01-25 05:16:58 +09:00
          [0, 9.75, 0.5, 9.75],
2022-07-08 19:06:25 +09:00
        ],
        null,
      ]);
      expect(opList.lastChunk).toEqual(true);

      await loadingTask.destroy();
    });
2022-10-20 23:24:17 +09:00
    it("creates pdf doc from PDF file, with incomplete trailer", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("issue15590.pdf"));
      expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);

      const pdfDocument = await loadingTask.promise;
      expect(pdfDocument.numPages).toEqual(1);

      const jsActions = await pdfDocument.getJSActions();
      expect(jsActions).toEqual({
        OpenAction: ["func=function(){app.alert(1)};func();"],
      });
2022-10-21 22:49:45 +09:00
      const page = await pdfDocument.getPage(1);
      expect(page instanceof PDFPageProxy).toEqual(true);
2022-10-20 23:24:17 +09:00
      await loadingTask.destroy();
    });
2012-04-13 09:59:30 +09:00
  });
2018-02-18 07:51:24 +09:00
2020-04-14 19:28:14 +09:00
  describe("PDFWorker", function () {
2021-04-17 04:48:42 +09:00
    it("worker created or destroyed", async function () {
2019-11-11 00:42:46 +09:00
      if (isNodeJS) {
2018-03-18 03:56:39 +09:00
        pending("Worker is not supported in Node.js.");
      }
2020-10-25 23:40:51 +09:00
      const worker = new PDFWorker({ name: "test1" });
2021-04-17 04:48:42 +09:00
      await worker.promise;
      expect(worker.name).toEqual("test1");
      expect(!!worker.port).toEqual(true);
      expect(worker.destroyed).toEqual(false);
      expect(!!worker._webWorker).toEqual(true);
      expect(worker.port === worker._webWorker).toEqual(true);

      worker.destroy();
      expect(!!worker.port).toEqual(false);
      expect(worker.destroyed).toEqual(true);
2015-10-28 02:55:15 +09:00
    });
2021-04-17 04:48:42 +09:00
    it("worker created or destroyed by getDocument", async function () {
2019-11-11 00:42:46 +09:00
      if (isNodeJS) {
2018-03-18 03:56:39 +09:00
        pending("Worker is not supported in Node.js.");
      }
2020-10-25 23:40:51 +09:00
      const loadingTask = getDocument(basicApiGetDocumentParams);
      let worker;
2020-04-14 19:28:14 +09:00
      loadingTask.promise.then(function () {
2015-10-28 02:55:15 +09:00
        worker = loadingTask._worker;
        expect(!!worker).toEqual(true);
      });
2020-10-25 23:40:51 +09:00
      const destroyPromise = loadingTask.promise.then(function () {
2015-10-28 02:55:15 +09:00
        return loadingTask.destroy();
      });
2021-04-17 04:48:42 +09:00
      await destroyPromise;

      const destroyedWorker = loadingTask._worker;
      expect(!!destroyedWorker).toEqual(false);
      expect(worker.destroyed).toEqual(true);
2015-10-28 02:55:15 +09:00
    });
2021-04-17 04:48:42 +09:00
    it("worker created and can be used in getDocument", async function () {
2019-11-11 00:42:46 +09:00
      if (isNodeJS) {
2018-03-18 03:56:39 +09:00
        pending("Worker is not supported in Node.js.");
      }
2020-10-25 23:40:51 +09:00
      const worker = new PDFWorker({ name: "test1" });
      const loadingTask = getDocument(
2017-04-09 00:09:54 +09:00
        buildGetDocumentParams(basicApiFileName, {
          worker,
        })
      );
2020-04-14 19:28:14 +09:00
      loadingTask.promise.then(function () {
2020-10-25 23:40:51 +09:00
        const docWorker = loadingTask._worker;
2015-10-28 02:55:15 +09:00
        expect(!!docWorker).toEqual(false);
        // checking if the same port is used in the MessageHandler
2020-10-25 23:40:51 +09:00
        const messageHandlerPort = loadingTask._transport.messageHandler.comObj;
2015-10-28 02:55:15 +09:00
        expect(messageHandlerPort === worker.port).toEqual(true);
      });
2020-10-25 23:40:51 +09:00
      const destroyPromise = loadingTask.promise.then(function () {
2015-10-28 02:55:15 +09:00
        return loadingTask.destroy();
      });
2021-04-17 04:48:42 +09:00
      await destroyPromise;

      expect(worker.destroyed).toEqual(false);
      worker.destroy();
2015-10-28 02:55:15 +09:00
    });
2021-04-17 04:48:42 +09:00
    it("creates more than one worker", async function () {
2019-11-11 00:42:46 +09:00
      if (isNodeJS) {
2018-03-18 03:56:39 +09:00
        pending("Worker is not supported in Node.js.");
      }
2020-10-25 23:40:51 +09:00
      const worker1 = new PDFWorker({ name: "test1" });
      const worker2 = new PDFWorker({ name: "test2" });
      const worker3 = new PDFWorker({ name: "test3" });
2021-04-17 04:48:42 +09:00
      await Promise.all([worker1.promise, worker2.promise, worker3.promise]);

      expect(
        worker1.port !== worker2.port &&
          worker1.port !== worker3.port &&
          worker2.port !== worker3.port
      ).toEqual(true);
      worker1.destroy();
      worker2.destroy();
      worker3.destroy();
2015-10-28 02:55:15 +09:00
    });
2021-04-17 04:48:42 +09:00
2020-04-14 19:28:14 +09:00
    it("gets current workerSrc", function () {
2019-11-11 00:42:46 +09:00
      if (isNodeJS) {
2018-03-18 03:56:39 +09:00
        pending("Worker is not supported in Node.js.");
      }
[api-minor] Remove the closure from the `PDFWorker` class, in the `src/display/api.js` file
This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the *build* `pdf.js` file.
Given that `PDFWorker` is exposed through the *public* API, this complicates things somewhat since there are a couple of worker-related properties that really should stay *private*. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure; however, I've managed to come up with what's hopefully deemed an acceptable work-around here.
Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method).
Finally, as part of this re-factoring a number of missing JSDoc-comments were added which *together* with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class.
*Please note:* This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.
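The API change described above might be modelled like this: a static `workerSrc` getter replaces `getWorkerSrc()`, and the old method lingers as a deprecated alias. `ToyWorker`, `globalOptions`, the file name, and the error and warning texts are all hypothetical. The real `PDFWorker` differs, and per the commit message it limits the fallback to `GENERIC` builds.

```javascript
// Hypothetical stand-in for `GlobalWorkerOptions`.
const globalOptions = { workerSrc: "./toy.worker.js" };

class ToyWorker {
  // New, shorter spelling: a static getter.
  static get workerSrc() {
    if (globalOptions.workerSrc) {
      return globalOptions.workerSrc;
    }
    throw new Error('No "workerSrc" specified.');
  }

  // Old spelling, kept as a deprecated alias that forwards to the getter.
  static getWorkerSrc() {
    console.warn("`getWorkerSrc()` is deprecated, use `workerSrc` instead.");
    return this.workerSrc;
  }
}

console.log(ToyWorker.workerSrc); // ./toy.worker.js
console.log(ToyWorker.getWorkerSrc()); // ./toy.worker.js (after a warning)
```

Forwarding the alias to the getter keeps a single source of truth, so both spellings return the same value. This mirrors what the "gets current workerSrc" test below checks against `GlobalWorkerOptions.workerSrc`.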
2021-08-06 20:11:29 +09:00
      const workerSrc = PDFWorker.workerSrc;
2018-01-29 23:58:40 +09:00
      expect(typeof workerSrc).toEqual("string");
2018-02-14 22:49:24 +09:00
      expect(workerSrc).toEqual(GlobalWorkerOptions.workerSrc);
2018-01-29 23:58:40 +09:00
    });
2015-10-28 02:55:15 +09:00
  });
2021-04-17 04:48:42 +09:00
2023-08-13 01:27:01 +09:00
  describe("GlobalWorkerOptions", function () {
    let savedGlobalWorkerPort;

    beforeAll(function () {
      savedGlobalWorkerPort = GlobalWorkerOptions.workerPort;
    });

    afterAll(function () {
      GlobalWorkerOptions.workerPort = savedGlobalWorkerPort;
    });

    it("use global `workerPort` with multiple, sequential, documents", async function () {
      if (isNodeJS) {
        pending("Worker is not supported in Node.js.");
      }

      GlobalWorkerOptions.workerPort = new Worker(
[api-major] Output JavaScript modules in the builds (issue 10317)
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits; however, that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However, it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
2023-09-28 20:00:10 +09:00
        new URL(WORKER_SRC, window.location),
        { type: "module" }
2023-08-13 01:27:01 +09:00
      );

      const loadingTask1 = getDocument(basicApiGetDocumentParams);
      const pdfDoc1 = await loadingTask1.promise;
      expect(pdfDoc1.numPages).toEqual(3);
      await loadingTask1.destroy();
2023-08-15 19:13:36 +09:00
      const loadingTask2 = getDocument(tracemonkeyGetDocumentParams);
2023-08-13 01:27:01 +09:00
      const pdfDoc2 = await loadingTask2.promise;
      expect(pdfDoc2.numPages).toEqual(14);
      await loadingTask2.destroy();
    });
2023-08-15 19:13:36 +09:00
    it("use global `workerPort` with multiple, parallel, documents", async function () {
      if (isNodeJS) {
        pending("Worker is not supported in Node.js.");
      }

      GlobalWorkerOptions.workerPort = new Worker(
2023-09-28 20:00:10 +09:00
        new URL(WORKER_SRC, window.location),
        { type: "module" }
2023-08-15 19:13:36 +09:00
      );

      const loadingTask1 = getDocument(basicApiGetDocumentParams);
2024-01-21 18:13:12 +09:00
      const promise1 = loadingTask1.promise.then(pdfDoc => pdfDoc.numPages);
2023-08-15 19:13:36 +09:00
      const loadingTask2 = getDocument(tracemonkeyGetDocumentParams);
2024-01-21 18:13:12 +09:00
      const promise2 = loadingTask2.promise.then(pdfDoc => pdfDoc.numPages);
2023-08-15 19:13:36 +09:00
      const [numPages1, numPages2] = await Promise.all([promise1, promise2]);
      expect(numPages1).toEqual(3);
      expect(numPages2).toEqual(14);

      await Promise.all([loadingTask1.destroy(), loadingTask2.destroy()]);
    });
2023-08-13 01:27:01 +09:00
    it(
      "avoid using the global `workerPort` when destruction has started, " +
        "but not yet finished (issue 16777)",
      async function () {
        if (isNodeJS) {
          pending("Worker is not supported in Node.js.");
        }

        GlobalWorkerOptions.workerPort = new Worker(
2023-09-28 20:00:10 +09:00
          new URL(WORKER_SRC, window.location),
          { type: "module" }
2023-08-13 01:27:01 +09:00
        );

        const loadingTask = getDocument(basicApiGetDocumentParams);
        const pdfDoc = await loadingTask.promise;
        expect(pdfDoc.numPages).toEqual(3);
        const destroyPromise = loadingTask.destroy();

        expect(function () {
2023-08-15 19:13:36 +09:00
          getDocument(tracemonkeyGetDocumentParams);
2023-08-13 01:27:01 +09:00
        }).toThrow(
          new Error(
            "PDFWorker.fromPort - the worker is being destroyed.\n" +
              "Please remember to await `PDFDocumentLoadingTask.destroy()`-calls."
          )
        );

        await destroyPromise;
      }
    );
  });
2020-04-14 19:28:14 +09:00
  describe("PDFDocument", function () {
2020-03-24 18:44:17 +09:00
    let pdfLoadingTask, pdfDocument;
2016-01-21 07:57:17 +09:00
2021-04-17 04:48:42 +09:00
    beforeAll(async function () {
2020-03-24 18:44:17 +09:00
      pdfLoadingTask = getDocument(basicApiGetDocumentParams);
2021-04-17 04:48:42 +09:00
      pdfDocument = await pdfLoadingTask.promise;
2016-01-21 07:57:17 +09:00
    });
2021-04-17 04:48:42 +09:00
    afterAll(async function () {
      await pdfLoadingTask.destroy();
2012-04-13 09:59:30 +09:00
    });
2016-01-21 07:57:17 +09:00
2020-04-14 19:28:14 +09:00
    it("gets number of pages", function () {
2020-03-24 18:44:17 +09:00
      expect(pdfDocument.numPages).toEqual(3);
2012-04-13 09:59:30 +09:00
    });
2021-04-17 04:48:42 +09:00
2021-07-02 23:36:27 +09:00
    it("gets fingerprints", function () {
      expect(pdfDocument.fingerprints).toEqual([
        "ea8b35919d6279a369e835bde778611b",
        null,
      ]);
    });

    it("gets fingerprints, from modified document", async function () {
      const loadingTask = getDocument(
        buildGetDocumentParams("annotation-tx.pdf")
2020-03-24 18:44:17 +09:00
      );
2021-07-02 23:36:27 +09:00
      const pdfDoc = await loadingTask.promise;

      expect(pdfDoc.fingerprints).toEqual([
        "3ebd77c320274649a68f10dbf3b9f882",
        "e7087346aa4b4ae0911c1f1643b57345",
      ]);

      await loadingTask.destroy();
2012-04-13 09:59:30 +09:00
    });
2021-04-17 04:48:42 +09:00
    it("gets page", async function () {
      const data = await pdfDocument.getPage(1);
      expect(data instanceof PDFPageProxy).toEqual(true);
      expect(data.pageNumber).toEqual(1);
2015-10-04 21:28:24 +09:00
    });
2021-04-17 04:48:42 +09:00
    it("gets non-existent page", async function () {
2022-02-24 19:20:27 +09:00
      const pageNumbers = [
        /* outOfRange = */ 100,
        /* nonInteger = */ 2.5,
        /* nonNumber = */ "1",
      ];
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
|
2022-02-24 19:20:27 +09:00
|
|
|
for (const pageNumber of pageNumbers) {
|
|
|
|
try {
|
|
|
|
await pdfDocument.getPage(pageNumber);
|
|
|
|
|
|
|
|
// Shouldn't get here.
|
|
|
|
expect(false).toEqual(true);
|
|
|
|
} catch (reason) {
|
2016-09-05 21:43:16 +09:00
|
|
|
expect(reason instanceof Error).toEqual(true);
|
2022-02-24 19:20:27 +09:00
|
|
|
expect(reason.message).toEqual("Invalid page request.");
|
2016-09-05 21:43:16 +09:00
|
|
|
}
|
2022-02-24 19:20:27 +09:00
|
|
|
}
|
2012-04-13 09:59:30 +09:00
|
|
|
});
|
2020-02-09 01:43:53 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
    it("gets page, from /Pages tree with circular reference", async function () {
      const loadingTask = getDocument(
        buildGetDocumentParams("Pages-tree-refs.pdf")
      );

      const page1 = loadingTask.promise.then(function (pdfDoc) {
        return pdfDoc.getPage(1).then(
          function (pdfPage) {
            expect(pdfPage instanceof PDFPageProxy).toEqual(true);
            expect(pdfPage.ref).toEqual({ num: 6, gen: 0 });
          },
          function (reason) {
            throw new Error("shall not fail for valid page");
          }
        );
      });

      const page2 = loadingTask.promise.then(function (pdfDoc) {
        return pdfDoc.getPage(2).then(
          function (pdfPage) {
            throw new Error("shall fail for invalid page");
          },
          function (reason) {
            expect(reason instanceof UnknownErrorException).toEqual(true);
            expect(reason.message).toEqual(
              "Pages tree contains circular reference."
            );
          }
        );
      });

      await Promise.all([page1, page2]);
      await loadingTask.destroy();
    });

    it("gets page multiple times, with working caches", async function () {
      const promiseA = pdfDocument.getPage(1);
      const promiseB = pdfDocument.getPage(1);

      expect(promiseA instanceof Promise).toEqual(true);
      expect(promiseA).toBe(promiseB);

      const pageA = await promiseA;
      const pageB = await promiseB;

      expect(pageA instanceof PDFPageProxy).toEqual(true);
      expect(pageA).toBe(pageB);
    });

    it("gets page index", async function () {
      const ref = { num: 17, gen: 0 }; // Reference to second page.
      const pageIndex = await pdfDocument.getPageIndex(ref);
      expect(pageIndex).toEqual(1);
    });

    it("gets invalid page index", async function () {
      const pageRefs = [
        /* fontRef = */ { num: 3, gen: 0 },
        /* invalidRef = */ { num: -1, gen: 0 },
        /* nonRef = */ "qwerty",
        /* nullRef = */ null,
      ];

      const expectedErrors = [
        {
          exception: UnknownErrorException,
          message: "The reference does not point to a /Page dictionary.",
        },
        { exception: Error, message: "Invalid pageIndex request." },
        { exception: Error, message: "Invalid pageIndex request." },
        { exception: Error, message: "Invalid pageIndex request." },
      ];

      for (let i = 0, ii = pageRefs.length; i < ii; i++) {
        try {
          await pdfDocument.getPageIndex(pageRefs[i]);

          // Shouldn't get here.
          expect(false).toEqual(true);
        } catch (reason) {
          const { exception, message } = expectedErrors[i];

          expect(reason instanceof exception).toEqual(true);
          expect(reason.message).toEqual(message);
        }
      }
    });

    it("gets destinations, from /Dests dictionary", async function () {
      const destinations = await pdfDocument.getDestinations();
      expect(destinations).toEqual({
        chapter1: [{ gen: 0, num: 17 }, { name: "XYZ" }, 0, 841.89, null],
      });
    });

    it("gets a destination, from /Dests dictionary", async function () {
      const destination = await pdfDocument.getDestination("chapter1");
      expect(destination).toEqual([
        { gen: 0, num: 17 },
        { name: "XYZ" },
        0,
        841.89,
        null,
      ]);
    });

    it("gets a non-existent destination, from /Dests dictionary", async function () {
      const destination = await pdfDocument.getDestination(
        "non-existent-named-destination"
      );
      expect(destination).toEqual(null);
    });

    it("gets destinations, from /Names (NameTree) dictionary", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("issue6204.pdf"));
      const pdfDoc = await loadingTask.promise;
      const destinations = await pdfDoc.getDestinations();
      expect(destinations).toEqual({
        "Page.1": [{ num: 1, gen: 0 }, { name: "XYZ" }, 0, 375, null],
        "Page.2": [{ num: 6, gen: 0 }, { name: "XYZ" }, 0, 375, null],
      });

      await loadingTask.destroy();
    });

    it("gets a destination, from /Names (NameTree) dictionary", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("issue6204.pdf"));
      const pdfDoc = await loadingTask.promise;
      const destination = await pdfDoc.getDestination("Page.1");
      expect(destination).toEqual([
        { num: 1, gen: 0 },
        { name: "XYZ" },
        0,
        375,
        null,
      ]);

      await loadingTask.destroy();
    });

    it("gets a non-existent destination, from /Names (NameTree) dictionary", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("issue6204.pdf"));
      const pdfDoc = await loadingTask.promise;
      const destination = await pdfDoc.getDestination(
        "non-existent-named-destination"
      );
      expect(destination).toEqual(null);

      await loadingTask.destroy();
    });

    it("gets a destination, from out-of-order /Names (NameTree) dictionary (issue 10272)", async function () {
      if (isNodeJS) {
        pending("Linked test-cases are not supported in Node.js.");
      }
      const loadingTask = getDocument(buildGetDocumentParams("issue10272.pdf"));
      const pdfDoc = await loadingTask.promise;
      const destination = await pdfDoc.getDestination("link_1");
      expect(destination).toEqual([
        { num: 17, gen: 0 },
        { name: "XYZ" },
        69,
        125,
        0,
      ]);

      await loadingTask.destroy();
    });

    it("gets a destination, from /Names (NameTree) dictionary with keys using PDFDocEncoding (issue 14847)", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("issue14847.pdf"));
      const pdfDoc = await loadingTask.promise;
      const destination = await pdfDoc.getDestination("index");
      expect(destination).toEqual([
        { num: 10, gen: 0 },
        { name: "XYZ" },
        85.039,
        728.504,
        null,
      ]);

      await loadingTask.destroy();
    });

    it("gets non-string destination", async function () {
      let numberPromise = pdfDocument.getDestination(4.3);
      let booleanPromise = pdfDocument.getDestination(true);
      let arrayPromise = pdfDocument.getDestination([
        { num: 17, gen: 0 },
        { name: "XYZ" },
        0,
        841.89,
        null,
      ]);

      numberPromise = numberPromise.then(
        function () {
          throw new Error("shall fail for non-string destination.");
        },
        function (reason) {
          expect(reason instanceof Error).toEqual(true);
        }
      );
      booleanPromise = booleanPromise.then(
        function () {
          throw new Error("shall fail for non-string destination.");
        },
        function (reason) {
          expect(reason instanceof Error).toEqual(true);
        }
      );
      arrayPromise = arrayPromise.then(
        function () {
          throw new Error("shall fail for non-string destination.");
        },
        function (reason) {
          expect(reason instanceof Error).toEqual(true);
        }
      );

      await Promise.all([numberPromise, booleanPromise, arrayPromise]);
    });

    it("gets non-existent page labels", async function () {
      const pageLabels = await pdfDocument.getPageLabels();
      expect(pageLabels).toEqual(null);
    });

    it("gets page labels", async function () {
      // PageLabels with Roman/Arabic numerals.
      const loadingTask0 = getDocument(buildGetDocumentParams("bug793632.pdf"));
      const promise0 = loadingTask0.promise.then(function (pdfDoc) {
        return pdfDoc.getPageLabels();
      });

      // PageLabels with only a label prefix.
      const loadingTask1 = getDocument(buildGetDocumentParams("issue1453.pdf"));
      const promise1 = loadingTask1.promise.then(function (pdfDoc) {
        return pdfDoc.getPageLabels();
      });

      // PageLabels identical to standard page numbering.
      const loadingTask2 = getDocument(buildGetDocumentParams("rotation.pdf"));
      const promise2 = loadingTask2.promise.then(function (pdfDoc) {
        return pdfDoc.getPageLabels();
      });

      // PageLabels with bad "Prefix" entries.
      const loadingTask3 = getDocument(
        buildGetDocumentParams("bad-PageLabels.pdf")
      );
      const promise3 = loadingTask3.promise.then(function (pdfDoc) {
        return pdfDoc.getPageLabels();
      });

      const pageLabels = await Promise.all([
        promise0,
        promise1,
        promise2,
        promise3,
      ]);
      expect(pageLabels[0]).toEqual(["i", "ii", "iii", "1"]);
      expect(pageLabels[1]).toEqual(["Front Page1"]);
      expect(pageLabels[2]).toEqual(["1", "2"]);
      expect(pageLabels[3]).toEqual(["X3"]);

      await Promise.all([
        loadingTask0.destroy(),
        loadingTask1.destroy(),
        loadingTask2.destroy(),
        loadingTask3.destroy(),
      ]);
    });

    it("gets default page layout", async function () {
      const loadingTask = getDocument(tracemonkeyGetDocumentParams);
      const pdfDoc = await loadingTask.promise;
      const pageLayout = await pdfDoc.getPageLayout();
      expect(pageLayout).toEqual("");

      await loadingTask.destroy();
    });

    it("gets non-default page layout", async function () {
      const pageLayout = await pdfDocument.getPageLayout();
      expect(pageLayout).toEqual("SinglePage");
    });

    it("gets default page mode", async function () {
      const loadingTask = getDocument(tracemonkeyGetDocumentParams);
      const pdfDoc = await loadingTask.promise;
      const pageMode = await pdfDoc.getPageMode();
      expect(pageMode).toEqual("UseNone");

      await loadingTask.destroy();
    });

    it("gets non-default page mode", async function () {
      const pageMode = await pdfDocument.getPageMode();
      expect(pageMode).toEqual("UseOutlines");
    });

    it("gets default viewer preferences", async function () {
      const loadingTask = getDocument(tracemonkeyGetDocumentParams);
      const pdfDoc = await loadingTask.promise;
      const prefs = await pdfDoc.getViewerPreferences();
      expect(prefs).toEqual(null);

      await loadingTask.destroy();
    });

    it("gets non-default viewer preferences", async function () {
      const prefs = await pdfDocument.getViewerPreferences();
      expect(prefs).toEqual({ Direction: "L2R" });
    });

    it("gets default open action", async function () {
      const loadingTask = getDocument(tracemonkeyGetDocumentParams);
      const pdfDoc = await loadingTask.promise;
      const openAction = await pdfDoc.getOpenAction();
      expect(openAction).toEqual(null);

      await loadingTask.destroy();
    });

    it("gets non-default open action (with destination)", async function () {
      const openAction = await pdfDocument.getOpenAction();
      expect(openAction.dest).toEqual([
        { num: 15, gen: 0 },
        { name: "FitH" },
        null,
      ]);
      expect(openAction.action).toBeUndefined();
    });

    it("gets non-default open action (with Print action)", async function () {
      // PDF document with "Print" Named action in the OpenAction dictionary.
      const loadingTask1 = getDocument(
        buildGetDocumentParams("bug1001080.pdf")
      );
      // PDF document with "Print" Named action in the OpenAction dictionary,
      // but the OpenAction dictionary is missing the `Type` entry.
      const loadingTask2 = getDocument(
        buildGetDocumentParams("issue11442_reduced.pdf")
      );

      const promise1 = loadingTask1.promise
        .then(function (pdfDoc) {
          return pdfDoc.getOpenAction();
        })
        .then(function (openAction) {
          expect(openAction.dest).toBeUndefined();
          expect(openAction.action).toEqual("Print");

          return loadingTask1.destroy();
        });
      const promise2 = loadingTask2.promise
        .then(function (pdfDoc) {
          return pdfDoc.getOpenAction();
        })
        .then(function (openAction) {
          expect(openAction.dest).toBeUndefined();
          expect(openAction.action).toEqual("Print");

          return loadingTask2.destroy();
        });

      await Promise.all([promise1, promise2]);
    });

    it("gets non-existent attachments", async function () {
      const attachments = await pdfDocument.getAttachments();
      expect(attachments).toEqual(null);
    });

    it("gets attachments", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("attachment.pdf"));
      const pdfDoc = await loadingTask.promise;
      const attachments = await pdfDoc.getAttachments();

      const attachment = attachments["foo.txt"];
      expect(attachment.filename).toEqual("foo.txt");
      expect(attachment.content).toEqual(
        new Uint8Array([98, 97, 114, 32, 98, 97, 122, 32, 10])
      );

      await loadingTask.destroy();
    });

    it("gets javascript with printing instructions (JS action)", async function () {
      // PDF document with "JavaScript" action in the OpenAction dictionary.
      const loadingTask = getDocument(buildGetDocumentParams("issue6106.pdf"));
      const pdfDoc = await loadingTask.promise;
      const { OpenAction } = await pdfDoc.getJSActions();

      expect(OpenAction).toEqual([
        "this.print({bUI:true,bSilent:false,bShrinkToFit:true});",
      ]);
      expect(OpenAction[0]).toMatch(AutoPrintRegExp);

      await loadingTask.destroy();
    });

    it("gets hasJSActions, in document without javaScript", async function () {
      const hasJSActions = await pdfDocument.hasJSActions();

      expect(hasJSActions).toEqual(false);
    });

    it("gets hasJSActions, in document with javaScript", async function () {
      const loadingTask = getDocument(
        buildGetDocumentParams("doc_actions.pdf")
      );
      const pdfDoc = await loadingTask.promise;
      const hasJSActions = await pdfDoc.hasJSActions();

      expect(hasJSActions).toEqual(true);

      await loadingTask.destroy();
    });

    it("gets non-existent JSActions", async function () {
      const jsActions = await pdfDocument.getJSActions();
      expect(jsActions).toEqual(null);
    });

    it("gets JSActions", async function () {
      // PDF document with "JavaScript" action in the OpenAction dictionary.
      const loadingTask = getDocument(
        buildGetDocumentParams("doc_actions.pdf")
      );
      const pdfDoc = await loadingTask.promise;
      const docActions = await pdfDoc.getJSActions();
      const page1 = await pdfDoc.getPage(1);
      const page1Actions = await page1.getJSActions();
      const page3 = await pdfDoc.getPage(3);
      const page3Actions = await page3.getJSActions();

      expect(docActions).toEqual({
        DidPrint: [`this.getField("Text2").value = "DidPrint";`],
        DidSave: [`this.getField("Text2").value = "DidSave";`],
        WillClose: [`this.getField("Text1").value = "WillClose";`],
        WillPrint: [`this.getField("Text1").value = "WillPrint";`],
        WillSave: [`this.getField("Text1").value = "WillSave";`],
      });
      expect(page1Actions).toEqual({
        PageOpen: [`this.getField("Text1").value = "PageOpen 1";`],
        PageClose: [`this.getField("Text2").value = "PageClose 1";`],
      });
      expect(page3Actions).toEqual({
        PageOpen: [`this.getField("Text5").value = "PageOpen 3";`],
        PageClose: [`this.getField("Text6").value = "PageClose 3";`],
      });

      await loadingTask.destroy();
    });

    it("gets non-existent fieldObjects", async function () {
      const fieldObjects = await pdfDocument.getFieldObjects();
      expect(fieldObjects).toEqual(null);
    });

    it("gets fieldObjects", async function () {
      const loadingTask = getDocument(buildGetDocumentParams("js-authors.pdf"));
      const pdfDoc = await loadingTask.promise;
      const fieldObjects = await pdfDoc.getFieldObjects();

      expect(fieldObjects).toEqual({
        Text1: [
          {
            id: "25R",
            value: "",
            defaultValue: "",
            multiline: false,
            password: false,
            charLimit: 0,
            comb: false,
            editable: true,
            hidden: false,
            name: "Text1",
            rect: [24.1789, 719.66, 432.22, 741.66],
            actions: null,
            page: 0,
            strokeColor: null,
            fillColor: null,
            rotation: 0,
            type: "text",
          },
        ],
        Button1: [
          {
            id: "26R",
            value: "Off",
            defaultValue: null,
            exportValues: undefined,
            editable: true,
            name: "Button1",
            rect: [455.436, 719.678, 527.436, 739.678],
            hidden: false,
            actions: {
              Action: [
                `this.getField("Text1").value = this.info.authors.join("::");`,
              ],
            },
            page: 0,
            strokeColor: null,
            fillColor: new Uint8ClampedArray([192, 192, 192]),
            rotation: 0,
            type: "button",
          },
        ],
      });

      await loadingTask.destroy();
    });

it("gets fieldObjects with missing /P-entries", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("bug1847733.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const fieldObjects = await pdfDoc.getFieldObjects();
|
|
|
|
|
|
|
|
for (const name in fieldObjects) {
|
|
|
|
const pageIndexes = fieldObjects[name].map(o => o.page);
|
|
|
|
let expected;
|
|
|
|
|
|
|
|
switch (name) {
|
|
|
|
case "formID":
|
|
|
|
case "pdf_submission_new":
|
|
|
|
case "simple_spc":
|
|
|
|
case "adobeWarning":
|
2023-08-24 00:14:29 +09:00
|
|
|
case "typeA13[0]":
|
|
|
|
case "typeA13[1]":
|
|
|
|
case "typeA13[2]":
|
|
|
|
case "typeA13[3]":
|
2023-08-10 18:59:40 +09:00
|
|
|
expected = [0];
|
|
|
|
break;
|
|
|
|
case "typeA15[0]":
|
|
|
|
case "typeA15[1]":
|
|
|
|
case "typeA15[2]":
|
|
|
|
case "typeA15[3]":
|
|
|
|
expected = [-1, 0, 0, 0, 0];
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
expect(pageIndexes).toEqual(expected);
|
|
|
|
}
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-12-29 20:22:20 +09:00
|
|
|
it("gets non-existent calculationOrder", async function () {
|
|
|
|
const calculationOrder = await pdfDocument.getCalculationOrderIds();
|
|
|
|
expect(calculationOrder).toEqual(null);
|
|
|
|
});
|
|
|
|
|
|
|
|
it("gets calculationOrder", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue13132.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const calculationOrder = await pdfDoc.getCalculationOrderIds();
|
|
|
|
|
|
|
|
expect(calculationOrder).toEqual([
|
|
|
|
"319R",
|
|
|
|
"320R",
|
|
|
|
"321R",
|
|
|
|
"322R",
|
|
|
|
"323R",
|
|
|
|
"324R",
|
|
|
|
"325R",
|
|
|
|
"326R",
|
|
|
|
"327R",
|
|
|
|
"328R",
|
|
|
|
"329R",
|
|
|
|
"330R",
|
|
|
|
"331R",
|
|
|
|
"332R",
|
|
|
|
"333R",
|
|
|
|
"334R",
|
|
|
|
"335R",
|
|
|
|
]);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets non-existent outline", async function () {
|
2023-08-15 19:13:36 +09:00
|
|
|
const loadingTask = getDocument(tracemonkeyGetDocumentParams);
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const outline = await pdfDoc.getOutline();
|
|
|
|
expect(outline).toEqual(null);
|
2016-02-14 07:13:01 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
2016-02-14 07:13:01 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets outline", async function () {
|
|
|
|
const outline = await pdfDocument.getOutline();
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
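A minimal sketch of the kind of ESLint setup described above. It assumes the `eslint-plugin-prettier` package, which provides the `prettier/prettier` rule, and ESLint's built-in `max-len` rule. The values are illustrative and the actual PDF.js configuration may differ. The "small hack" amounts to giving code lines an effectively unlimited length, so Prettier decides where code breaks, while comments stay capped:

```javascript
// Illustrative ESLint configuration (legacy .eslintrc.js style).
// Assumption: eslint-plugin-prettier is installed; values are examples only.
// In a real .eslintrc.js this object would be assigned to module.exports.
const eslintConfig = {
  plugins: ["prettier"],
  rules: {
    // Report any deviation from Prettier's output as a lint error;
    // running ESLint with --fix then rewrites the code via Prettier.
    "prettier/prettier": "error",
    // Prettier treats printWidth only as a guide, so code lines are left
    // effectively unlimited here, while comments are still capped at 80.
    "max-len": ["error", { code: 1000, comments: 80, ignoreUrls: true }],
  },
};
```

With this setup, `eslint --fix` (as used by the `gulp lint` task) both reformats the code and leaves over-long comments as errors to be fixed by hand.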
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
// Two top level entries.
|
|
|
|
expect(Array.isArray(outline)).toEqual(true);
|
|
|
|
expect(outline.length).toEqual(2);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
// Make sure some basic attributes are set.
|
|
|
|
const outlineItem = outline[1];
|
|
|
|
expect(outlineItem.title).toEqual("Chapter 1");
|
|
|
|
expect(Array.isArray(outlineItem.dest)).toEqual(true);
|
|
|
|
expect(outlineItem.url).toEqual(null);
|
|
|
|
expect(outlineItem.unsafeUrl).toBeUndefined();
|
|
|
|
expect(outlineItem.newWindow).toBeUndefined();
|
2015-12-22 20:59:23 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(outlineItem.bold).toEqual(true);
|
|
|
|
expect(outlineItem.italic).toEqual(false);
|
|
|
|
expect(outlineItem.color).toEqual(new Uint8ClampedArray([0, 64, 128]));
|
|
|
|
|
|
|
|
expect(outlineItem.items.length).toEqual(1);
|
|
|
|
expect(outlineItem.items[0].title).toEqual("Paragraph 1.1");
|
2012-04-13 09:59:30 +09:00
|
|
|
});
|
2018-08-27 04:37:05 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets outline containing a URL", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue3214.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const outline = await pdfDoc.getOutline();
|
|
|
|
expect(Array.isArray(outline)).toEqual(true);
|
|
|
|
expect(outline.length).toEqual(5);
|
|
|
|
|
|
|
|
const outlineItemTwo = outline[2];
|
|
|
|
expect(typeof outlineItemTwo.title).toEqual("string");
|
|
|
|
expect(outlineItemTwo.dest).toEqual(null);
|
|
|
|
expect(outlineItemTwo.url).toEqual("http://google.com/");
|
|
|
|
expect(outlineItemTwo.unsafeUrl).toEqual("http://google.com");
|
|
|
|
expect(outlineItemTwo.newWindow).toBeUndefined();
|
|
|
|
|
|
|
|
const outlineItemOne = outline[1];
|
|
|
|
expect(outlineItemOne.bold).toEqual(false);
|
|
|
|
expect(outlineItemOne.italic).toEqual(true);
|
|
|
|
expect(outlineItemOne.color).toEqual(new Uint8ClampedArray([0, 0, 0]));
|
2018-08-27 04:37:05 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
2018-08-27 04:37:05 +09:00
|
|
|
});
|
|
|
|
|
2022-05-02 16:46:44 +09:00
|
|
|
it("gets outline, with dest-strings using PDFDocEncoding (issue 14864)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue14864.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const outline = await pdfDoc.getOutline();
|
|
|
|
|
|
|
|
expect(Array.isArray(outline)).toEqual(true);
|
|
|
|
expect(outline.length).toEqual(6);
|
|
|
|
|
|
|
|
expect(outline[4]).toEqual({
|
2022-08-31 01:40:27 +09:00
|
|
|
action: null,
|
2022-10-04 00:55:13 +09:00
|
|
|
attachment: undefined,
|
2022-05-02 16:46:44 +09:00
|
|
|
dest: "Händel -- Halle🎆lujah",
|
|
|
|
url: null,
|
|
|
|
unsafeUrl: undefined,
|
|
|
|
newWindow: undefined,
|
2022-09-01 00:50:28 +09:00
|
|
|
setOCGState: undefined,
|
2022-05-02 16:46:44 +09:00
|
|
|
title: "Händel -- Halle🎆lujah",
|
|
|
|
color: new Uint8ClampedArray([0, 0, 0]),
|
|
|
|
count: undefined,
|
|
|
|
bold: false,
|
|
|
|
italic: false,
|
|
|
|
items: [],
|
|
|
|
});
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2022-08-31 01:40:27 +09:00
|
|
|
it("gets outline, with named-actions (issue 15367)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue15367.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const outline = await pdfDoc.getOutline();
|
|
|
|
|
|
|
|
expect(Array.isArray(outline)).toEqual(true);
|
|
|
|
expect(outline.length).toEqual(4);
|
|
|
|
|
|
|
|
expect(outline[1]).toEqual({
|
|
|
|
action: "PrevPage",
|
2022-10-04 00:55:13 +09:00
|
|
|
attachment: undefined,
|
2022-08-31 01:40:27 +09:00
|
|
|
dest: null,
|
|
|
|
url: null,
|
|
|
|
unsafeUrl: undefined,
|
|
|
|
newWindow: undefined,
|
2022-09-01 00:50:28 +09:00
|
|
|
setOCGState: undefined,
|
2022-08-31 01:40:27 +09:00
|
|
|
title: "Previous Page",
|
|
|
|
color: new Uint8ClampedArray([0, 0, 0]),
|
|
|
|
count: undefined,
|
|
|
|
bold: false,
|
|
|
|
italic: false,
|
|
|
|
items: [],
|
|
|
|
});
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2022-09-01 00:50:28 +09:00
|
|
|
it("gets outline, with SetOCGState-actions (issue 15372)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue15372.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const outline = await pdfDoc.getOutline();
|
|
|
|
|
|
|
|
expect(Array.isArray(outline)).toEqual(true);
|
|
|
|
expect(outline.length).toEqual(1);
|
|
|
|
|
|
|
|
expect(outline[0]).toEqual({
|
|
|
|
action: null,
|
2022-10-04 00:55:13 +09:00
|
|
|
attachment: undefined,
|
2022-09-01 00:50:28 +09:00
|
|
|
dest: null,
|
|
|
|
url: null,
|
|
|
|
unsafeUrl: undefined,
|
|
|
|
newWindow: undefined,
|
|
|
|
setOCGState: { state: ["OFF", "ON", "50R"], preserveRB: false },
|
|
|
|
title: "Display Layer",
|
|
|
|
color: new Uint8ClampedArray([0, 0, 0]),
|
|
|
|
count: undefined,
|
|
|
|
bold: false,
|
|
|
|
italic: false,
|
|
|
|
items: [],
|
|
|
|
});
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-11-13 03:41:32 +09:00
|
|
|
it("gets outline with non-displayable chars", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue14267.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const outline = await pdfDoc.getOutline();
|
|
|
|
expect(Array.isArray(outline)).toEqual(true);
|
|
|
|
expect(outline.length).toEqual(1);
|
|
|
|
|
|
|
|
const outlineItem = outline[0];
|
|
|
|
expect(outlineItem.title).toEqual("hello\x11world");
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
2018-08-27 04:37:05 +09:00
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets non-existent permissions", async function () {
|
|
|
|
const permissions = await pdfDocument.getPermissions();
|
|
|
|
expect(permissions).toEqual(null);
|
|
|
|
});
|
|
|
|
|
|
|
|
it("gets permissions", async function () {
|
2018-08-27 04:37:05 +09:00
|
|
|
// Editing not allowed.
|
|
|
|
const loadingTask0 = getDocument(
|
|
|
|
buildGetDocumentParams("issue9972-1.pdf")
|
|
|
|
);
|
2020-04-14 19:28:14 +09:00
|
|
|
const promise0 = loadingTask0.promise.then(function (pdfDoc) {
|
2020-03-24 18:44:17 +09:00
|
|
|
return pdfDoc.getPermissions();
|
2018-08-27 04:37:05 +09:00
|
|
|
});
|
|
|
|
|
|
|
|
// Printing not allowed.
|
|
|
|
const loadingTask1 = getDocument(
|
|
|
|
buildGetDocumentParams("issue9972-2.pdf")
|
|
|
|
);
|
2020-04-14 19:28:14 +09:00
|
|
|
const promise1 = loadingTask1.promise.then(function (pdfDoc) {
|
2020-03-24 18:44:17 +09:00
|
|
|
return pdfDoc.getPermissions();
|
2018-08-27 04:37:05 +09:00
|
|
|
});
|
|
|
|
|
|
|
|
// Copying not allowed.
|
|
|
|
const loadingTask2 = getDocument(
|
|
|
|
buildGetDocumentParams("issue9972-3.pdf")
|
|
|
|
);
|
2020-04-14 19:28:14 +09:00
|
|
|
const promise2 = loadingTask2.promise.then(function (pdfDoc) {
|
2020-03-24 18:44:17 +09:00
|
|
|
return pdfDoc.getPermissions();
|
2018-08-27 04:37:05 +09:00
|
|
|
});
|
|
|
|
|
|
|
|
const totalPermissionCount = Object.keys(PermissionFlag).length;
|
2021-04-17 04:48:42 +09:00
|
|
|
const permissions = await Promise.all([promise0, promise1, promise2]);
|
|
|
|
|
|
|
|
expect(permissions[0].length).toEqual(totalPermissionCount - 1);
|
|
|
|
expect(
|
|
|
|
permissions[0].includes(PermissionFlag.MODIFY_CONTENTS)
|
|
|
|
).toBeFalsy();
|
|
|
|
|
|
|
|
expect(permissions[1].length).toEqual(totalPermissionCount - 2);
|
|
|
|
expect(permissions[1].includes(PermissionFlag.PRINT)).toBeFalsy();
|
|
|
|
expect(
|
|
|
|
permissions[1].includes(PermissionFlag.PRINT_HIGH_QUALITY)
|
|
|
|
).toBeFalsy();
|
|
|
|
|
|
|
|
expect(permissions[2].length).toEqual(totalPermissionCount - 1);
|
|
|
|
expect(permissions[2].includes(PermissionFlag.COPY)).toBeFalsy();
|
|
|
|
|
|
|
|
await Promise.all([
|
|
|
|
loadingTask0.destroy(),
|
|
|
|
loadingTask1.destroy(),
|
|
|
|
loadingTask2.destroy(),
|
|
|
|
]);
|
|
|
|
});
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets metadata", async function () {
|
2021-05-16 17:58:34 +09:00
|
|
|
const { info, metadata, contentDispositionFilename, contentLength } =
|
|
|
|
await pdfDocument.getMetadata();
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(info.Title).toEqual("Basic API Test");
|
|
|
|
// Custom, non-standard, information dictionary entries.
|
|
|
|
expect(info.Custom).toEqual(undefined);
|
|
|
|
// The following are PDF.js specific, non-standard, properties.
|
|
|
|
expect(info.PDFFormatVersion).toEqual("1.7");
|
2021-10-11 22:55:16 +09:00
|
|
|
expect(info.Language).toEqual("en");
|
2021-10-25 23:09:26 +09:00
|
|
|
expect(info.EncryptFilterName).toEqual(null);
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(info.IsLinearized).toEqual(false);
|
|
|
|
expect(info.IsAcroFormPresent).toEqual(false);
|
|
|
|
expect(info.IsXFAPresent).toEqual(false);
|
|
|
|
expect(info.IsCollectionPresent).toEqual(false);
|
|
|
|
expect(info.IsSignaturesPresent).toEqual(false);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(metadata instanceof Metadata).toEqual(true);
|
|
|
|
expect(metadata.get("dc:title")).toEqual("Basic API Test");
|
|
|
|
|
|
|
|
expect(contentDispositionFilename).toEqual(null);
|
|
|
|
expect(contentLength).toEqual(basicApiFileLength);
|
2019-12-25 23:59:37 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("gets metadata, with custom info dict entries", async function () {
|
2023-08-15 19:13:36 +09:00
|
|
|
const loadingTask = getDocument(tracemonkeyGetDocumentParams);
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
2021-05-16 17:58:34 +09:00
|
|
|
const { info, metadata, contentDispositionFilename, contentLength } =
|
|
|
|
await pdfDoc.getMetadata();
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
expect(info.Creator).toEqual("TeX");
|
|
|
|
expect(info.Producer).toEqual("pdfeTeX-1.21a");
|
|
|
|
expect(info.CreationDate).toEqual("D:20090401163925-07'00'");
|
|
|
|
// Custom, non-standard, information dictionary entries.
|
|
|
|
const custom = info.Custom;
|
|
|
|
expect(typeof custom === "object" && custom !== null).toEqual(true);
|
|
|
|
|
|
|
|
expect(custom["PTEX.Fullbanner"]).toEqual(
|
|
|
|
"This is pdfeTeX, " +
|
|
|
|
"Version 3.141592-1.21a-2.2 (Web2C 7.5.4) kpathsea version 3.5.6"
|
|
|
|
);
|
|
|
|
// The following are PDF.js specific, non-standard, properties.
|
|
|
|
expect(info.PDFFormatVersion).toEqual("1.4");
|
2021-10-11 22:55:16 +09:00
|
|
|
expect(info.Language).toEqual(null);
|
2021-10-25 23:09:26 +09:00
|
|
|
expect(info.EncryptFilterName).toEqual(null);
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(info.IsLinearized).toEqual(false);
|
|
|
|
expect(info.IsAcroFormPresent).toEqual(false);
|
|
|
|
expect(info.IsXFAPresent).toEqual(false);
|
|
|
|
expect(info.IsCollectionPresent).toEqual(false);
|
|
|
|
expect(info.IsSignaturesPresent).toEqual(false);
|
|
|
|
|
|
|
|
expect(metadata).toEqual(null);
|
|
|
|
expect(contentDispositionFilename).toEqual(null);
|
|
|
|
expect(contentLength).toEqual(1016315);
|
2018-08-27 04:37:05 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
2018-08-27 04:37:05 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("gets metadata, with missing PDF header (bug 1606566)", async function () {
|
2020-02-05 21:59:47 +09:00
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("bug1606566.pdf"));
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
2021-05-16 17:58:34 +09:00
|
|
|
const { info, metadata, contentDispositionFilename, contentLength } =
|
|
|
|
await pdfDoc.getMetadata();
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2021-12-04 17:35:40 +09:00
|
|
|
// Custom, non-standard, information dictionary entries.
|
|
|
|
expect(info.Custom).toEqual(undefined);
|
2021-04-17 04:48:42 +09:00
|
|
|
// The following are PDF.js specific, non-standard, properties.
|
|
|
|
expect(info.PDFFormatVersion).toEqual(null);
|
2021-10-11 22:55:16 +09:00
|
|
|
expect(info.Language).toEqual(null);
|
2021-10-25 23:09:26 +09:00
|
|
|
expect(info.EncryptFilterName).toEqual(null);
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(info.IsLinearized).toEqual(false);
|
|
|
|
expect(info.IsAcroFormPresent).toEqual(false);
|
|
|
|
expect(info.IsXFAPresent).toEqual(false);
|
|
|
|
expect(info.IsCollectionPresent).toEqual(false);
|
|
|
|
expect(info.IsSignaturesPresent).toEqual(false);
|
|
|
|
|
|
|
|
expect(metadata).toEqual(null);
|
|
|
|
expect(contentDispositionFilename).toEqual(null);
|
|
|
|
expect(contentLength).toEqual(624);
|
2020-02-05 21:59:47 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
2018-08-27 04:37:05 +09:00
|
|
|
});
|
|
|
|
|
2021-12-04 17:35:40 +09:00
|
|
|
it("gets metadata, with corrupt /Metadata XRef entry", async function () {
|
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("PDFBOX-3148-2-fuzzed.pdf")
|
|
|
|
);
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const { info, metadata, contentDispositionFilename, contentLength } =
|
|
|
|
await pdfDoc.getMetadata();
|
|
|
|
|
|
|
|
// Custom, non-standard, information dictionary entries.
|
|
|
|
expect(info.Custom).toEqual(undefined);
|
|
|
|
// The following are PDF.js specific, non-standard, properties.
|
|
|
|
expect(info.PDFFormatVersion).toEqual("1.6");
|
|
|
|
expect(info.Language).toEqual(null);
|
|
|
|
expect(info.EncryptFilterName).toEqual(null);
|
|
|
|
expect(info.IsLinearized).toEqual(false);
|
|
|
|
expect(info.IsAcroFormPresent).toEqual(true);
|
|
|
|
expect(info.IsXFAPresent).toEqual(false);
|
|
|
|
expect(info.IsCollectionPresent).toEqual(false);
|
|
|
|
expect(info.IsSignaturesPresent).toEqual(false);
|
|
|
|
|
|
|
|
expect(metadata).toEqual(null);
|
|
|
|
expect(contentDispositionFilename).toEqual(null);
|
|
|
|
expect(contentLength).toEqual(244351);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets markInfo", async function () {
|
2020-10-24 08:30:36 +09:00
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("annotation-line.pdf")
|
|
|
|
);
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const markInfo = await pdfDoc.getMarkInfo();
|
|
|
|
expect(markInfo.Marked).toEqual(true);
|
|
|
|
expect(markInfo.UserProperties).toEqual(false);
|
|
|
|
expect(markInfo.Suspects).toEqual(false);
|
|
|
|
});
|
2020-10-24 08:30:36 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets data", async function () {
|
|
|
|
const data = await pdfDocument.getData();
|
|
|
|
expect(data instanceof Uint8Array).toEqual(true);
|
|
|
|
expect(data.length).toEqual(basicApiFileLength);
|
2020-10-24 08:30:36 +09:00
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets download info", async function () {
|
|
|
|
const downloadInfo = await pdfDocument.getDownloadInfo();
|
|
|
|
expect(downloadInfo).toEqual({ length: basicApiFileLength });
|
2014-05-14 18:57:48 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
it("cleans up document resources", async function () {
|
|
|
|
await pdfDocument.cleanup();
|
|
|
|
|
|
|
|
expect(true).toEqual(true);
|
[api-minor] Change `PDFDocumentProxy.cleanup`/`PDFPageProxy.cleanup` to return data
This patch makes the following changes, to improve these API methods:
- Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.
- Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.
- Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
Add a note in the JSDoc comment about not calling this method when rendering is ongoing.
- Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
All of the caches that are being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.
- Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
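The clean-up rules listed above can be illustrated with a toy model. This is a hypothetical sketch, not the PDF.js implementation: `ToyPage`, `ToyDocument`, `startRender`, `finishRender`, and `cachedData` are invented names, and the real `PDFPageProxy`/`PDFDocumentProxy` classes track far more state. It shows a page-level `cleanup()` that returns a boolean and refuses to run while rendering is ongoing, and a document-level `cleanup()` that returns a promise which rejects if any page is still rendering:

```javascript
// Toy model of the clean-up contract; all names here are hypothetical.
class ToyPage {
  constructor() {
    this.activeRenders = 0; // number of in-progress render tasks
    this.cachedData = new Map(); // stands in for cached font/image data
  }

  startRender() {
    this.activeRenders++;
  }

  finishRender() {
    this.activeRenders--;
  }

  // Returns true only if clean-up actually happened. Ongoing rendering
  // blocks it, since the renderer cannot re-request evicted data.
  cleanup() {
    if (this.activeRenders > 0) {
      return false;
    }
    this.cachedData.clear();
    return true;
  }
}

class ToyDocument {
  constructor(pageCount) {
    this.pages = Array.from({ length: pageCount }, () => new ToyPage());
  }

  // Returns a promise that resolves once clean-up is finished, and rejects
  // (instead of breaking rendering) if any page is still being rendered.
  async cleanup() {
    if (this.pages.some(page => page.activeRenders > 0)) {
      throw new Error("Cannot clean up while a page is rendering.");
    }
    for (const page of this.pages) {
      page.cleanup();
    }
  }
}
```

For example, `doc.pages[0].cleanup()` returns `false` after `startRender()` has been called without a matching `finishRender()`, and `await doc.cleanup()` rejects until that render has finished.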
2020-02-07 23:48:58 +09:00
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("checks that fingerprints are unique", async function () {
|
2019-07-15 18:19:17 +09:00
|
|
|
const loadingTask1 = getDocument(
|
|
|
|
buildGetDocumentParams("issue4436r.pdf")
|
|
|
|
);
|
|
|
|
const loadingTask2 = getDocument(buildGetDocumentParams("issue4575.pdf"));
|
2015-10-24 21:15:47 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const data = await Promise.all([
|
|
|
|
loadingTask1.promise,
|
|
|
|
loadingTask2.promise,
|
|
|
|
]);
|
2021-07-02 23:36:27 +09:00
|
|
|
const fingerprints1 = data[0].fingerprints;
|
|
|
|
const fingerprints2 = data[1].fingerprints;
|
2015-10-24 21:15:47 +09:00
|
|
|
|
2021-07-02 23:36:27 +09:00
|
|
|
expect(fingerprints1).not.toEqual(fingerprints2);
|
2016-01-21 07:57:17 +09:00
|
|
|
|
2022-04-06 22:34:08 +09:00
|
|
|
expect(fingerprints1).toEqual(["657428c0628e329f9a281fb6d2d092d4", null]);
|
2021-07-02 23:36:27 +09:00
|
|
|
expect(fingerprints2).toEqual(["04c7126b34a46b6d4d6e7a1eff7edcb6", null]);
|
2019-07-15 18:19:17 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await Promise.all([loadingTask1.destroy(), loadingTask2.destroy()]);
|
2015-10-24 21:15:47 +09:00
|
|
|
});
|
2017-08-31 21:08:22 +09:00
|
|
|
|
2021-09-24 01:18:55 +09:00
|
|
|
it("write a value in an annotation, save the pdf and load it", async function () {
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("evaljs.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
const value = "Hello World";
|
|
|
|
|
|
|
|
pdfDoc.annotationStorage.setValue("55R", { value });
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const annotations = await pdfPage.getAnnotations();
|
|
|
|
|
|
|
|
const field = annotations.find(annotation => annotation.id === "55R");
|
|
|
|
expect(!!field).toEqual(true);
|
|
|
|
expect(field.fieldValue).toEqual(value);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-02-23 06:08:21 +09:00
|
|
|
it("write a value in an annotation, save the pdf and check the value in xfa datasets (1)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("issue16081.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
const value = "Hello World";
|
|
|
|
|
|
|
|
pdfDoc.annotationStorage.setValue("2055R", { value });
|
2023-09-07 22:52:58 +09:00
|
|
|
pdfDoc.annotationStorage.setValue("2090R", { value });
|
2023-02-23 06:08:21 +09:00
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const datasets = await pdfDoc.getXFADatasets();
|
|
|
|
|
|
|
|
const surName = getNamedNodeInXML(
|
|
|
|
datasets.node,
|
|
|
|
"xfa:data.PPTC_153.Page1.PersonalInformation.TitleAndNameInformation.PersonalInfo.Surname.#text"
|
|
|
|
);
|
|
|
|
expect(surName.nodeValue).toEqual(value);
|
|
|
|
|
2023-09-07 22:52:58 +09:00
|
|
|
// The path for the date is:
|
|
|
|
// PPTC_153[0].Page1[0].DeclerationAndSignatures[0]
|
|
|
|
// .#subform[2].currentDate[0]
|
|
|
|
// and it contains a class (i.e. #subform[2]) which is irrelevant in the
|
|
|
|
// context of datasets (it's more a template concept).
|
|
|
|
const date = getNamedNodeInXML(
|
|
|
|
datasets.node,
|
|
|
|
"xfa:data.PPTC_153.Page1.DeclerationAndSignatures.currentDate.#text"
|
|
|
|
);
|
|
|
|
expect(date.nodeValue).toEqual(value);
|
|
|
|
|
2023-02-23 06:08:21 +09:00
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
|
|
|
it("write a value in an annotation, save the pdf and check the value in xfa datasets (2)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
// In this file the paths to the fields are wrong, but the last path element
|
|
|
|
// is unique, so we can guess what the node is.
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("f1040_2022.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
|
|
|
|
pdfDoc.annotationStorage.setValue("1573R", { value: "hello" });
|
|
|
|
pdfDoc.annotationStorage.setValue("1577R", { value: "world" });
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const datasets = await pdfDoc.getXFADatasets();
|
|
|
|
|
|
|
|
const firstName = getNamedNodeInXML(
|
|
|
|
datasets.node,
|
|
|
|
"xfa:data.topmostSubform.f1_02.#text"
|
|
|
|
);
|
|
|
|
expect(firstName.nodeValue).toEqual("hello");
|
|
|
|
|
|
|
|
const lastName = getNamedNodeInXML(
|
|
|
|
datasets.node,
|
|
|
|
"xfa:data.topmostSubform.f1_06.#text"
|
|
|
|
);
|
|
|
|
expect(lastName.nodeValue).toEqual("world");
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-06-16 03:48:54 +09:00
|
|
|
it("write a new annotation, save the pdf and check that the prev entry in xref stream is correct", async function () {
|
2023-03-22 02:14:43 +09:00
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("bug1823296.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.FREETEXT,
|
|
|
|
rect: [12, 34, 56, 78],
|
|
|
|
rotation: 0,
|
|
|
|
fontSize: 10,
|
|
|
|
color: [0, 0, 0],
|
|
|
|
value: "Hello PDF.js World!",
|
|
|
|
pageIndex: 0,
|
|
|
|
});
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const xrefPrev = await pdfDoc.getXRefPrevValue();
|
|
|
|
|
|
|
|
expect(xrefPrev).toEqual(143954);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-06-16 03:43:57 +09:00
|
|
|
it("edit and write an existing annotation, save the pdf and check that the Annot array doesn't contain dup entries", async function () {
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("issue14438.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.FREETEXT,
|
|
|
|
rect: [12, 34, 56, 78],
|
|
|
|
rotation: 0,
|
|
|
|
fontSize: 10,
|
|
|
|
color: [0, 0, 0],
|
|
|
|
value: "Hello PDF.js World!",
|
|
|
|
pageIndex: 0,
|
|
|
|
id: "10R",
|
|
|
|
});
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_1", {
|
|
|
|
annotationType: AnnotationEditorType.FREETEXT,
|
|
|
|
rect: [12, 34, 56, 78],
|
|
|
|
rotation: 0,
|
|
|
|
fontSize: 10,
|
|
|
|
color: [0, 0, 0],
|
|
|
|
value: "Hello PDF.js World!",
|
|
|
|
pageIndex: 0,
|
|
|
|
});
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const annotations = await pdfDoc.getAnnotArray(0);
|
|
|
|
|
|
|
|
expect(annotations).toEqual([
|
|
|
|
"4R",
|
|
|
|
"10R",
|
|
|
|
"17R",
|
|
|
|
"20R",
|
|
|
|
"21R",
|
|
|
|
"22R",
|
|
|
|
"25R",
|
|
|
|
"28R",
|
|
|
|
"29R",
|
|
|
|
"30R",
|
|
|
|
"33R",
|
|
|
|
"36R",
|
|
|
|
"37R",
|
|
|
|
"42R",
|
|
|
|
"43R",
|
|
|
|
"44R",
|
|
|
|
"47R",
|
|
|
|
"50R",
|
|
|
|
"51R",
|
|
|
|
"54R",
|
|
|
|
"55R",
|
|
|
|
"58R",
|
|
|
|
"59R",
|
|
|
|
"62R",
|
|
|
|
"63R",
|
|
|
|
"66R",
|
|
|
|
"69R",
|
|
|
|
"72R",
|
|
|
|
"75R",
|
|
|
|
"78R",
|
|
|
|
"140R",
|
|
|
|
]);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-06-16 16:44:18 +09:00
|
|
|
it("write a new annotation, save the pdf and check that the text content is correct", async function () {
|
|
|
|
// This test helps to check that the text stream is correctly compressed
|
|
|
|
// when saving.
|
|
|
|
const manifesto = `
|
|
|
|
The Mozilla Manifesto Addendum
|
|
|
|
Pledge for a Healthy Internet
|
|
|
|
|
|
|
|
The open, global internet is the most powerful communication and collaboration resource we have ever seen.
|
|
|
|
It embodies some of our deepest hopes for human progress.
|
|
|
|
It enables new opportunities for learning, building a sense of shared humanity, and solving the pressing problems
|
|
|
|
facing people everywhere.
|
|
|
|
|
|
|
|
Over the last decade we have seen this promise fulfilled in many ways.
|
|
|
|
We have also seen the power of the internet used to magnify divisiveness,
|
|
|
|
incite violence, promote hatred, and intentionally manipulate fact and reality.
|
|
|
|
We have learned that we should more explicitly set out our aspirations for the human experience of the internet.
|
|
|
|
We do so now.
|
|
|
|
`.repeat(100);
|
2023-08-15 22:12:17 +09:00
|
|
|
expect(manifesto.length).toEqual(80500);
|
|
|
|
|
2023-06-16 16:44:18 +09:00
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("empty.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
2023-08-15 22:12:17 +09:00
|
|
|
// The initial document size (indirectly) affects the length check below.
|
|
|
|
let typedArray = await pdfDoc.getData();
|
|
|
|
expect(typedArray.length).toBeLessThan(5000);
|
|
|
|
|
2023-06-16 16:44:18 +09:00
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.FREETEXT,
|
|
|
|
rect: [10, 10, 500, 500],
|
|
|
|
rotation: 0,
|
|
|
|
fontSize: 1,
|
|
|
|
color: [0, 0, 0],
|
|
|
|
value: manifesto,
|
|
|
|
pageIndex: 0,
|
|
|
|
});
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
2023-08-15 22:12:17 +09:00
|
|
|
// Ensure that the Annotation text-content was actually compressed.
|
|
|
|
typedArray = await pdfDoc.getData();
|
|
|
|
expect(typedArray.length).toBeLessThan(90000);
|
|
|
|
|
2023-06-16 16:44:18 +09:00
|
|
|
const page = await pdfDoc.getPage(1);
|
|
|
|
const annotations = await page.getAnnotations();
|
|
|
|
|
|
|
|
expect(annotations[0].contentsObj.str).toEqual(manifesto);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-06-23 02:48:40 +09:00
|
|
|
it("write a new stamp annotation, save the pdf and check that the same image has the same ref", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Cannot create a bitmap from Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const TEST_IMAGES_PATH = "../images/";
|
|
|
|
const filename = "firefox_logo.png";
|
|
|
|
const path = new URL(TEST_IMAGES_PATH + filename, window.location).href;
|
|
|
|
|
|
|
|
const response = await fetch(path);
|
|
|
|
const blob = await response.blob();
|
|
|
|
const bitmap = await createImageBitmap(blob);
|
|
|
|
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("empty.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.STAMP,
|
|
|
|
rect: [12, 34, 56, 78],
|
|
|
|
rotation: 0,
|
|
|
|
bitmap,
|
|
|
|
bitmapId: "im1",
|
|
|
|
pageIndex: 0,
|
|
|
|
});
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_1", {
|
|
|
|
annotationType: AnnotationEditorType.STAMP,
|
|
|
|
rect: [112, 134, 156, 178],
|
|
|
|
rotation: 0,
|
|
|
|
bitmapId: "im1",
|
|
|
|
pageIndex: 0,
|
|
|
|
});
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const page = await pdfDoc.getPage(1);
|
|
|
|
const opList = await page.getOperatorList();
|
|
|
|
|
|
|
|
// The pdf contains two stamp annotations with the same image.
|
|
|
|
// The image should be stored only once in the pdf and referenced twice.
|
|
|
|
// Hence every paintImageXObject operation in the opList should use the same image.
|
|
|
|
|
|
|
|
for (let i = 0; i < opList.fnArray.length; i++) {
|
|
|
|
if (opList.fnArray[i] === OPS.paintImageXObject) {
|
|
|
|
expect(opList.argsArray[i][0]).toEqual("img_p0_1");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-09-28 18:37:35 +09:00
|
|
|
it("write a new stamp annotation in a tagged pdf, save and check the structure tree", async function () {
|
2023-09-12 00:51:22 +09:00
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Cannot create a bitmap from Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const TEST_IMAGES_PATH = "../images/";
|
|
|
|
const filename = "firefox_logo.png";
|
|
|
|
const path = new URL(TEST_IMAGES_PATH + filename, window.location).href;
|
|
|
|
|
|
|
|
const response = await fetch(path);
|
|
|
|
const blob = await response.blob();
|
|
|
|
const bitmap = await createImageBitmap(blob);
|
|
|
|
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("bug1823296.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.STAMP,
|
|
|
|
rect: [128, 400, 148, 420],
|
|
|
|
rotation: 0,
|
|
|
|
bitmap,
|
|
|
|
bitmapId: "im1",
|
|
|
|
pageIndex: 0,
|
|
|
|
structTreeParentId: "p3R_mc12",
|
|
|
|
accessibilityData: {
|
|
|
|
type: "Figure",
|
|
|
|
alt: "Hello World",
|
|
|
|
},
|
|
|
|
});
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const page = await pdfDoc.getPage(1);
|
|
|
|
const tree = await page.getStructTree();
|
|
|
|
const leaf = tree.children[0].children[6].children[1];
|
|
|
|
|
|
|
|
expect(leaf).toEqual({
|
|
|
|
role: "Figure",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
type: "annotation",
|
|
|
|
id: "pdfjs_internal_id_477R",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
alt: "Hello World",
|
|
|
|
});
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-09-28 18:37:35 +09:00
|
|
|
it("write a new stamp annotation in a tagged pdf, save, repeat and check the structure tree", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Cannot create a bitmap from Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const TEST_IMAGES_PATH = "../images/";
|
|
|
|
const filename = "firefox_logo.png";
|
|
|
|
const path = new URL(TEST_IMAGES_PATH + filename, window.location).href;
|
|
|
|
|
|
|
|
const response = await fetch(path);
|
|
|
|
const blob = await response.blob();
|
|
|
|
let loadingTask, pdfDoc;
|
|
|
|
let data = buildGetDocumentParams("empty.pdf");
|
|
|
|
|
|
|
|
for (let i = 1; i <= 2; i++) {
|
|
|
|
const bitmap = await createImageBitmap(blob);
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.STAMP,
|
|
|
|
rect: [10 * i, 10 * i, 20 * i, 20 * i],
|
|
|
|
rotation: 0,
|
|
|
|
bitmap,
|
|
|
|
bitmapId: "im1",
|
|
|
|
pageIndex: 0,
|
|
|
|
structTreeParentId: null,
|
|
|
|
accessibilityData: {
|
|
|
|
type: "Figure",
|
|
|
|
alt: `Hello World ${i}`,
|
|
|
|
},
|
|
|
|
});
|
|
|
|
|
|
|
|
data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
}
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const page = await pdfDoc.getPage(1);
|
|
|
|
const tree = await page.getStructTree();
|
|
|
|
|
|
|
|
expect(tree).toEqual({
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
role: "Figure",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
type: "annotation",
|
|
|
|
id: "pdfjs_internal_id_18R",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
alt: "Hello World 1",
|
|
|
|
},
|
|
|
|
{
|
|
|
|
role: "Figure",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
type: "annotation",
|
|
|
|
id: "pdfjs_internal_id_26R",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
alt: "Hello World 2",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
role: "Root",
|
|
|
|
});
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-09-12 00:51:22 +09:00
|
|
|
it("write a new stamp annotation in a non-tagged pdf, save and check the structure tree", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Cannot create a bitmap from Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const TEST_IMAGES_PATH = "../images/";
|
|
|
|
const filename = "firefox_logo.png";
|
|
|
|
const path = new URL(TEST_IMAGES_PATH + filename, window.location).href;
|
|
|
|
|
|
|
|
const response = await fetch(path);
|
|
|
|
const blob = await response.blob();
|
|
|
|
const bitmap = await createImageBitmap(blob);
|
|
|
|
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("empty.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.STAMP,
|
|
|
|
rect: [128, 400, 148, 420],
|
|
|
|
rotation: 0,
|
|
|
|
bitmap,
|
|
|
|
bitmapId: "im1",
|
|
|
|
pageIndex: 0,
|
|
|
|
structTreeParentId: null,
|
|
|
|
accessibilityData: {
|
|
|
|
type: "Figure",
|
|
|
|
alt: "Hello World",
|
|
|
|
},
|
|
|
|
});
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const page = await pdfDoc.getPage(1);
|
|
|
|
const tree = await page.getStructTree();
|
|
|
|
|
|
|
|
expect(tree).toEqual({
|
|
|
|
children: [
|
2023-09-26 18:02:14 +09:00
|
|
|
{
|
|
|
|
role: "Figure",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
type: "annotation",
|
|
|
|
id: "pdfjs_internal_id_18R",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
alt: "Hello World",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
role: "Root",
|
|
|
|
});
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
|
|
|
it("write a text and a stamp annotation but no alt text (bug 1855157)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Cannot create a bitmap from Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const TEST_IMAGES_PATH = "../images/";
|
|
|
|
const filename = "firefox_logo.png";
|
|
|
|
const path = new URL(TEST_IMAGES_PATH + filename, window.location).href;
|
|
|
|
|
|
|
|
const response = await fetch(path);
|
|
|
|
const blob = await response.blob();
|
|
|
|
const bitmap = await createImageBitmap(blob);
|
|
|
|
|
|
|
|
let loadingTask = getDocument(buildGetDocumentParams("empty.pdf"));
|
|
|
|
let pdfDoc = await loadingTask.promise;
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_0", {
|
|
|
|
annotationType: AnnotationEditorType.STAMP,
|
|
|
|
rect: [128, 400, 148, 420],
|
|
|
|
rotation: 0,
|
|
|
|
bitmap,
|
|
|
|
bitmapId: "im1",
|
|
|
|
pageIndex: 0,
|
|
|
|
structTreeParentId: null,
|
|
|
|
accessibilityData: {
|
|
|
|
type: "Figure",
|
|
|
|
alt: "Hello World",
|
|
|
|
},
|
|
|
|
});
|
|
|
|
pdfDoc.annotationStorage.setValue("pdfjs_internal_editor_1", {
|
|
|
|
annotationType: AnnotationEditorType.FREETEXT,
|
|
|
|
color: [0, 0, 0],
|
|
|
|
fontSize: 10,
|
|
|
|
value: "Hello World",
|
|
|
|
pageIndex: 0,
|
|
|
|
rect: [
|
|
|
|
133.2444863336475, 653.5583423367227, 191.03166882427766,
|
|
|
|
673.363146394756,
|
|
|
|
],
|
|
|
|
rotation: 0,
|
|
|
|
structTreeParentId: null,
|
|
|
|
id: null,
|
|
|
|
});
|
|
|
|
|
|
|
|
const data = await pdfDoc.saveDocument();
|
|
|
|
await loadingTask.destroy();
|
|
|
|
|
|
|
|
loadingTask = getDocument(data);
|
|
|
|
pdfDoc = await loadingTask.promise;
|
|
|
|
const page = await pdfDoc.getPage(1);
|
|
|
|
const tree = await page.getStructTree();
|
|
|
|
|
|
|
|
expect(tree).toEqual({
|
|
|
|
children: [
|
2023-09-12 00:51:22 +09:00
|
|
|
{
|
|
|
|
role: "Figure",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
type: "annotation",
|
|
|
|
id: "pdfjs_internal_id_18R",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
alt: "Hello World",
|
|
|
|
},
|
|
|
|
],
|
|
|
|
role: "Root",
|
|
|
|
});
|
|
|
|
|
2024-01-15 03:34:06 +09:00
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
|
|
|
it("read content from multiline textfield containing an empty line", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue17492.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const annotations = await pdfPage.getAnnotations();
|
|
|
|
|
|
|
|
const field = annotations.find(annotation => annotation.id === "144R");
|
|
|
|
expect(!!field).toEqual(true);
|
|
|
|
expect(field.fieldValue).toEqual("Several\n\nOther\nJobs");
|
|
|
|
expect(field.textContent).toEqual(["Several", "", "Other", "Jobs"]);
|
|
|
|
|
2023-09-12 00:51:22 +09:00
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
describe("Cross-origin", function () {
|
2020-10-25 23:40:51 +09:00
|
|
|
let loadingTask;
|
2017-08-31 21:08:22 +09:00
|
|
|
function _checkCanLoad(expectSuccess, filename, options) {
|
2019-11-11 00:42:46 +09:00
|
|
|
if (isNodeJS) {
|
2017-08-31 21:08:22 +09:00
|
|
|
pending("Cannot simulate cross-origin requests in Node.js");
|
|
|
|
}
|
2020-10-25 23:40:51 +09:00
|
|
|
const params = buildGetDocumentParams(filename, options);
|
|
|
|
const url = new URL(params.url);
|
2017-08-31 21:08:22 +09:00
|
|
|
if (url.hostname === "localhost") {
|
|
|
|
url.hostname = "127.0.0.1";
|
|
|
|
} else if (url.hostname === "127.0.0.1") {
|
|
|
|
url.hostname = "localhost";
|
|
|
|
} else {
|
|
|
|
pending("Can only run cross-origin test on localhost!");
|
|
|
|
}
|
|
|
|
params.url = url.href;
|
|
|
|
loadingTask = getDocument(params);
|
|
|
|
return loadingTask.promise
|
2020-04-14 19:28:14 +09:00
|
|
|
.then(function (pdf) {
|
2017-08-31 21:08:22 +09:00
|
|
|
return pdf.destroy();
|
|
|
|
})
|
|
|
|
.then(
|
2020-04-14 19:28:14 +09:00
|
|
|
function () {
|
2017-08-31 21:08:22 +09:00
|
|
|
expect(expectSuccess).toEqual(true);
|
|
|
|
},
|
2020-04-14 19:28:14 +09:00
|
|
|
function (error) {
|
2017-08-31 21:08:22 +09:00
|
|
|
if (expectSuccess) {
|
|
|
|
// For ease of debugging.
|
|
|
|
expect(error).toEqual("There should not be any error");
|
|
|
|
}
|
|
|
|
expect(expectSuccess).toEqual(false);
|
|
|
|
}
|
|
|
|
);
|
|
|
|
}
|
|
|
|
function testCanLoad(filename, options) {
|
|
|
|
return _checkCanLoad(true, filename, options);
|
|
|
|
}
|
|
|
|
function testCannotLoad(filename, options) {
|
|
|
|
return _checkCanLoad(false, filename, options);
|
|
|
|
}
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
afterEach(async function () {
|
2019-03-11 20:43:44 +09:00
|
|
|
if (loadingTask && !loadingTask.destroyed) {
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
2017-08-31 21:08:22 +09:00
|
|
|
}
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("server disallows cors", async function () {
|
|
|
|
await testCannotLoad("basicapi.pdf");
|
2017-08-31 21:08:22 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("server allows cors without credentials, default withCredentials", async function () {
|
|
|
|
await testCanLoad("basicapi.pdf?cors=withoutCredentials");
|
2017-08-31 21:08:22 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("server allows cors without credentials, and withCredentials=false", async function () {
|
|
|
|
await testCanLoad("basicapi.pdf?cors=withoutCredentials", {
|
2017-08-31 21:08:22 +09:00
|
|
|
withCredentials: false,
|
2021-04-17 04:48:42 +09:00
|
|
|
});
|
2017-08-31 21:08:22 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("server allows cors without credentials, but withCredentials=true", async function () {
|
|
|
|
await testCannotLoad("basicapi.pdf?cors=withoutCredentials", {
|
2017-08-31 21:08:22 +09:00
|
|
|
withCredentials: true,
|
2021-04-17 04:48:42 +09:00
|
|
|
});
|
2017-08-31 21:08:22 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("server allows cors with credentials, and withCredentials=true", async function () {
|
|
|
|
await testCanLoad("basicapi.pdf?cors=withCredentials", {
|
2017-08-31 21:08:22 +09:00
|
|
|
withCredentials: true,
|
2021-04-17 04:48:42 +09:00
|
|
|
});
|
2017-08-31 21:08:22 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("server allows cors with credentials, and withCredentials=false", async function () {
|
2017-08-31 21:08:22 +09:00
|
|
|
// The server supports even more than we need, so if the previous tests
|
|
|
|
// pass, then this should pass for sure.
|
|
|
|
// The only case where this test fails is when the server does not reply
|
|
|
|
// with the Access-Control-Allow-Origin header.
|
2021-04-17 04:48:42 +09:00
|
|
|
await testCanLoad("basicapi.pdf?cors=withCredentials", {
|
2017-08-31 21:08:22 +09:00
|
|
|
withCredentials: false,
|
2021-04-17 04:48:42 +09:00
|
|
|
});
|
2017-08-31 21:08:22 +09:00
|
|
|
});
|
|
|
|
});
|
2012-04-13 09:59:30 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
describe("Page", function () {
|
2020-03-24 18:44:17 +09:00
|
|
|
let pdfLoadingTask, pdfDocument, page;
|
2016-01-21 07:57:17 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
beforeAll(async function () {
|
2020-03-24 18:44:17 +09:00
|
|
|
pdfLoadingTask = getDocument(basicApiGetDocumentParams);
|
2021-04-17 04:48:42 +09:00
|
|
|
pdfDocument = await pdfLoadingTask.promise;
|
|
|
|
page = await pdfDocument.getPage(1);
|
2012-04-13 09:59:30 +09:00
|
|
|
});
|
2016-01-21 07:57:17 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
afterAll(async function () {
|
|
|
|
await pdfLoadingTask.destroy();
|
2012-04-13 09:59:30 +09:00
|
|
|
});
|
2016-01-21 07:57:17 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it("gets page number", function () {
|
2014-08-12 19:04:00 +09:00
|
|
|
expect(page.pageNumber).toEqual(1);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it("gets rotate", function () {
|
2014-08-12 19:04:00 +09:00
|
|
|
expect(page.rotate).toEqual(0);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it("gets ref", function () {
|
Fix inconsistent spacing and trailing commas in objects in `test/` files, so we can enable the `comma-dangle` and `object-curly-spacing` ESLint rules later on
http://eslint.org/docs/rules/comma-dangle
http://eslint.org/docs/rules/object-curly-spacing
Given that we currently have quite inconsistent object formatting, fixing this in *one* big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead.
Please note: This patch was created automatically, using the ESLint `--fix` command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.
```diff
diff --git a/test/chromium/test-telemetry.js b/test/chromium/test-telemetry.js
index cc412a31..2e5bdfa1 100755
--- a/test/chromium/test-telemetry.js
+++ b/test/chromium/test-telemetry.js
@@ -324,7 +324,7 @@ var tests = [
var window = createExtensionGlobal();
telemetryScript.runInNewContext(window);
window.chrome.runtime.getManifest = function() {
- return { version: '1.0.1', };
+ return { version: '1.0.1', };
};
window.Date.test_now_value += 12 * 36E5;
telemetryScript.runInNewContext(window);
diff --git a/test/unit/api_spec.js b/test/unit/api_spec.js
index 1f00747a..f22988e7 100644
--- a/test/unit/api_spec.js
+++ b/test/unit/api_spec.js
@@ -503,8 +503,9 @@ describe('api', function() {
it('gets destinations, from /Dests dictionary', function(done) {
var promise = doc.getDestinations();
promise.then(function(data) {
- expect(data).toEqual({ chapter1: [{ gen: 0, num: 17, }, { name: 'XYZ', },
- 0, 841.89, null], });
+ expect(data).toEqual({
+ chapter1: [{ gen: 0, num: 17, }, { name: 'XYZ', }, 0, 841.89, null],
+ });
done();
}).catch(function (reason) {
done.fail(reason);
diff --git a/test/unit/function_spec.js b/test/unit/function_spec.js
index 66441212..62127eb9 100644
--- a/test/unit/function_spec.js
+++ b/test/unit/function_spec.js
@@ -492,9 +492,11 @@ describe('function', function() {
it('check compiled mul', function() {
check([0.25, 0.5, 'mul'], [], [0, 1], [{ input: [], output: [0.125], }]);
check([0, 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0], }]);
- check([0.5, 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0.125], }]);
+ check([0.5, 'mul'], [0, 1], [0, 1],
+ [{ input: [0.25], output: [0.125], }]);
check([1, 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0.25], }]);
- check([0, 'exch', 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0], }]);
+ check([0, 'exch', 'mul'], [0, 1], [0, 1],
+ [{ input: [0.25], output: [0], }]);
check([0.5, 'exch', 'mul'], [0, 1], [0, 1],
[{ input: [0.25], output: [0.125], }]);
check([1, 'exch', 'mul'], [0, 1], [0, 1],
```
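For reference, enabling the two rules named above might look roughly like the following legacy `.eslintrc.js` fragment. This is a hypothetical sketch: the option values are assumptions inferred from the interdiff (single-line objects keep a trailing comma and use padded braces, single-line arrays do not end with a comma), not values taken from this commit.

```javascript
// Hypothetical .eslintrc.js fragment (legacy config format).
// Option values are assumptions inferred from the interdiff above.
module.exports = {
  rules: {
    // Objects always carry a trailing comma; multi-line arrays may or may not.
    "comma-dangle": [
      "error",
      {
        arrays: "only-multiline",
        objects: "always",
        imports: "never",
        exports: "never",
        functions: "never",
      },
    ],
    // Require a space after "{" and before "}", e.g. `{ version: '1.0.1', }`.
    "object-curly-spacing": ["error", "always"],
  },
};
```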
2017-06-02 19:55:01 +09:00
|
|
|
expect(page.ref).toEqual({ num: 15, gen: 0 });
|
2014-08-12 19:04:00 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it("gets userUnit", function () {
|
2016-11-22 06:39:04 +09:00
|
|
|
expect(page.userUnit).toEqual(1.0);
|
|
|
|
});
|
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sorts of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes is added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seems strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
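The bounding-box and `Util.intersect` changes listed above can be sketched as follows. This is a minimal, hypothetical illustration rather than the actual pdf.js code: the names `intersect`, `isValidBox`, `computeView`, the positional `checkEmpty` flag and the US Letter fallback box `[0, 0, 612, 792]` are all assumptions.

```javascript
// Hypothetical sketch; not the actual pdf.js implementation.
// Assumed fallback used when the /MediaBox is missing or empty (US Letter).
const DEFAULT_MEDIABOX = [0, 0, 612, 792];

// Reorders the corners so that [x1, y1] is the lower-left point.
function normalizeRect([x1, y1, x2, y2]) {
  return [Math.min(x1, x2), Math.min(y1, y2), Math.max(x1, x2), Math.max(y1, y2)];
}

// Returns the intersection of two rectangles, or null if they do not
// overlap. When `checkEmpty` is true, a zero-width or zero-height
// intersection is also reported as null. Either way the result is an
// Array or null, never a Boolean.
function intersect(rect1, rect2, checkEmpty = false) {
  const a = normalizeRect(rect1);
  const b = normalizeRect(rect2);
  const xLow = Math.max(a[0], b[0]);
  const yLow = Math.max(a[1], b[1]);
  const xHigh = Math.min(a[2], b[2]);
  const yHigh = Math.min(a[3], b[3]);
  if (xLow > xHigh || yLow > yHigh) {
    return null;
  }
  if (checkEmpty && (xLow === xHigh || yLow === yHigh)) {
    return null;
  }
  return [xLow, yLow, xHigh, yHigh];
}

// A usable box has four finite numbers and a non-zero area.
function isValidBox(box) {
  return (
    Array.isArray(box) &&
    box.length === 4 &&
    box.every(Number.isFinite) &&
    box[0] !== box[2] &&
    box[1] !== box[3]
  );
}

// Computes the page view from the /MediaBox and /CropBox. An invalid
// mediaBox falls back to the default; an invalid cropBox, or one whose
// intersection with the mediaBox is empty, falls back to the mediaBox.
function computeView(mediaBox, cropBox) {
  const media = isValidBox(mediaBox)
    ? normalizeRect(mediaBox)
    : [...DEFAULT_MEDIABOX];
  if (!isValidBox(cropBox)) {
    return media;
  }
  return intersect(cropBox, media, /* checkEmpty = */ true) ?? media;
}
```

Under these assumptions, `computeView([0, 0, 0, 0], null)` returns `[0, 0, 612, 792]`, and a cropBox that misses the mediaBox entirely leaves the mediaBox unchanged, so the view is never an empty rectangle.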
2019-08-08 22:54:46 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it("gets view", function () {
|
2014-08-12 19:04:00 +09:00
|
|
|
expect(page.view).toEqual([0, 0, 595.28, 841.89]);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("gets view, with empty/invalid bounding boxes", async function () {
|
2019-08-08 22:54:46 +09:00
|
|
|
const viewLoadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("boundingBox_invalid.pdf")
|
|
|
|
);
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await viewLoadingTask.promise;
|
|
|
|
const numPages = pdfDoc.numPages;
|
|
|
|
expect(numPages).toEqual(3);
|
2019-08-08 22:54:46 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const viewPromises = [];
|
|
|
|
for (let i = 0; i < numPages; i++) {
|
2024-01-21 23:47:39 +09:00
|
|
|
viewPromises[i] = pdfDoc.getPage(i + 1).then(pdfPage => pdfPage.view);
|
2021-04-17 04:48:42 +09:00
|
|
|
}
|
2019-08-08 22:54:46 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const [page1, page2, page3] = await Promise.all(viewPromises);
|
|
|
|
expect(page1).toEqual([0, 0, 612, 792]);
|
|
|
|
expect(page2).toEqual([0, 0, 800, 600]);
|
|
|
|
expect(page3).toEqual([0, 0, 600, 800]);
|
2019-08-08 22:54:46 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await viewLoadingTask.destroy();
|
2019-08-08 22:54:46 +09:00
|
|
|
});
|
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it("gets viewport", function () {
|
2020-10-25 23:40:51 +09:00
|
|
|
const viewport = page.getViewport({ scale: 1.5, rotation: 90 });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2014-08-12 19:04:00 +09:00
|
|
|
expect(viewport.viewBox).toEqual(page.view);
|
|
|
|
expect(viewport.scale).toEqual(1.5);
|
|
|
|
expect(viewport.rotation).toEqual(90);
|
|
|
|
expect(viewport.transform).toEqual([0, 1.5, 1.5, 0, 0, 0]);
|
|
|
|
expect(viewport.width).toEqual(1262.835);
|
|
|
|
expect(viewport.height).toEqual(892.92);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it('gets viewport with "offsetX/offsetY" arguments', function () {
|
2019-10-24 03:35:49 +09:00
|
|
|
const viewport = page.getViewport({
|
|
|
|
scale: 1,
|
|
|
|
rotation: 0,
|
|
|
|
offsetX: 100,
|
|
|
|
offsetY: -100,
|
|
|
|
});
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2019-10-24 03:35:49 +09:00
|
|
|
expect(viewport.transform).toEqual([1, 0, 0, -1, 100, 741.89]);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2020-04-14 19:28:14 +09:00
|
|
|
it('gets viewport respecting "dontFlip" argument', function () {
|
2019-10-24 03:30:25 +09:00
|
|
|
const scale = 1,
|
|
|
|
rotation = 0;
|
2020-01-24 17:48:21 +09:00
|
|
|
const viewport = page.getViewport({ scale, rotation });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2020-01-24 17:48:21 +09:00
|
|
|
const dontFlipViewport = page.getViewport({
|
2018-12-21 19:47:37 +09:00
|
|
|
scale,
|
|
|
|
rotation,
|
|
|
|
dontFlip: true,
|
|
|
|
});
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(dontFlipViewport instanceof PageViewport).toEqual(true);
|
2017-11-04 01:05:53 +09:00
|
|
|
|
|
|
|
expect(dontFlipViewport).not.toEqual(viewport);
|
|
|
|
expect(dontFlipViewport).toEqual(viewport.clone({ dontFlip: true }));
|
|
|
|
|
|
|
|
expect(viewport.transform).toEqual([1, 0, 0, -1, 0, 841.89]);
|
|
|
|
expect(dontFlipViewport.transform).toEqual([1, 0, -0, 1, 0, 0]);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2020-04-22 22:18:27 +09:00
|
|
|
it("gets viewport with invalid rotation", function () {
|
|
|
|
expect(function () {
|
|
|
|
page.getViewport({ scale: 1, rotation: 45 });
|
|
|
|
}).toThrow(
|
|
|
|
new Error(
|
|
|
|
"PageViewport: Invalid rotation, must be a multiple of 90 degrees."
|
|
|
|
)
|
|
|
|
);
|
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets annotations", async function () {
|
2020-10-25 23:40:51 +09:00
|
|
|
const defaultPromise = page.getAnnotations().then(function (data) {
|
2015-11-22 21:56:52 +09:00
|
|
|
expect(data.length).toEqual(4);
|
|
|
|
});
|
|
|
|
|
[api-minor] Re-factor the *internal* renderingIntent, and change the default `intent` value in the `PDFPageProxy.getAnnotations` method
With the changes made in PR 13746 the *internal* renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly.
Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an *internal* renderingIntent that's implemented using a bit-field instead. *Please note:* This part of the patch, in itself, does *not* change the public API (but see below).
This patch is tagged `api-minor` for the following reasons:
1. It changes the *default* value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API.
2. In order to get *all* annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter.
3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one).
4. Finally, for consistency across the API, the `PDFPageProxy.render` method also supports the new "any" intent (although I'm not sure how useful that'll be).
Points 1 and 2 above are the significant, and thus breaking, changes in *default* behaviour here. Unfortunately, I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.
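Replacing `intent` strings with an internal bit-field can be sketched roughly as follows. The following are all hypothetical and chosen only for illustration:

- the flag names and their values;
- the `toRenderingIntent` and `includeAnnotation` helpers;
- the `viewable`/`printable` annotation fields.

The sketch only shows that the "oplist" case becomes an extra bit tested with `&`, rather than a string that has to be matched in various spots.

```javascript
// Hypothetical sketch: flag names and values are illustrative only and are
// not copied from the pdf.js source.
const IntentFlag = Object.freeze({
  ANY: 0x01,
  DISPLAY: 0x02,
  PRINT: 0x04,
  OPLIST: 0x100,
});

// Map a public `intent` string to an internal bit-field. "display" is the
// default, mirroring the new default of `getAnnotations`.
function toRenderingIntent(intent = "display", isOpList = false) {
  let flags;
  switch (intent) {
    case "any":
      flags = IntentFlag.ANY;
      break;
    case "display":
      flags = IntentFlag.DISPLAY;
      break;
    case "print":
      flags = IntentFlag.PRINT;
      break;
    default:
      throw new Error(`Invalid intent: ${intent}`);
  }
  // The operator-list case is an extra bit, not a separate "oplist" string.
  if (isOpList) {
    flags |= IntentFlag.OPLIST;
  }
  return flags;
}

// Decide whether one annotation is included for an internal intent.
// `viewable` and `printable` are hypothetical per-annotation fields.
function includeAnnotation(annotation, renderingIntent) {
  if (renderingIntent & IntentFlag.ANY) {
    return true;
  }
  if ((renderingIntent & IntentFlag.DISPLAY) && annotation.viewable) {
    return true;
  }
  if ((renderingIntent & IntentFlag.PRINT) && annotation.printable) {
    return true;
  }
  return false;
}
```

With this shape, calling the helper without arguments behaves like `{ intent: "display" }`. Every annotation is returned only when "any" is requested explicitly, which corresponds to points 1 and 2 above.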
2021-08-02 21:30:08 +09:00
|
|
|
const anyPromise = page
|
|
|
|
.getAnnotations({ intent: "any" })
|
|
|
|
.then(function (data) {
|
|
|
|
expect(data.length).toEqual(4);
|
|
|
|
});
|
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const displayPromise = page
|
Fix inconsistent spacing and trailing commas in objects in `test/` files, so we can enable the `comma-dangle` and `object-curly-spacing` ESLint rules later on
http://eslint.org/docs/rules/comma-dangle
http://eslint.org/docs/rules/object-curly-spacing
Given that we currently have quite inconsistent object formatting, fixing this in *one* big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead.
Please note: This patch was created automatically, using the ESLint `--fix` command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.
```diff
diff --git a/test/chromium/test-telemetry.js b/test/chromium/test-telemetry.js
index cc412a31..2e5bdfa1 100755
--- a/test/chromium/test-telemetry.js
+++ b/test/chromium/test-telemetry.js
@@ -324,7 +324,7 @@ var tests = [
var window = createExtensionGlobal();
telemetryScript.runInNewContext(window);
window.chrome.runtime.getManifest = function() {
- return { version: '1.0.1', };
+ return { version: '1.0.1', };
};
window.Date.test_now_value += 12 * 36E5;
telemetryScript.runInNewContext(window);
diff --git a/test/unit/api_spec.js b/test/unit/api_spec.js
index 1f00747a..f22988e7 100644
--- a/test/unit/api_spec.js
+++ b/test/unit/api_spec.js
@@ -503,8 +503,9 @@ describe('api', function() {
it('gets destinations, from /Dests dictionary', function(done) {
var promise = doc.getDestinations();
promise.then(function(data) {
- expect(data).toEqual({ chapter1: [{ gen: 0, num: 17, }, { name: 'XYZ', },
- 0, 841.89, null], });
+ expect(data).toEqual({
+ chapter1: [{ gen: 0, num: 17, }, { name: 'XYZ', }, 0, 841.89, null],
+ });
done();
}).catch(function (reason) {
done.fail(reason);
diff --git a/test/unit/function_spec.js b/test/unit/function_spec.js
index 66441212..62127eb9 100644
--- a/test/unit/function_spec.js
+++ b/test/unit/function_spec.js
@@ -492,9 +492,11 @@ describe('function', function() {
it('check compiled mul', function() {
check([0.25, 0.5, 'mul'], [], [0, 1], [{ input: [], output: [0.125], }]);
check([0, 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0], }]);
- check([0.5, 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0.125], }]);
+ check([0.5, 'mul'], [0, 1], [0, 1],
+ [{ input: [0.25], output: [0.125], }]);
check([1, 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0.25], }]);
- check([0, 'exch', 'mul'], [0, 1], [0, 1], [{ input: [0.25], output: [0], }]);
+ check([0, 'exch', 'mul'], [0, 1], [0, 1],
+ [{ input: [0.25], output: [0], }]);
check([0.5, 'exch', 'mul'], [0, 1], [0, 1],
[{ input: [0.25], output: [0.125], }]);
check([1, 'exch', 'mul'], [0, 1], [0, 1],
```
2017-06-02 19:55:01 +09:00
|
|
|
.getAnnotations({ intent: "display" })
|
2020-04-14 19:28:14 +09:00
|
|
|
.then(function (data) {
|
2015-11-22 21:56:52 +09:00
|
|
|
expect(data.length).toEqual(4);
|
|
|
|
});
|
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const printPromise = page
|
2017-06-02 19:55:01 +09:00
|
|
|
.getAnnotations({ intent: "print" })
|
2020-04-14 19:28:14 +09:00
|
|
|
.then(function (data) {
|
2014-08-12 19:04:00 +09:00
|
|
|
expect(data.length).toEqual(4);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2021-08-02 21:30:08 +09:00
|
|
|
await Promise.all([
|
|
|
|
defaultPromise,
|
|
|
|
anyPromise,
|
|
|
|
displayPromise,
|
|
|
|
printPromise,
|
|
|
|
]);
|
2014-08-12 19:04:00 +09:00
|
|
|
});
|
2016-10-01 19:05:07 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets annotations containing relative URLs (bug 766086)", async function () {
|
2020-10-25 23:40:51 +09:00
|
|
|
const filename = "bug766086.pdf";
|
2016-10-01 19:05:07 +09:00
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const defaultLoadingTask = getDocument(buildGetDocumentParams(filename));
|
|
|
|
const defaultPromise = defaultLoadingTask.promise.then(function (pdfDoc) {
|
2020-04-14 19:28:14 +09:00
|
|
|
return pdfDoc.getPage(1).then(function (pdfPage) {
|
2016-10-01 19:05:07 +09:00
|
|
|
return pdfPage.getAnnotations();
|
|
|
|
});
|
|
|
|
});
|
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const docBaseUrlLoadingTask = getDocument(
|
2017-04-09 00:09:54 +09:00
|
|
|
buildGetDocumentParams(filename, {
|
|
|
|
docBaseUrl: "http://www.example.com/test/pdfs/qwerty.pdf",
|
|
|
|
})
|
|
|
|
);
|
2023-07-15 16:09:20 +09:00
|
|
|
const docBaseUrlPromise = docBaseUrlLoadingTask.promise.then(
|
|
|
|
function (pdfDoc) {
|
|
|
|
return pdfDoc.getPage(1).then(function (pdfPage) {
|
|
|
|
return pdfPage.getAnnotations();
|
|
|
|
});
|
|
|
|
}
|
|
|
|
);
|
2016-10-01 19:05:07 +09:00
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const invalidDocBaseUrlLoadingTask = getDocument(
|
2017-04-09 00:09:54 +09:00
|
|
|
buildGetDocumentParams(filename, {
|
|
|
|
docBaseUrl: "qwerty.pdf",
|
|
|
|
})
|
|
|
|
);
|
2021-05-16 17:58:34 +09:00
|
|
|
const invalidDocBaseUrlPromise =
|
|
|
|
invalidDocBaseUrlLoadingTask.promise.then(function (pdfDoc) {
|
2020-04-14 19:28:14 +09:00
|
|
|
return pdfDoc.getPage(1).then(function (pdfPage) {
|
2016-10-01 19:05:07 +09:00
|
|
|
return pdfPage.getAnnotations();
|
|
|
|
});
|
2021-05-16 17:58:34 +09:00
|
|
|
});
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla-central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const [
|
|
|
|
defaultAnnotations,
|
|
|
|
docBaseUrlAnnotations,
|
|
|
|
invalidDocBaseUrlAnnotations,
|
|
|
|
] = await Promise.all([
|
|
|
|
defaultPromise,
|
|
|
|
docBaseUrlPromise,
|
|
|
|
invalidDocBaseUrlPromise,
|
|
|
|
]);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(defaultAnnotations[0].url).toBeUndefined();
|
|
|
|
expect(defaultAnnotations[0].unsafeUrl).toEqual(
|
|
|
|
"../../0021/002156/215675E.pdf#15"
|
|
|
|
);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(docBaseUrlAnnotations[0].url).toEqual(
|
|
|
|
"http://www.example.com/0021/002156/215675E.pdf#15"
|
|
|
|
);
|
|
|
|
expect(docBaseUrlAnnotations[0].unsafeUrl).toEqual(
|
|
|
|
"../../0021/002156/215675E.pdf#15"
|
|
|
|
);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(invalidDocBaseUrlAnnotations[0].url).toBeUndefined();
|
|
|
|
expect(invalidDocBaseUrlAnnotations[0].unsafeUrl).toEqual(
|
|
|
|
"../../0021/002156/215675E.pdf#15"
|
|
|
|
);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await Promise.all([
|
|
|
|
defaultLoadingTask.destroy(),
|
|
|
|
docBaseUrlLoadingTask.destroy(),
|
|
|
|
invalidDocBaseUrlLoadingTask.destroy(),
|
|
|
|
]);
|
2016-10-01 19:05:07 +09:00
|
|
|
});
|
|
|
|
|
2022-10-04 00:55:13 +09:00
|
|
|
it("gets annotations containing GoToE action (issue 8844)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue8844.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const annotations = await pdfPage.getAnnotations();
|
|
|
|
|
|
|
|
expect(annotations.length).toEqual(1);
|
|
|
|
expect(annotations[0].annotationType).toEqual(AnnotationType.LINK);
|
|
|
|
|
|
|
|
const { filename, content } = annotations[0].attachment;
|
|
|
|
expect(filename).toEqual("man.pdf");
|
|
|
|
expect(content instanceof Uint8Array).toEqual(true);
|
|
|
|
expect(content.length).toEqual(4508);
|
|
|
|
|
2023-10-03 15:01:55 +09:00
|
|
|
expect(annotations[0].attachmentDest).toEqual('[-1,{"name":"Fit"}]');
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
|
|
|
it("gets annotations containing GoToE action with destination (issue 17056)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue17056.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
|
|
|
|
const annotations = await pdfPage.getAnnotations();
|
|
|
|
expect(annotations.length).toEqual(30);
|
|
|
|
|
|
|
|
const { annotationType, attachment, attachmentDest } = annotations[0];
|
|
|
|
expect(annotationType).toEqual(AnnotationType.LINK);
|
|
|
|
|
|
|
|
const { filename, content } = attachment;
|
|
|
|
expect(filename).toEqual("destination-doc.pdf");
|
|
|
|
expect(content instanceof Uint8Array).toEqual(true);
|
|
|
|
expect(content.length).toEqual(10305);
|
|
|
|
|
|
|
|
expect(attachmentDest).toEqual('[0,{"name":"Fit"}]');
|
|
|
|
|
2023-10-16 23:20:46 +09:00
|
|
|
// Check that the attachments, which are identical, aren't duplicated.
|
|
|
|
for (let i = 1, ii = annotations.length; i < ii; i++) {
|
|
|
|
expect(annotations[i].attachment).toBe(attachment);
|
|
|
|
}
|
|
|
|
|
2022-10-04 00:55:13 +09:00
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets text content", async function () {
|
2023-03-30 20:36:42 +09:00
|
|
|
const { items, styles } = await page.getTextContent();
|
2021-04-30 21:41:13 +09:00
|
|
|
|
2023-03-30 20:36:42 +09:00
|
|
|
expect(items.length).toEqual(15);
|
|
|
|
expect(objectSize(styles)).toEqual(5);
|
2015-11-24 00:57:43 +09:00
|
|
|
|
2023-03-30 20:36:42 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
expect(text).toEqual(`Table Of Content
|
2021-05-24 02:03:53 +09:00
|
|
|
Chapter 1 .......................................................... 2
|
|
|
|
Paragraph 1.1 ...................................................... 3
|
|
|
|
page 1 / 3`);
|
2012-04-13 09:59:30 +09:00
|
|
|
});
|
2019-01-29 22:24:48 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets text content, with correct properties (issue 8276)", async function () {
|
2019-01-29 22:24:48 +09:00
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("issue8276_reduced.pdf")
|
|
|
|
);
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items, styles } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(items.length).toEqual(1);
|
2022-11-04 08:19:23 +09:00
|
|
|
// Font name will be a random object id.
|
2020-12-11 10:32:18 +09:00
|
|
|
const fontName = items[0].fontName;
|
|
|
|
expect(Object.keys(styles)).toEqual([fontName]);
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
expect(items[0]).toEqual({
|
|
|
|
dir: "ltr",
|
2020-12-11 10:32:18 +09:00
|
|
|
fontName,
|
2021-04-17 04:48:42 +09:00
|
|
|
height: 18,
|
|
|
|
str: "Issue 8276",
|
|
|
|
transform: [18, 0, 0, 18, 441.81, 708.4499999999999],
|
|
|
|
width: 77.49,
|
2021-04-30 21:41:13 +09:00
|
|
|
hasEOL: false,
|
2021-04-17 04:48:42 +09:00
|
|
|
});
|
2020-12-11 10:32:18 +09:00
|
|
|
expect(styles[fontName]).toEqual({
|
2021-04-17 04:48:42 +09:00
|
|
|
fontFamily: "serif",
|
2022-01-28 06:51:30 +09:00
|
|
|
// `useSystemFonts` has a different value in web environments
|
|
|
|
// and in Node.js.
|
|
|
|
ascent: isNodeJS ? NaN : 0.683,
|
|
|
|
descent: isNodeJS ? NaN : -0.217,
|
2021-04-17 04:48:42 +09:00
|
|
|
vertical: false,
|
|
|
|
});
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla-central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
|
2022-11-04 08:19:23 +09:00
|
|
|
// Wait for font data to be loaded so we can check that the font names
|
|
|
|
// match.
|
|
|
|
await pdfPage.getOperatorList();
|
|
|
|
expect(pdfPage.commonObjs.has(fontName)).toEqual(true);
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
2019-01-29 22:24:48 +09:00
|
|
|
});
|
|
|
|
|
2021-05-24 02:03:53 +09:00
|
|
|
it("gets text content, with no extra spaces (issue 13226)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue13226.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2021-05-24 02:03:53 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(text).toEqual(
|
|
|
|
"Mitarbeiterinnen und Mitarbeiter arbeiten in über 100 Ländern engagiert im Dienste"
|
|
|
|
);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-03-08 03:38:49 +09:00
|
|
|
it("gets text content, with no extra spaces (issue 16119)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue16119.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2023-03-08 03:38:49 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(
|
|
|
|
text.includes(
|
|
|
|
"Engang var der i Samvirke en opskrift på en fiskelagkage, som jeg med"
|
|
|
|
)
|
|
|
|
).toBe(true);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-05-24 02:03:53 +09:00
|
|
|
it("gets text content, with merged spaces (issue 13201)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue13201.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2021-05-24 02:03:53 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(
|
|
|
|
text.includes(
|
|
|
|
"Abstract. A purely peer-to-peer version of electronic cash would allow online"
|
|
|
|
)
|
|
|
|
).toEqual(true);
|
|
|
|
expect(
|
|
|
|
text.includes(
|
|
|
|
"avoid mediating disputes. The cost of mediation increases transaction costs, limiting the"
|
|
|
|
)
|
|
|
|
).toEqual(true);
|
|
|
|
expect(
|
|
|
|
text.includes(
|
|
|
|
"system is secure as long as honest nodes collectively control more CPU power than any"
|
|
|
|
)
|
|
|
|
).toEqual(true);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
|
|
|
it("gets text content, with no spaces between letters of words (issue 11913)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue11913.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2021-05-24 02:03:53 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(
|
|
|
|
text.includes(
|
|
|
|
"1. The first of these cases arises from the tragic handicap which has blighted the life of the Plaintiff, and from the response of the"
|
|
|
|
)
|
|
|
|
).toEqual(true);
|
|
|
|
expect(
|
|
|
|
text.includes(
|
|
|
|
"argued in this Court the appeal raises narrower, but important, issues which may be summarised as follows:-"
|
|
|
|
)
|
|
|
|
).toEqual(true);
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
|
|
|
it("gets text content, with merged spaces (issue 10900)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue10900.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2021-05-24 02:03:53 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(
|
|
|
|
text.includes(`3 3 3 3
|
|
|
|
851.5 854.9 839.3 837.5
|
|
|
|
633.6 727.8 789.9 796.2
|
|
|
|
1,485.1 1,582.7 1,629.2 1,633.7
|
|
|
|
114.2 121.7 125.3 130.7
|
|
|
|
13.0x 13.0x 13.0x 12.5x`)
|
|
|
|
).toEqual(true);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
|
|
|
it("gets text content, with spaces (issue 10640)", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue10640.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
let { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
|
|
|
let text = mergeText(items);
|
|
|
|
let expected = `Open Sans is a humanist sans serif typeface designed by Steve Matteson.
|
|
|
|
Open Sans was designed with an upright stress, open forms and a neu-
|
|
|
|
tral, yet friendly appearance. It was optimized for print, web, and mobile
|
|
|
|
interfaces, and has excellent legibility characteristics in its letterforms (see
|
|
|
|
figure \x81 on the following page). This font is available from the Google Font
|
|
|
|
Directory [\x81] as TrueType files licensed under the Apache License version \x82.\x80.
|
|
|
|
This package provides support for this font in LATEX. It includes Type \x81
|
|
|
|
versions of the fonts, converted for this package using FontForge from its
|
|
|
|
sources, for full support with Dvips.`;
|
2021-05-24 02:03:53 +09:00
|
|
|
|
2023-03-23 18:15:14 +09:00
|
|
|
expect(text.includes(expected)).toEqual(true);
|
|
|
|
|
|
|
|
({ items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: false,
|
|
|
|
}));
|
|
|
|
text = mergeText(items);
|
|
|
|
expected = `Open Sans is a humanist sans serif typeface designed by Steve Matteson.
|
2021-05-24 02:03:53 +09:00
|
|
|
Open Sans was designed with an upright stress, open forms and a neu-
|
|
|
|
tral, yet friendly appearance. It was optimized for print, web, and mobile
|
|
|
|
interfaces, and has excellent legibility characteristics in its letterforms (see
|
|
|
|
figure \x81 on the following page). This font is available from the Google Font
|
|
|
|
Directory [\x81] as TrueType files licensed under the Apache License version \x82.\x80.
|
|
|
|
This package provides support for this font in LATEX. It includes Type \x81
|
|
|
|
versions of the fonts, converted for this package using FontForge from its
|
2023-03-23 18:15:14 +09:00
|
|
|
sources, for full support with Dvips.`;
|
|
|
|
expect(text.includes(expected)).toEqual(true);
|
2021-05-24 02:03:53 +09:00
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-11-13 02:04:17 +09:00
|
|
|
it("gets text content, with negative spaces (bug 931481)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("bug931481.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2021-11-13 02:04:17 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(
|
|
|
|
text.includes(`Kathrin Nachbaur
|
|
|
|
Die promovierte Juristin ist 1979 in Graz geboren und aufgewachsen. Nach
|
|
|
|
erfolgreichem Studienabschluss mit Fokus auf Europarecht absolvierte sie ein
|
|
|
|
Praktikum bei Magna International in Kanada in der Human Resources Abteilung.
|
|
|
|
Anschliessend wurde sie geschult in Human Resources, Arbeitsrecht und
|
|
|
|
Kommunikation, währenddessen sie auch an ihrem Doktorat im Wirtschaftsrecht
|
|
|
|
arbeitete. Seither arbeitete sie bei Magna International als Projekt Manager in der
|
|
|
|
Innovationsabteilung. Seit 2009 ist sie Frank Stronachs Büroleiterin in Österreich und
|
|
|
|
Kanada. Zusätzlich ist sie seit 2012 Vice President, Business Development der
|
|
|
|
Stronach Group und Vizepräsidentin und Institutsleiterin des Stronach Institut für
|
|
|
|
sozialökonomische Gerechtigkeit.`)
|
|
|
|
).toEqual(true);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2022-01-24 07:04:18 +09:00
|
|
|
it("gets text content, with invisible text marks (issue 9186)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue9186.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2022-01-24 07:04:18 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(
|
|
|
|
text.includes(`This Agreement (“Agreement”) is made as of this 25th day of January, 2017, by and
|
|
|
|
between EDWARD G. ATSINGER III, not individually but as sole Trustee of the ATSINGER
|
|
|
|
FAMILY TRUST /u/a dated October 31, 1980 as amended, and STUART W. EPPERSON, not
|
|
|
|
individually but solely as Trustee of the STUART W. EPPERSON REVOCABLE LIVING
|
|
|
|
TRUST /u/a dated January 14th 1993 as amended, collectively referred to herein as “Lessor”, and
|
|
|
|
Caron Broadcasting, Inc., an Ohio corporation (“Lessee”).`)
|
|
|
|
).toEqual(true);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-10-24 18:51:57 +09:00
|
|
|
it("gets text content, with beginbfrange operator handled correctly (bug 1627427)", async function () {
|
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("bug1627427_reduced.pdf")
|
|
|
|
);
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2021-10-24 18:51:57 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(text).toEqual(
|
|
|
|
"침하게 흐린 품이 눈이 올 듯하더니 눈은 아니 오고 얼다가 만 비가 추"
|
|
|
|
);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2022-02-14 03:39:40 +09:00
|
|
|
it("gets text content, and checks that out-of-page text is not present (bug 1755201)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("bug1755201.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(6);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2022-02-14 03:39:40 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(/win aisle/.test(text)).toEqual(false);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2022-06-25 23:40:46 +09:00
|
|
|
it("gets text content with or without includeMarkedContent, and compare (issue 15094)", async function () {
|
|
|
|
if (isNodeJS) {
|
|
|
|
pending("Linked test-cases are not supported in Node.js.");
|
|
|
|
}
|
|
|
|
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("pdf.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(568);
|
|
|
|
let { items } = await pdfPage.getTextContent({
|
|
|
|
includeMarkedContent: false,
|
2023-03-23 18:15:14 +09:00
|
|
|
disableNormalization: true,
|
2022-06-25 23:40:46 +09:00
|
|
|
});
|
|
|
|
const textWithoutMC = mergeText(items);
|
|
|
|
({ items } = await pdfPage.getTextContent({
|
|
|
|
includeMarkedContent: true,
|
2023-03-23 18:15:14 +09:00
|
|
|
disableNormalization: true,
|
2022-06-25 23:40:46 +09:00
|
|
|
}));
|
|
|
|
const textWithMC = mergeText(items);
|
|
|
|
|
|
|
|
expect(textWithoutMC).toEqual(textWithMC);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-03-21 20:24:21 +09:00
|
|
|
it("gets text content with multi-byte entries, using predefined CMaps (issue 16176)", async function () {
|
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("issue16176.pdf", {
|
|
|
|
cMapUrl: CMAP_URL,
|
|
|
|
useWorkerFetch: false,
|
|
|
|
})
|
|
|
|
);
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2023-03-21 20:24:21 +09:00
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(text).toEqual("𠮷");
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-03-28 19:00:53 +09:00
|
|
|
it("gets text content with raised text", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue16221.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2023-03-23 18:15:14 +09:00
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
2023-03-28 19:00:53 +09:00
|
|
|
|
|
|
|
expect(items.map(i => i.str)).toEqual(["Hello ", "World"]);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-04-18 23:39:10 +09:00
|
|
|
it("gets text content with a specific view box", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue16316.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
|
|
|
const text = mergeText(items);
|
|
|
|
|
|
|
|
expect(text).toEqual("Experimentation,");
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2023-05-19 00:22:42 +09:00
|
|
|
it("check that a chunk is pushed when font is restored", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("issue14755.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const { items } = await pdfPage.getTextContent({
|
|
|
|
disableNormalization: true,
|
|
|
|
});
|
|
|
|
expect(items).toEqual([
|
|
|
|
jasmine.objectContaining({
|
|
|
|
str: "ABC",
|
|
|
|
dir: "ltr",
|
|
|
|
width: 20.56,
|
|
|
|
height: 10,
|
|
|
|
transform: [10, 0, 0, 10, 100, 100],
|
|
|
|
hasEOL: false,
|
|
|
|
}),
|
|
|
|
jasmine.objectContaining({
|
|
|
|
str: "DEF",
|
|
|
|
dir: "ltr",
|
|
|
|
width: 20,
|
|
|
|
height: 10,
|
|
|
|
transform: [10, 0, 0, 10, 120, 100],
|
|
|
|
hasEOL: false,
|
|
|
|
}),
|
|
|
|
jasmine.objectContaining({
|
|
|
|
str: "GHI",
|
|
|
|
dir: "ltr",
|
|
|
|
width: 17.78,
|
|
|
|
height: 10,
|
|
|
|
transform: [10, 0, 0, 10, 140, 100],
|
|
|
|
hasEOL: false,
|
|
|
|
}),
|
|
|
|
]);
|
|
|
|
expect(items[0].fontName).toEqual(items[2].fontName);
|
|
|
|
expect(items[1].fontName).not.toEqual(items[0].fontName);
|
|
|
|
});
|
|
|
|
|
2021-04-11 19:04:29 +09:00
|
|
|
it("gets empty structure tree", async function () {
|
|
|
|
const tree = await page.getStructTree();
|
|
|
|
|
|
|
|
expect(tree).toEqual(null);
|
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
2021-04-11 19:04:29 +09:00
|
|
|
it("gets simple structure tree", async function () {
|
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("structure_simple.pdf")
|
|
|
|
);
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const tree = await pdfPage.getStructTree();
|
|
|
|
|
|
|
|
expect(tree).toEqual({
|
|
|
|
role: "Root",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
role: "Document",
|
2021-11-11 22:36:18 +09:00
|
|
|
lang: "en-US",
|
2021-04-11 19:04:29 +09:00
|
|
|
children: [
|
|
|
|
{
|
|
|
|
role: "H1",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
role: "NonStruct",
|
2023-05-19 05:23:42 +09:00
|
|
|
children: [{ type: "content", id: "p2R_mc0" }],
|
2021-04-11 19:04:29 +09:00
|
|
|
},
|
|
|
|
],
|
|
|
|
},
|
|
|
|
{
|
|
|
|
role: "P",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
role: "NonStruct",
|
2023-05-19 05:23:42 +09:00
|
|
|
children: [{ type: "content", id: "p2R_mc1" }],
|
2021-04-11 19:04:29 +09:00
|
|
|
},
|
|
|
|
],
|
|
|
|
},
|
|
|
|
{
|
|
|
|
role: "H2",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
role: "NonStruct",
|
2023-05-19 05:23:42 +09:00
|
|
|
children: [{ type: "content", id: "p2R_mc2" }],
|
2021-04-11 19:04:29 +09:00
|
|
|
},
|
|
|
|
],
|
|
|
|
},
|
|
|
|
{
|
|
|
|
role: "P",
|
|
|
|
children: [
|
|
|
|
{
|
|
|
|
role: "NonStruct",
|
2023-05-19 05:23:42 +09:00
|
|
|
children: [{ type: "content", id: "p2R_mc3" }],
|
2021-04-11 19:04:29 +09:00
|
|
|
},
|
|
|
|
],
|
|
|
|
},
|
|
|
|
],
|
|
|
|
},
|
|
|
|
],
|
|
|
|
});
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets operator list", async function () {
|
|
|
|
const operatorList = await page.getOperatorList();
|
[api-minor] Add `intent` support to the `PDFPageProxy.getOperatorList` method (issue 13704)
With this patch, the `PDFPageProxy.getOperatorList` method will now return `PDFOperatorList`-instances that also include Annotation-operatorLists (when those exist). Hence this closes a small, but potentially confusing, gap between the `render` and `getOperatorList` methods.
Previously we've been somewhat reluctant to do this, as explained below, but the fact that there are actual use-cases where it's required probably means that we'll *have* to implement it now.
Since we still need the ability to separate "normal" rendering operations from direct `getOperatorList` calls in the worker-thread, this API-change unfortunately causes the *internal* renderingIntent to become a bit "messy" which is indeed unfortunate (note the `"oplist-"` strings in various spots). As-is I suppose that it's not all that bad, but we may want to consider changing the *internal* renderingIntent to e.g. a bitfield in the future.
Besides fixing issue 13704, this patch would also be necessary if someone ever tries to implement e.g. issue 10165 (since currently `PDFPageProxy.getOperatorList` doesn't include Annotation-operatorLists).
*Please note:* This patch is *also* tagged "api-minor" for a second reason, which is that we're now including the Annotation-id in the `beginAnnotation` argument. The reason for this is to allow correlating the Annotation-data returned by `PDFPageProxy.getAnnotations`, with its corresponding operatorList-data (for those Annotations that have it).
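A minimal sketch of the correlation described above follows. It assumes that the Annotation-id is the *first* element of each `beginAnnotation` argument array, and that annotation objects expose an `id` property. `MOCK_OPS`, `getAnnotationIds`, and `correlateAnnotations` are hypothetical names used only for illustration. Real code would use the `OPS` constants exported by pdf.js instead of hard-coded operator values.

```javascript
// Hypothetical operator codes, for illustration only; real code would use
// the `OPS` constants exported by pdf.js instead.
const MOCK_OPS = { save: 1, restore: 2, beginAnnotation: 3, endAnnotation: 4 };

// Collect the Annotation-ids referenced by `beginAnnotation` operators,
// assuming the id is the first element of that operator's arguments.
function getAnnotationIds(opList, ops) {
  const ids = [];
  opList.fnArray.forEach((fn, i) => {
    if (fn === ops.beginAnnotation) {
      ids.push(opList.argsArray[i][0]);
    }
  });
  return ids;
}

// Pair each annotation (e.g. an entry returned by `getAnnotations`) with a
// flag telling whether the operatorList has a `beginAnnotation` for it.
function correlateAnnotations(annotations, opList, ops) {
  const idsWithOps = new Set(getAnnotationIds(opList, ops));
  return annotations.map(annot => ({
    id: annot.id,
    hasOperatorList: idsWithOps.has(annot.id),
  }));
}

// Usage with a hand-built, mock operatorList.
const mockOpList = {
  fnArray: [
    MOCK_OPS.save,
    MOCK_OPS.beginAnnotation,
    MOCK_OPS.endAnnotation,
    MOCK_OPS.restore,
  ],
  argsArray: [null, ["10R", [0, 0, 10, 10]], null, null],
};
const result = correlateAnnotations(
  [{ id: "10R" }, { id: "11R" }],
  mockOpList,
  MOCK_OPS
);
console.log(JSON.stringify(result));
// prints [{"id":"10R","hasOperatorList":true},{"id":"11R","hasOperatorList":false}]
```

The lookup uses a `Set` so that matching stays linear in the number of operators plus annotations. Annotations without a matching `beginAnnotation` (for example, ones that produce no drawing operations) simply report `hasOperatorList: false`.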
2021-07-10 23:47:39 +09:00
|
|
|
|
|
|
|
expect(operatorList.fnArray.length).toBeGreaterThan(100);
|
|
|
|
expect(operatorList.argsArray.length).toBeGreaterThan(100);
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(operatorList.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(operatorList.separateAnnots).toEqual({
|
|
|
|
form: false,
|
|
|
|
canvas: false,
|
|
|
|
});
|
2014-06-17 03:35:38 +09:00
|
|
|
});
|
2019-07-13 23:06:05 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets operatorList with JPEG image (issue 4888)", async function () {
|
2023-02-16 01:14:04 +09:00
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("cmykjpeg.pdf", {
|
|
|
|
isOffscreenCanvasSupported: false,
|
|
|
|
})
|
|
|
|
);
|
2018-02-11 21:13:11 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const operatorList = await pdfPage.getOperatorList();
|
2018-02-11 21:13:11 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const imgIndex = operatorList.fnArray.indexOf(OPS.paintImageXObject);
|
|
|
|
const imgArgs = operatorList.argsArray[imgIndex];
|
|
|
|
const { data } = pdfPage.objs.get(imgArgs[0]);
|
2019-01-29 22:25:47 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(data instanceof Uint8ClampedArray).toEqual(true);
|
|
|
|
expect(data.length).toEqual(90000);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
2018-02-11 21:13:11 +09:00
|
|
|
});
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2019-07-13 23:06:05 +09:00
|
|
|
it(
|
|
|
|
"gets operatorList, from corrupt PDF file (issue 8702), " +
|
|
|
|
"with/without `stopAtErrors` set",
|
2021-04-17 04:48:42 +09:00
|
|
|
async function () {
|
2019-07-13 23:06:05 +09:00
|
|
|
const loadingTask1 = getDocument(
|
|
|
|
buildGetDocumentParams("issue8702.pdf", {
|
|
|
|
stopAtErrors: false, // The default value.
|
|
|
|
})
|
|
|
|
);
|
|
|
|
const loadingTask2 = getDocument(
|
|
|
|
buildGetDocumentParams("issue8702.pdf", {
|
|
|
|
stopAtErrors: true,
|
|
|
|
})
|
|
|
|
);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2024-01-21 23:47:39 +09:00
|
|
|
// eslint-disable-next-line arrow-body-style
|
2019-07-13 23:06:05 +09:00
|
|
|
const result1 = loadingTask1.promise.then(pdfDoc => {
|
2024-01-21 23:47:39 +09:00
|
|
|
// eslint-disable-next-line arrow-body-style
|
2019-07-13 23:06:05 +09:00
|
|
|
return pdfDoc.getPage(1).then(pdfPage => {
|
|
|
|
return pdfPage.getOperatorList().then(opList => {
|
2020-07-15 07:17:27 +09:00
|
|
|
expect(opList.fnArray.length).toBeGreaterThan(100);
|
|
|
|
expect(opList.argsArray.length).toBeGreaterThan(100);
|
2019-07-13 23:06:05 +09:00
|
|
|
expect(opList.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(opList.separateAnnots).toEqual(null);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2019-07-13 23:06:05 +09:00
|
|
|
return loadingTask1.destroy();
|
2019-12-25 23:59:37 +09:00
|
|
|
});
|
2019-07-13 23:06:05 +09:00
|
|
|
});
|
|
|
|
});
|
|
|
|
|
2024-01-21 23:47:39 +09:00
|
|
|
// eslint-disable-next-line arrow-body-style
|
2019-07-13 23:06:05 +09:00
|
|
|
const result2 = loadingTask2.promise.then(pdfDoc => {
|
2024-01-21 23:47:39 +09:00
|
|
|
// eslint-disable-next-line arrow-body-style
|
2019-07-13 23:06:05 +09:00
|
|
|
return pdfDoc.getPage(1).then(pdfPage => {
|
|
|
|
return pdfPage.getOperatorList().then(opList => {
|
|
|
|
expect(opList.fnArray.length).toEqual(0);
|
|
|
|
expect(opList.argsArray.length).toEqual(0);
|
|
|
|
expect(opList.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(opList.separateAnnots).toEqual(null);
|
2019-07-13 23:06:05 +09:00
|
|
|
|
|
|
|
return loadingTask2.destroy();
|
2019-12-25 23:59:37 +09:00
|
|
|
});
|
2019-07-13 23:06:05 +09:00
|
|
|
});
|
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await Promise.all([result1, result2]);
|
2019-07-13 23:06:05 +09:00
|
|
|
}
|
|
|
|
);
|
|
|
|
|
[api-minor] Add `intent` support to the `PDFPageProxy.getOperatorList` method (issue 13704)
With this patch, the `PDFPageProxy.getOperatorList` method will now return `PDFOperatorList`-instances that also include Annotation-operatorLists (when those exist). Hence this closes a small, but potentially confusing, gap between the `render` and `getOperatorList` methods.
Previously we've been somewhat reluctant to do this, as explained below, but given that there are actual use-cases where it's required, we'll probably *have* to implement it now.
Since we still need the ability to separate "normal" rendering operations from direct `getOperatorList` calls in the worker-thread, this API-change unfortunately makes the *internal* renderingIntent a bit "messy" (note the `"oplist-"` strings in various spots). As-is I suppose that it's not all that bad, but we may want to consider changing the *internal* renderingIntent to e.g. a bitfield in the future.
Besides fixing issue 13704, this patch would also be necessary if someone ever tries to implement e.g. issue 10165 (since currently `PDFPageProxy.getOperatorList` doesn't include Annotation-operatorLists).
*Please note:* This patch is *also* tagged "api-minor" for a second reason, which is that we're now including the Annotation-id in the `beginAnnotation` argument. The reason for this is to allow correlating the Annotation-data returned by `PDFPageProxy.getAnnotations`, with its corresponding operatorList-data (for those Annotations that have it).
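The commit message above adds the Annotation-id to the `beginAnnotation` arguments so that operatorList-data can be correlated with the data returned by `PDFPageProxy.getAnnotations`. A minimal sketch of such a correlation, assuming the id is the *first* `beginAnnotation` argument (the tests in this file only confirm that index 4 holds the `isUsingOwnCanvas` flag). `findAnnotationRanges`, `FAKE_OPS` and `fakeOpList` are hypothetical placeholders; in real code, pass the `OPS` object exported by PDF.js and an operator list returned by `getOperatorList`.

```javascript
// Hypothetical helper, not part of the PDF.js API. It pairs every
// beginAnnotation op with the next endAnnotation op in the operator list.
// Assumption: argsArray[i][0] holds the Annotation-id, and argsArray[i][4]
// holds the isUsingOwnCanvas flag (as checked in the `annotationMode` test).
function findAnnotationRanges(opList, OPS) {
  const ranges = [];
  let current = null;
  for (let i = 0; i < opList.fnArray.length; i++) {
    const fn = opList.fnArray[i];
    if (fn === OPS.beginAnnotation) {
      const args = opList.argsArray[i];
      current = { id: args[0], usesOwnCanvas: args[4], begin: i };
    } else if (fn === OPS.endAnnotation && current !== null) {
      ranges.push({ ...current, end: i });
      current = null;
    }
  }
  return ranges;
}

// Placeholder opcode values for this example only; use PDF.js's `OPS` instead.
const FAKE_OPS = { beginAnnotation: 1, endAnnotation: 2, fill: 3 };
const fakeOpList = {
  fnArray: [3, 1, 3, 2, 1, 2],
  argsArray: [
    null,
    ["30R", null, null, null, false],
    null,
    null,
    ["31R", null, null, null, true],
    null,
  ],
};

console.log(findAnnotationRanges(fakeOpList, FAKE_OPS));
```

With the fake data, the helper reports two ranges: `30R` spanning ops 1 to 3 and `31R` spanning ops 4 to 5. Annotation operatorLists are not nested, so a single `current` slot is enough here.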
2021-07-10 23:47:39 +09:00
|
|
|
it("gets operator list, containing Annotation-operatorLists", async function () {
|
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("annotation-line.pdf")
|
|
|
|
);
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const operatorList = await pdfPage.getOperatorList();
|
|
|
|
|
|
|
|
expect(operatorList.fnArray.length).toBeGreaterThan(20);
|
|
|
|
expect(operatorList.argsArray.length).toBeGreaterThan(20);
|
|
|
|
expect(operatorList.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(operatorList.separateAnnots).toEqual({
|
|
|
|
form: false,
|
|
|
|
canvas: false,
|
|
|
|
});
|
2021-07-10 23:47:39 +09:00
|
|
|
|
|
|
|
// The `getOperatorList` method, similar to the `render` method,
|
|
|
|
// is supposed to include any existing Annotation-operatorLists.
|
|
|
|
expect(operatorList.fnArray.includes(OPS.beginAnnotation)).toEqual(true);
|
|
|
|
expect(operatorList.fnArray.includes(OPS.endAnnotation)).toEqual(true);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
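The `annotationMode`-option exercised by the next test replaces the mutually exclusive `renderInteractiveForms`/`includeAnnotationStorage`-options of `PDFPageProxy.render`. A minimal migration sketch for callers still passing the old booleans, assuming `renderInteractiveForms` corresponds to `AnnotationMode.ENABLE_FORMS` and `includeAnnotationStorage` to `AnnotationMode.ENABLE_STORAGE`. `annotationModeFromLegacyOptions` is hypothetical, and the local `AnnotationMode` object only mirrors the enum's member names; its numeric values are placeholders, so import the real enum from PDF.js in actual code.

```javascript
// Placeholder mirror of PDF.js's AnnotationMode enum (values are assumptions).
const AnnotationMode = Object.freeze({
  DISABLE: 0,
  ENABLE: 1,
  ENABLE_FORMS: 2,
  ENABLE_STORAGE: 3,
});

// Hypothetical migration helper, not part of PDF.js: maps the removed boolean
// render-options onto a single `annotationMode` value.
function annotationModeFromLegacyOptions({
  renderInteractiveForms = false,
  includeAnnotationStorage = false,
} = {}) {
  if (renderInteractiveForms && includeAnnotationStorage) {
    // The old options were mutually exclusive; setting both was undefined.
    throw new Error(
      "renderInteractiveForms and includeAnnotationStorage are mutually exclusive"
    );
  }
  if (renderInteractiveForms) {
    return AnnotationMode.ENABLE_FORMS;
  }
  if (includeAnnotationStorage) {
    return AnnotationMode.ENABLE_STORAGE;
  }
  return AnnotationMode.ENABLE;
}

console.log(annotationModeFromLegacyOptions({ includeAnnotationStorage: true }));
```

`AnnotationMode.DISABLE` has no legacy equivalent, since disabling *all* annotation rendering only became possible with the new option. Throwing when both booleans are set is a design choice of this sketch: it turns the old undefined behaviour into an explicit error.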
|
|
|
|
|
2021-08-08 21:36:28 +09:00
|
|
|
it("gets operator list, with `annotationMode`-option", async function () {
|
|
|
|
const loadingTask = getDocument(buildGetDocumentParams("evaljs.pdf"));
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(2);
|
|
|
|
|
|
|
|
pdfDoc.annotationStorage.setValue("30R", { value: "test" });
|
|
|
|
pdfDoc.annotationStorage.setValue("31R", { value: true });
|
|
|
|
|
|
|
|
const opListAnnotDisable = await pdfPage.getOperatorList({
|
|
|
|
annotationMode: AnnotationMode.DISABLE,
|
|
|
|
});
|
|
|
|
expect(opListAnnotDisable.fnArray.length).toEqual(0);
|
|
|
|
expect(opListAnnotDisable.argsArray.length).toEqual(0);
|
|
|
|
expect(opListAnnotDisable.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(opListAnnotDisable.separateAnnots).toEqual(null);
|
2021-08-08 21:36:28 +09:00
|
|
|
|
|
|
|
const opListAnnotEnable = await pdfPage.getOperatorList({
|
|
|
|
annotationMode: AnnotationMode.ENABLE,
|
|
|
|
});
|
2022-06-11 21:05:25 +09:00
|
|
|
expect(opListAnnotEnable.fnArray.length).toBeGreaterThan(140);
|
|
|
|
expect(opListAnnotEnable.argsArray.length).toBeGreaterThan(140);
|
2021-08-08 21:36:28 +09:00
|
|
|
expect(opListAnnotEnable.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(opListAnnotEnable.separateAnnots).toEqual({
|
|
|
|
form: false,
|
|
|
|
canvas: true,
|
|
|
|
});
|
2021-08-08 21:36:28 +09:00
|
|
|
|
2022-07-28 20:37:37 +09:00
|
|
|
let firstAnnotIndex = opListAnnotEnable.fnArray.indexOf(
|
|
|
|
OPS.beginAnnotation
|
|
|
|
);
|
|
|
|
let isUsingOwnCanvas = opListAnnotEnable.argsArray[firstAnnotIndex][4];
|
|
|
|
expect(isUsingOwnCanvas).toEqual(false);
|
|
|
|
|
2021-08-08 21:36:28 +09:00
|
|
|
const opListAnnotEnableForms = await pdfPage.getOperatorList({
|
|
|
|
annotationMode: AnnotationMode.ENABLE_FORMS,
|
|
|
|
});
|
2022-06-11 21:05:25 +09:00
|
|
|
expect(opListAnnotEnableForms.fnArray.length).toBeGreaterThan(30);
|
|
|
|
expect(opListAnnotEnableForms.argsArray.length).toBeGreaterThan(30);
|
2021-08-08 21:36:28 +09:00
|
|
|
expect(opListAnnotEnableForms.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(opListAnnotEnableForms.separateAnnots).toEqual({
|
|
|
|
form: true,
|
|
|
|
canvas: true,
|
|
|
|
});
|
2021-08-08 21:36:28 +09:00
|
|
|
|
2022-07-28 20:37:37 +09:00
|
|
|
firstAnnotIndex = opListAnnotEnableForms.fnArray.indexOf(
|
|
|
|
OPS.beginAnnotation
|
|
|
|
);
|
|
|
|
isUsingOwnCanvas = opListAnnotEnableForms.argsArray[firstAnnotIndex][4];
|
|
|
|
expect(isUsingOwnCanvas).toEqual(true);
|
|
|
|
|
2021-08-08 21:36:28 +09:00
|
|
|
const opListAnnotEnableStorage = await pdfPage.getOperatorList({
|
|
|
|
annotationMode: AnnotationMode.ENABLE_STORAGE,
|
|
|
|
});
|
|
|
|
expect(opListAnnotEnableStorage.fnArray.length).toBeGreaterThan(170);
|
|
|
|
expect(opListAnnotEnableStorage.argsArray.length).toBeGreaterThan(170);
|
|
|
|
expect(opListAnnotEnableStorage.lastChunk).toEqual(true);
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(opListAnnotEnableStorage.separateAnnots).toEqual({
|
|
|
|
form: false,
|
|
|
|
canvas: true,
|
|
|
|
});
|
2021-08-08 21:36:28 +09:00
|
|
|
|
2022-07-28 20:37:37 +09:00
|
|
|
firstAnnotIndex = opListAnnotEnableStorage.fnArray.indexOf(
|
|
|
|
OPS.beginAnnotation
|
|
|
|
);
|
|
|
|
isUsingOwnCanvas = opListAnnotEnableStorage.argsArray[firstAnnotIndex][4];
|
|
|
|
expect(isUsingOwnCanvas).toEqual(false);
|
|
|
|
|
2021-08-08 21:36:28 +09:00
|
|
|
// Sanity check to ensure that the `annotationMode` is correctly applied.
|
|
|
|
expect(opListAnnotDisable.fnArray.length).toBeLessThan(
|
|
|
|
opListAnnotEnableForms.fnArray.length
|
|
|
|
);
|
|
|
|
expect(opListAnnotEnableForms.fnArray.length).toBeLessThan(
|
|
|
|
opListAnnotEnable.fnArray.length
|
|
|
|
);
|
|
|
|
expect(opListAnnotEnable.fnArray.length).toBeLessThan(
|
|
|
|
opListAnnotEnableStorage.fnArray.length
|
|
|
|
);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-12-07 21:16:38 +09:00
|
|
|
it("gets operatorList, with page resources containing corrupt /CCITTFaxDecode data", async function () {
|
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("poppler-90-0-fuzzed.pdf")
|
|
|
|
);
|
|
|
|
expect(loadingTask instanceof PDFDocumentLoadingTask).toEqual(true);
|
|
|
|
|
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
expect(pdfDoc.numPages).toEqual(16);
|
|
|
|
|
|
|
|
const pdfPage = await pdfDoc.getPage(6);
|
|
|
|
expect(pdfPage instanceof PDFPageProxy).toEqual(true);
|
|
|
|
|
|
|
|
const opList = await pdfPage.getOperatorList();
|
|
|
|
expect(opList.fnArray.length).toBeGreaterThan(25);
|
|
|
|
expect(opList.argsArray.length).toBeGreaterThan(25);
|
|
|
|
expect(opList.lastChunk).toEqual(true);
|
|
|
|
|
|
|
|
await loadingTask.destroy();
|
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("gets page stats after parsing page, without `pdfBug` set", async function () {
|
|
|
|
await page.getOperatorList();
|
|
|
|
expect(page.stats).toEqual(null);
|
2018-06-25 20:19:29 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("gets page stats after parsing page, with `pdfBug` set", async function () {
|
2020-01-24 17:48:21 +09:00
|
|
|
const loadingTask = getDocument(
|
2018-06-25 20:19:29 +09:00
|
|
|
buildGetDocumentParams(basicApiFileName, { pdfBug: true })
|
|
|
|
);
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
await pdfPage.getOperatorList();
|
|
|
|
const stats = pdfPage.stats;
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(stats instanceof StatTimer).toEqual(true);
|
|
|
|
expect(stats.times.length).toEqual(1);
|
2018-06-25 20:19:29 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const [statEntry] = stats.times;
|
|
|
|
expect(statEntry.name).toEqual("Page Request");
|
|
|
|
expect(statEntry.end - statEntry.start).toBeGreaterThanOrEqual(0);
|
2018-06-25 20:19:29 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await loadingTask.destroy();
|
2018-06-25 20:19:29 +09:00
|
|
|
});
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
it("gets page stats after rendering page, with `pdfBug` set", async function () {
|
2020-01-24 17:48:21 +09:00
|
|
|
const loadingTask = getDocument(
|
2018-06-25 20:19:29 +09:00
|
|
|
buildGetDocumentParams(basicApiFileName, { pdfBug: true })
|
|
|
|
);
|
2021-04-17 04:48:42 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
|
|
|
const viewport = pdfPage.getViewport({ scale: 1 });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const canvasAndCtx = CanvasFactory.create(
|
|
|
|
viewport.width,
|
|
|
|
viewport.height
|
|
|
|
);
|
|
|
|
const renderTask = pdfPage.render({
|
|
|
|
canvasContext: canvasAndCtx.context,
|
|
|
|
viewport,
|
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(renderTask instanceof RenderTask).toEqual(true);
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await renderTask.promise;
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(renderTask.separateAnnots).toEqual(false);
|
2018-06-25 20:19:29 +09:00
|
|
|
|
2022-06-27 18:41:37 +09:00
|
|
|
const { stats } = pdfPage;
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(stats instanceof StatTimer).toEqual(true);
|
|
|
|
expect(stats.times.length).toEqual(3);
|
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const [statEntryOne, statEntryTwo, statEntryThree] = stats.times;
|
|
|
|
expect(statEntryOne.name).toEqual("Page Request");
|
|
|
|
expect(statEntryOne.end - statEntryOne.start).toBeGreaterThanOrEqual(0);
|
2018-06-25 20:19:29 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(statEntryTwo.name).toEqual("Rendering");
|
|
|
|
expect(statEntryTwo.end - statEntryTwo.start).toBeGreaterThan(0);
|
2018-06-25 20:19:29 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
expect(statEntryThree.name).toEqual("Overall");
|
|
|
|
expect(statEntryThree.end - statEntryThree.start).toBeGreaterThan(0);
|
2018-06-25 20:19:29 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
CanvasFactory.destroy(canvasAndCtx);
|
|
|
|
await loadingTask.destroy();
|
2018-06-25 20:19:29 +09:00
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("cancels rendering of page", async function () {
|
2020-10-25 23:40:51 +09:00
|
|
|
const viewport = page.getViewport({ scale: 1 });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const canvasAndCtx = CanvasFactory.create(
|
|
|
|
viewport.width,
|
|
|
|
viewport.height
|
|
|
|
);
|
|
|
|
const renderTask = page.render({
|
2017-03-13 21:56:59 +09:00
|
|
|
canvasContext: canvasAndCtx.context,
|
2017-04-28 20:40:47 +09:00
|
|
|
viewport,
|
2017-03-13 21:32:23 +09:00
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(renderTask instanceof RenderTask).toEqual(true);
|
|
|
|
|
2017-03-13 21:32:23 +09:00
|
|
|
renderTask.cancel();
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
try {
|
|
|
|
await renderTask.promise;
|
2020-03-19 23:36:09 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
// Shouldn't get here.
|
|
|
|
expect(false).toEqual(true);
|
|
|
|
} catch (reason) {
|
|
|
|
expect(reason instanceof RenderingCancelledException).toEqual(true);
|
|
|
|
expect(reason.message).toEqual("Rendering cancelled, page 1");
|
2022-12-14 20:34:16 +09:00
|
|
|
expect(reason.extraDelay).toEqual(0);
|
2021-04-17 04:48:42 +09:00
|
|
|
}
|
|
|
|
|
|
|
|
CanvasFactory.destroy(canvasAndCtx);
|
2017-03-13 21:32:23 +09:00
|
|
|
});
|
2018-06-29 05:38:09 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("re-render page, using the same canvas, after cancelling rendering", async function () {
|
2020-01-24 17:48:21 +09:00
|
|
|
const viewport = page.getViewport({ scale: 1 });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2020-01-24 17:48:21 +09:00
|
|
|
const canvasAndCtx = CanvasFactory.create(
|
|
|
|
viewport.width,
|
|
|
|
viewport.height
|
|
|
|
);
|
|
|
|
const renderTask = page.render({
|
2018-06-29 05:38:09 +09:00
|
|
|
canvasContext: canvasAndCtx.context,
|
|
|
|
viewport,
|
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(renderTask instanceof RenderTask).toEqual(true);
|
|
|
|
|
2018-06-29 05:38:09 +09:00
|
|
|
renderTask.cancel();
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
try {
|
|
|
|
await renderTask.promise;
|
|
|
|
|
|
|
|
// Shouldn't get here.
|
|
|
|
expect(false).toEqual(true);
|
|
|
|
} catch (reason) {
|
|
|
|
expect(reason instanceof RenderingCancelledException).toEqual(true);
|
|
|
|
}
|
|
|
|
|
|
|
|
const reRenderTask = page.render({
|
|
|
|
canvasContext: canvasAndCtx.context,
|
|
|
|
viewport,
|
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(reRenderTask instanceof RenderTask).toEqual(true);
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await reRenderTask.promise;
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(reRenderTask.separateAnnots).toEqual(false);
|
2021-04-17 04:48:42 +09:00
|
|
|
|
|
|
|
CanvasFactory.destroy(canvasAndCtx);
|
2018-06-29 05:38:09 +09:00
|
|
|
});
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
it("multiple render() on the same canvas", async function () {
|
2021-05-16 17:58:34 +09:00
|
|
|
const optionalContentConfigPromise =
|
|
|
|
pdfDocument.getOptionalContentConfig();
|
2020-08-05 05:31:24 +09:00
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const viewport = page.getViewport({ scale: 1 });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const canvasAndCtx = CanvasFactory.create(
|
|
|
|
viewport.width,
|
|
|
|
viewport.height
|
|
|
|
);
|
|
|
|
const renderTask1 = page.render({
|
2017-06-13 06:04:35 +09:00
|
|
|
canvasContext: canvasAndCtx.context,
|
|
|
|
viewport,
|
2020-08-05 05:31:24 +09:00
|
|
|
optionalContentConfigPromise,
|
2017-06-13 06:04:35 +09:00
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(renderTask1 instanceof RenderTask).toEqual(true);
|
|
|
|
|
2020-10-25 23:40:51 +09:00
|
|
|
const renderTask2 = page.render({
|
2017-06-13 06:04:35 +09:00
|
|
|
canvasContext: canvasAndCtx.context,
|
|
|
|
viewport,
|
2020-08-05 05:31:24 +09:00
|
|
|
optionalContentConfigPromise,
|
2017-06-13 06:04:35 +09:00
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(renderTask2 instanceof RenderTask).toEqual(true);
|
2017-06-13 06:04:35 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
await Promise.all([
|
2017-06-13 06:04:35 +09:00
|
|
|
renderTask1.promise,
|
|
|
|
renderTask2.promise.then(
|
|
|
|
() => {
|
2021-04-17 04:48:42 +09:00
|
|
|
// Shouldn't get here.
|
|
|
|
expect(false).toEqual(true);
|
2017-06-13 06:04:35 +09:00
|
|
|
},
|
|
|
|
reason => {
|
2021-04-17 04:48:42 +09:00
|
|
|
// It fails because we are already using this canvas.
|
2017-06-13 06:04:35 +09:00
|
|
|
expect(/multiple render\(\)/.test(reason.message)).toEqual(true);
|
|
|
|
}
|
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
),
|
2021-04-17 04:48:42 +09:00
|
|
|
]);
|
2017-06-13 06:04:35 +09:00
|
|
|
});
|
[api-minor] Change `PDFDocumentProxy.cleanup`/`PDFPageProxy.cleanup` to return data
This patch makes the following changes, to improve these API methods:
- Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.
- Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.
- Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
Add a note in the JSDoc comment about not calling this method when rendering is ongoing.
- Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
All of the caches that are being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.
- Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
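The clean-up rules described above are: page clean-up returns a boolean, it is blocked by ongoing rendering, and document clean-up throws rather than clearing main-thread caches mid-render. They can be illustrated with a small, simplified model. This is only a sketch. `SketchPage` and `SketchTransport` are hypothetical names, not PDF.js internals, and `startCleanup` is synchronous here for brevity, whereas the real API returns a promise. The error text mirrors the message asserted in the "cleans up document resources during rendering of page" test.

```javascript
// Hypothetical, simplified model of the clean-up rules described above.
// SketchPage/SketchTransport are illustrative names, not PDF.js internals,
// and startCleanup is synchronous here for brevity.
class SketchPage {
  constructor(pageNumber) {
    this.pageNumber = pageNumber;
    this.activeRenders = 0; // number of in-flight render tasks
    this.objs = new Map(); // stand-in for page-level image/font data
  }

  // Returns true only if clean-up actually happened.
  cleanup() {
    if (this.activeRenders > 0) {
      return false; // ongoing rendering blocks clean-up
    }
    this.objs.clear();
    return true;
  }
}

class SketchTransport {
  constructor(pages) {
    this.pages = pages;
    this.commonObjs = new Map(); // stand-in for document-level data
  }

  startCleanup() {
    for (const page of this.pages) {
      if (!page.cleanup()) {
        // Main-thread caches must not be cleared while a page renders,
        // since the canvas code cannot re-request data it relies on.
        throw new Error(
          `startCleanup: Page ${page.pageNumber} is currently rendering.`
        );
      }
    }
    this.commonObjs.clear();
  }
}
```

In this model, a caller that wants to be safe can check the boolean from `cleanup()` or catch the error from `startCleanup()` and retry once rendering has finished.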
2020-02-07 23:48:58 +09:00
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
it("cleans up document resources after rendering of page", async function () {
|
2020-02-07 23:48:58 +09:00
|
|
|
const loadingTask = getDocument(buildGetDocumentParams(basicApiFileName));
|
2021-04-02 19:26:46 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2020-02-07 23:48:58 +09:00
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
const viewport = pdfPage.getViewport({ scale: 1 });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
const canvasAndCtx = CanvasFactory.create(
|
|
|
|
viewport.width,
|
|
|
|
viewport.height
|
|
|
|
);
|
|
|
|
const renderTask = pdfPage.render({
|
|
|
|
canvasContext: canvasAndCtx.context,
|
|
|
|
viewport,
|
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(renderTask instanceof RenderTask).toEqual(true);
|
2020-02-07 23:48:58 +09:00
|
|
|
|
2021-09-13 20:34:37 +09:00
|
|
|
await renderTask.promise;
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(renderTask.separateAnnots).toEqual(false);
|
2021-04-02 19:26:46 +09:00
|
|
|
|
2022-06-27 18:41:37 +09:00
|
|
|
await pdfDoc.cleanup();
|
2021-04-02 19:26:46 +09:00
|
|
|
expect(true).toEqual(true);
|
|
|
|
|
|
|
|
CanvasFactory.destroy(canvasAndCtx);
|
|
|
|
await loadingTask.destroy();
|
2020-02-07 23:48:58 +09:00
|
|
|
});
|
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
it("cleans up document resources during rendering of page", async function () {
|
2023-08-15 19:13:36 +09:00
|
|
|
const loadingTask = getDocument(tracemonkeyGetDocumentParams);
|
2021-04-02 19:26:46 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
|
|
|
const pdfPage = await pdfDoc.getPage(1);
|
2020-02-07 23:48:58 +09:00
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
const viewport = pdfPage.getViewport({ scale: 1 });
|
2022-01-13 19:58:45 +09:00
|
|
|
expect(viewport instanceof PageViewport).toEqual(true);
|
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
const canvasAndCtx = CanvasFactory.create(
|
|
|
|
viewport.width,
|
|
|
|
viewport.height
|
|
|
|
);
|
|
|
|
const renderTask = pdfPage.render({
|
|
|
|
canvasContext: canvasAndCtx.context,
|
|
|
|
viewport,
|
2023-06-15 06:07:53 +09:00
|
|
|
background: "#FF0000", // See comment below.
|
2021-04-02 19:26:46 +09:00
|
|
|
});
|
2021-09-13 20:34:37 +09:00
|
|
|
expect(renderTask instanceof RenderTask).toEqual(true);
|
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
// Ensure that clean-up runs during rendering.
|
|
|
|
renderTask.onContinue = function (cont) {
|
|
|
|
waitSome(cont);
|
|
|
|
};
|
2020-02-07 23:48:58 +09:00
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
try {
|
|
|
|
await pdfDoc.cleanup();
|
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
// Shouldn't get here.
|
|
|
|
expect(false).toEqual(true);
|
2021-04-02 19:26:46 +09:00
|
|
|
} catch (reason) {
|
|
|
|
expect(reason instanceof Error).toEqual(true);
|
|
|
|
expect(reason.message).toEqual(
|
|
|
|
"startCleanup: Page 1 is currently rendering."
|
|
|
|
);
|
|
|
|
}
|
|
|
|
await renderTask.promise;
|
2022-06-27 18:41:37 +09:00
|
|
|
expect(renderTask.separateAnnots).toEqual(false);
|
2021-04-02 19:26:46 +09:00
|
|
|
|
2023-06-15 06:07:53 +09:00
|
|
|
// Use the red background-color to, more easily, tell that the page was
|
|
|
|
// actually rendered successfully.
|
|
|
|
const { data } = canvasAndCtx.context.getImageData(0, 0, 1, 1);
|
|
|
|
expect(data).toEqual(new Uint8ClampedArray([255, 0, 0, 255]));
|
|
|
|
|
2021-04-02 19:26:46 +09:00
|
|
|
CanvasFactory.destroy(canvasAndCtx);
|
|
|
|
await loadingTask.destroy();
|
2020-02-07 23:48:58 +09:00
|
|
|
});
|
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there are PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page, which is obviously slow and wastes both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there are now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefits both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fall back to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There are e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
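The promotion rule described above can be sketched as a small cache. Each image reference tracks the set of pages it has been seen on, and it becomes eligible for document-level caching once that set reaches a page threshold, subject to a cap on how many images are cached globally. This is a hypothetical illustration. `SketchGlobalImageCache`, `numPagesThreshold`, and `maxCachedImages` are assumed names and parameters, and the default values are placeholders rather than the constants used by the real `GlobalImageCache`.

```javascript
// Hypothetical sketch of page-count-based promotion to a document-level
// image cache. Names and default limits are assumptions for illustration.
class SketchGlobalImageCache {
  constructor({ numPagesThreshold = 2, maxCachedImages = 10 } = {}) {
    this.numPagesThreshold = numPagesThreshold;
    this.maxCachedImages = maxCachedImages;
    this.pagesByRef = new Map(); // ref -> Set of page indices using it
    this.imageByRef = new Map(); // ref -> decoded image (document level)
  }

  // Record that `ref` is used on `pageIndex`, and report whether the image
  // should be (or already is) cached at the document level.
  shouldCache(ref, pageIndex) {
    let pages = this.pagesByRef.get(ref);
    if (!pages) {
      pages = new Set();
      this.pagesByRef.set(ref, pages);
    }
    pages.add(pageIndex);
    if (pages.size < this.numPagesThreshold) {
      return false; // still treated as a page-specific resource
    }
    // Once the cap is reached, new images fall back to per-page handling.
    return this.#hasRoomFor(ref);
  }

  getData(ref) {
    return this.imageByRef.get(ref) ?? null;
  }

  setData(ref, data) {
    if (this.#hasRoomFor(ref)) {
      this.imageByRef.set(ref, data);
    }
  }

  #hasRoomFor(ref) {
    return this.imageByRef.has(ref) || this.imageByRef.size < this.maxCachedImages;
  }
}
```

Seeing the same reference twice on one page does not count towards the threshold, since only distinct page indices are stored.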
2020-05-18 21:17:56 +09:00
|
|
|
|
2021-03-21 19:33:39 +09:00
|
|
|
it("caches image resources at the document/page level as expected (issue 11878)", async function () {
|
2020-05-18 21:17:56 +09:00
|
|
|
const { NUM_PAGES_THRESHOLD } = GlobalImageCache,
|
|
|
|
EXPECTED_WIDTH = 2550,
|
|
|
|
EXPECTED_HEIGHT = 3300;
|
|
|
|
|
2023-02-16 01:14:04 +09:00
|
|
|
const loadingTask = getDocument(
|
|
|
|
buildGetDocumentParams("issue11878.pdf", {
|
|
|
|
isOffscreenCanvasSupported: false,
|
Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
- With the `master`-branch it takes >600 ms to render.
- With this patch that goes down to ~50 ms, which is one order of magnitude faster.
(Note that all other pages are, as expected, completely unaffected by these changes.)
This new main-thread copying is limited to "large" global images, since:
- Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
- With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
- This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
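The decision described above can be reduced to a small function: prefer an existing global entry, otherwise copy a large page-level image into the global cache, and re-parse in every other case. This is a sketch under stated assumptions. `resolveGlobalImage`, the `Map`-based caches keyed by a single id, and the `LARGE_IMAGE_BYTES` cut-off are all hypothetical; they are not the actual PDF.js message names, object ids, or thresholds.

```javascript
// Hypothetical sketch of the "copy instead of re-parse" decision described
// above. The Map-based caches and the byte cut-off are assumptions chosen
// for illustration, not the real PDF.js values or data structures.
const LARGE_IMAGE_BYTES = 1_000_000;

function resolveGlobalImage(objId, pageObjs, commonObjs) {
  if (commonObjs.has(objId)) {
    return "already-global";
  }
  const local = pageObjs.get(objId);
  if (!local) {
    // The page-level copy may already be gone, e.g. after delayed clean-up.
    return "reparse";
  }
  if (local.data.byteLength < LARGE_IMAGE_BYTES) {
    // Small images are cheap to re-parse on the worker-thread, so avoid
    // making it wait for the main-thread.
    return "reparse";
  }
  // Re-use the already decoded data at the document level.
  commonObjs.set(objId, local);
  return "copied";
}
```

The two `"reparse"` branches correspond to the first two limitations listed above. The third limitation, having the worker-thread wait for the main-thread, is the cost that only the `"copied"` path pays.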
2023-12-15 05:57:48 +09:00
|
|
|
pdfBug: true,
|
2023-02-16 01:14:04 +09:00
|
|
|
})
|
|
|
|
);
|
2021-03-21 19:33:39 +09:00
|
|
|
const pdfDoc = await loadingTask.promise;
|
2023-12-15 05:57:48 +09:00
let checkedCopyLocalImage = false,
  firstImgData = null,
  firstStatsOverall = null;
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other; however, there are PDF documents where many (or even all) pages reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page, which is slow and wastes CPU and memory resources that are better used elsewhere.[1]
Obviously we cannot simply treat all image resources as if they're used throughout the entire PDF document, since that would increase memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific once an image resource has been seen on more than a certain number of pages.
In many cases, such as the referenced issue, this patch thus reduces memory usage for image resources. When scrolling through all pages of the document, there are now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this benefits both CPU and memory usage in this case, for *very* large image data this patch *may* increase persistent main-thread memory usage a tiny bit. To avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level, and simply fall back to the default behaviour beyond that.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Data stored in the `commonObjs`, on the main-thread, is only cleared manually through `PDFDocumentProxy.cleanup`. This is as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned up, e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-18 21:17:56 +09:00
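The page-count threshold and the cap on document-level entries described above can be sketched as follows. This is a hypothetical model, not the actual `GlobalImageCache` class: the `DocumentImageCache` name, its `resolve` method, the made-up reference string `"10R"`, and the values `NUM_PAGES_THRESHOLD = 2` and `MAX_GLOBAL_IMAGES = 10` are all assumptions, and the `g_<docId>_<localId>` id format is only chosen to resemble the object ids asserted in the test code.

```javascript
// Hypothetical model of document-level image caching; the class name, method
// and constants are illustrative assumptions, not the PDF.js GlobalImageCache.
class DocumentImageCache {
  // Assumed: promote an image once it has been seen on this many pages.
  static NUM_PAGES_THRESHOLD = 2;
  // Assumed: upper bound on the number of document-level images.
  static MAX_GLOBAL_IMAGES = 10;

  #pagesPerRef = new Map(); // image ref -> Set of page indices using it
  #globalIds = new Map(); // image ref -> document-level object id

  constructor(docId) {
    this.docId = docId;
  }

  // Returns the object id to use for `ref` on page `pageIndex` (0-based), and
  // whether the image lives in the document-level cache (`documentLevel`).
  resolve(ref, pageIndex, localId) {
    const globalId = this.#globalIds.get(ref);
    if (globalId) {
      return { id: globalId, documentLevel: true };
    }
    let pages = this.#pagesPerRef.get(ref);
    if (!pages) {
      pages = new Set();
      this.#pagesPerRef.set(ref, pages);
    }
    pages.add(pageIndex);

    if (
      pages.size >= DocumentImageCache.NUM_PAGES_THRESHOLD &&
      this.#globalIds.size < DocumentImageCache.MAX_GLOBAL_IMAGES
    ) {
      // Promote, deriving the global id from the page where this happened.
      const id = `g_${this.docId}_${localId}`;
      this.#globalIds.set(ref, id);
      return { id, documentLevel: true };
    }
    // Below the threshold, or the document-level cache is full:
    // fall back to page-level caching.
    return { id: localId, documentLevel: false };
  }
}

// Usage: the same image (made-up ref "10R") drawn on three consecutive pages.
const cache = new DocumentImageCache("d0");
for (let i = 0; i < 3; i++) {
  const { id, documentLevel } = cache.resolve("10R", i, `img_p${i}_1`);
  console.log(id, documentLevel);
}
// img_p0_1 false
// g_d0_img_p1_1 true
// g_d0_img_p1_1 true
```

Once an image is promoted, every later page receives the same id, so the main-thread can find it in the document-level cache instead of each page's own cache; once the cap is reached, additional images simply stay page-level.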
2021-03-21 19:33:39 +09:00
for (let i = 1; i <= pdfDoc.numPages; i++) {
  const pdfPage = await pdfDoc.getPage(i);
2023-12-15 05:57:48 +09:00
  const viewport = pdfPage.getViewport({ scale: 1 });

  const canvasAndCtx = CanvasFactory.create(
    viewport.width,
    viewport.height
  );
  const renderTask = pdfPage.render({
    canvasContext: canvasAndCtx.context,
    viewport,
  });

  await renderTask.promise;
  const opList = renderTask.getOperatorList();
  // The canvas is no longer necessary, since we only care about
  // the image-data below.
  CanvasFactory.destroy(canvasAndCtx);

  const [statsOverall] = pdfPage.stats.times
    .filter(time => time.name === "Overall")
    .map(time => time.end - time.start);
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
  const { commonObjs, objs } = pdfPage;
  const imgIndex = opList.fnArray.indexOf(OPS.paintImageXObject);
  const [objId, width, height] = opList.argsArray[imgIndex];
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
  if (i < NUM_PAGES_THRESHOLD) {
    expect(objId).toEqual(`img_p${i - 1}_1`);
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
    expect(objs.has(objId)).toEqual(true);
    expect(commonObjs.has(objId)).toEqual(false);
  } else {
    expect(objId).toEqual(
      `g_${loadingTask.docId}_img_p${NUM_PAGES_THRESHOLD - 1}_1`
    );
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
    expect(objs.has(objId)).toEqual(false);
    expect(commonObjs.has(objId)).toEqual(true);
  }
  expect(width).toEqual(EXPECTED_WIDTH);
  expect(height).toEqual(EXPECTED_HEIGHT);
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
  // Ensure that the actual image data is identical for all pages.
  if (i === 1) {
    firstImgData = objs.get(objId);
2023-12-15 05:57:48 +09:00
    firstStatsOverall = statsOverall;
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
    expect(firstImgData.width).toEqual(EXPECTED_WIDTH);
    expect(firstImgData.height).toEqual(EXPECTED_HEIGHT);
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
    expect(firstImgData.kind).toEqual(ImageKind.RGB_24BPP);
    expect(firstImgData.data instanceof Uint8ClampedArray).toEqual(true);
    expect(firstImgData.data.length).toEqual(25245000);
  } else {
    const objsPool = i >= NUM_PAGES_THRESHOLD ? commonObjs : objs;
    const currentImgData = objsPool.get(objId);
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]
Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.
In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]
*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.
---
[1] There's e.g. PDF documents that use the same image as background on all pages.
[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.
[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
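The page-versus-document caching idea described above can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions: `ImageCacheSketch`, `NUM_PAGES_THRESHOLD`, `MAX_GLOBAL_IMAGES` and the `decodeImage` callback are hypothetical names with made-up values, not the actual `GlobalImageCache` API, and the real code additionally has to cope with ranged/streamed data loading.

```javascript
// Hypothetical sketch of page-level vs. document-level image caching.
// ImageCacheSketch, NUM_PAGES_THRESHOLD, MAX_GLOBAL_IMAGES and decodeImage
// are illustrative names, not the actual PDF.js GlobalImageCache API.
const NUM_PAGES_THRESHOLD = 2; // assumed: promote after this many distinct pages
const MAX_GLOBAL_IMAGES = 10; // assumed: cap on document-level entries

class ImageCacheSketch {
  constructor() {
    this.pagesByRef = new Map(); // image reference -> Set of page indices
    this.globalCache = new Map(); // image reference -> decoded image
  }

  // Returns the decoded image together with the scope it is cached at.
  getImage(ref, pageIndex, decodeImage) {
    if (this.globalCache.has(ref)) {
      // Already document-level: no need to decode the image again.
      return { scope: "document", image: this.globalCache.get(ref) };
    }
    let pages = this.pagesByRef.get(ref);
    if (!pages) {
      pages = new Set();
      this.pagesByRef.set(ref, pages);
    }
    pages.add(pageIndex);

    const image = decodeImage(ref);
    if (
      pages.size >= NUM_PAGES_THRESHOLD &&
      this.globalCache.size < MAX_GLOBAL_IMAGES
    ) {
      this.globalCache.set(ref, image);
      return { scope: "document", image };
    }
    // Below the threshold, or the cache is full: keep it page-specific.
    return { scope: "page", image };
  }
}
```

With these assumed values, the first page that uses an image gets a page-specific copy, the second page promotes it to the document level, and every later page reuses it without decoding again. The `MAX_GLOBAL_IMAGES` cap mirrors the described fallback to the default behaviour once "a certain number" of images is cached.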
2020-05-18 21:17:56 +09:00
Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
- With the `master`-branch it takes >600 ms to render.
- With this patch that goes down to ~50 ms, which is one order of magnitude faster.
(Note that all other pages are, as expected, completely unaffected by these changes.)
This new main-thread copying is limited to "large" global images, since:
- Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
- With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
- This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
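The trade-off in the list above can be sketched roughly as follows. This is a hypothetical, synchronous illustration: `promoteToGlobal`, `LARGE_IMAGE_BYTES` and the `Map`-based caches are assumptions, not the actual PDF.js code. In the real implementation the worker-thread has to wait for the main-thread to answer, which is the cost the last bullet point describes.

```javascript
// Hypothetical sketch: when an image reaches the global-caching threshold,
// prefer copying a still-available page-level copy over re-parsing it.
// promoteToGlobal, LARGE_IMAGE_BYTES and the Map-based caches are
// assumptions for illustration, not the actual PDF.js code.
const LARGE_IMAGE_BYTES = 1_000_000; // assumed cutoff for "large" images

function promoteToGlobal({
  localId,
  globalId,
  byteLength,
  pageCaches,
  globalCache,
  reparse,
}) {
  if (byteLength >= LARGE_IMAGE_BYTES) {
    // Delayed cleanup makes a page-level copy *likely*, but not guaranteed.
    for (const pageCache of pageCaches) {
      if (pageCache.has(localId)) {
        globalCache.set(globalId, structuredClone(pageCache.get(localId)));
        return "copied";
      }
    }
  }
  // Small images (cheap to re-parse), or large ones already cleaned up.
  globalCache.set(globalId, reparse(localId));
  return "reparsed";
}
```

Small images always take the re-parse path. This keeps the cross-thread round-trip reserved for the cases where it actually saves time.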
2023-12-15 05:57:48 +09:00
expect(currentImgData).not.toBe(firstImgData);
2021-03-21 19:33:39 +09:00
expect(currentImgData.width).toEqual(firstImgData.width);
expect(currentImgData.height).toEqual(firstImgData.height);
2020-05-18 21:17:56 +09:00
2021-03-21 19:33:39 +09:00
expect(currentImgData.kind).toEqual(firstImgData.kind);
expect(currentImgData.data instanceof Uint8ClampedArray).toEqual(
true
);
expect(
2024-01-21 18:13:12 +09:00
currentImgData.data.every(
(value, index) => value === firstImgData.data[index]
)
2021-03-21 19:33:39 +09:00
).toEqual(true);
2023-12-15 05:57:48 +09:00
if (i === NUM_PAGES_THRESHOLD) {
checkedCopyLocalImage = true;
// Ensure that the image was copied in the main-thread, rather
// than being re-parsed in the worker-thread (which is slower).
2024-02-12 20:28:21 +09:00
expect(statsOverall).toBeLessThan(firstStatsOverall / 4);
2023-12-15 05:57:48 +09:00
}
2020-05-18 21:17:56 +09:00
}
}
2023-12-15 05:57:48 +09:00
expect(checkedCopyLocalImage).toBeTruthy();
2021-03-21 19:33:39 +09:00
await loadingTask.destroy();
firstImgData = null;
2023-12-15 05:57:48 +09:00
firstStatsOverall = null;
2020-05-18 21:17:56 +09:00
});
2022-06-13 20:35:58 +09:00
it("render for printing, with `printAnnotationStorage` set", async function () {
async function getPrintData(printAnnotationStorage = null) {
const canvasAndCtx = CanvasFactory.create(
viewport.width,
viewport.height
);
const renderTask = pdfPage.render({
canvasContext: canvasAndCtx.context,
viewport,
intent: "print",
annotationMode: AnnotationMode.ENABLE_STORAGE,
printAnnotationStorage,
});

await renderTask.promise;
2022-06-27 18:41:37 +09:00
expect(renderTask.separateAnnots).toEqual(false);
2022-06-13 20:35:58 +09:00
const printData = canvasAndCtx.canvas.toDataURL();
CanvasFactory.destroy(canvasAndCtx);

return printData;
}

const loadingTask = getDocument(
buildGetDocumentParams("annotation-tx.pdf")
);
const pdfDoc = await loadingTask.promise;
const pdfPage = await pdfDoc.getPage(1);
const viewport = pdfPage.getViewport({ scale: 1 });

// Update the contents of the form-field.
const { annotationStorage } = pdfDoc;
annotationStorage.setValue("22R", { value: "Hello World" });

// Render for printing, with default parameters.
const printOriginalData = await getPrintData();

// Get the *frozen* print-storage for use during printing.
const printAnnotationStorage = annotationStorage.print;
// Update the contents of the form-field again.
annotationStorage.setValue("22R", { value: "Printing again..." });
2023-06-29 15:43:02 +09:00
const { hash: annotationHash } = annotationStorage.serializable;
const { hash: printAnnotationHash } = printAnnotationStorage.serializable;
2022-06-13 20:35:58 +09:00
// Sanity check to ensure that the print-storage didn't change,
// after the form-field was updated.
expect(printAnnotationHash).not.toEqual(annotationHash);

// Render for printing again, after updating the form-field,
// with default parameters.
const printAgainData = await getPrintData();

// Render for printing again, after updating the form-field,
// with `printAnnotationStorage` set.
const printStorageData = await getPrintData(printAnnotationStorage);

// Ensure that printing again, with default parameters,
// actually uses the "new" form-field data.
expect(printAgainData).not.toEqual(printOriginalData);
// Finally ensure that printing, with `printAnnotationStorage` set,
// still uses the "previous" form-field data.
expect(printStorageData).toEqual(printOriginalData);

await loadingTask.destroy();
});
2012-04-13 09:59:30 +09:00
});
2021-04-17 04:48:42 +09:00
2020-04-14 19:28:14 +09:00
describe("Multiple `getDocument` instances", function () {
2015-07-14 18:43:20 +09:00
// Regression test for https://github.com/mozilla/pdf.js/issues/6205
// A PDF using the Helvetica font.
2023-08-15 19:13:36 +09:00
const pdf1 = tracemonkeyGetDocumentParams;
2015-07-14 18:43:20 +09:00
// A PDF using the Times font.
2020-10-25 23:40:51 +09:00
const pdf2 = buildGetDocumentParams("TAMReview.pdf");
2015-07-14 18:43:20 +09:00
// A PDF using the Arial font.
2020-10-25 23:40:51 +09:00
const pdf3 = buildGetDocumentParams("issue6068.pdf");
const loadingTasks = [];
2015-07-14 18:43:20 +09:00
// Render the first page of the given PDF file.
// Resolves with a base64-encoded data URL of the rendered page.
2018-11-08 21:46:02 +09:00
async function renderPDF(filename) {
const loadingTask = getDocument(filename);
2016-01-21 07:57:17 +09:00
loadingTasks.push(loadingTask);
2018-11-08 21:46:02 +09:00
const pdf = await loadingTask.promise;
const page = await pdf.getPage(1);
2018-12-21 19:47:37 +09:00
const viewport = page.getViewport({ scale: 1.2 });
2022-01-13 19:58:45 +09:00
expect(viewport instanceof PageViewport).toEqual(true);
2018-11-08 21:46:02 +09:00
const canvasAndCtx = CanvasFactory.create(
viewport.width,
viewport.height
);
const renderTask = page.render({
canvasContext: canvasAndCtx.context,
viewport,
});
await renderTask.promise;
2022-06-27 18:41:37 +09:00
expect(renderTask.separateAnnots).toEqual(false);
2018-11-08 21:46:02 +09:00
const data = canvasAndCtx.canvas.toDataURL();
CanvasFactory.destroy(canvasAndCtx);
return data;
2015-07-14 18:43:20 +09:00
}
2021-04-17 04:48:42 +09:00
afterEach(async function () {
2015-07-14 18:43:20 +09:00
// Issue 6205 reported an issue with font rendering, so clear the loaded
// fonts so that we can see whether loading PDFs in parallel does not
// cause any issues with the rendered fonts.
2020-04-14 19:28:14 +09:00
const destroyPromises = loadingTasks.map(function (loadingTask) {
2016-01-21 07:57:17 +09:00
return loadingTask.destroy();
});
2021-04-17 04:48:42 +09:00
await Promise.all(destroyPromises);
2015-07-14 18:43:20 +09:00
});
2021-04-17 04:48:42 +09:00
it("should correctly render PDFs in parallel", async function () {
2020-10-25 23:40:51 +09:00
let baseline1, baseline2, baseline3;
const promiseDone = renderPDF(pdf1)
2020-04-14 19:28:14 +09:00
.then(function (data1) {
2015-07-14 18:43:20 +09:00
baseline1 = data1;
return renderPDF(pdf2);
})
2020-04-14 19:28:14 +09:00
.then(function (data2) {
2015-07-14 18:43:20 +09:00
baseline2 = data2;
return renderPDF(pdf3);
})
2020-04-14 19:28:14 +09:00
.then(function (data3) {
2015-07-14 18:43:20 +09:00
baseline3 = data3;
return Promise.all([
renderPDF(pdf1),
renderPDF(pdf2),
renderPDF(pdf3),
]);
})
2020-04-14 19:28:14 +09:00
.then(function (dataUrls) {
2015-07-14 18:43:20 +09:00
expect(dataUrls[0]).toEqual(baseline1);
expect(dataUrls[1]).toEqual(baseline2);
expect(dataUrls[2]).toEqual(baseline3);
return true;
});
2021-04-17 04:48:42 +09:00
await promiseDone;
2015-07-14 18:43:20 +09:00
});
});
2019-02-17 22:38:41 +09:00
2020-04-14 19:28:14 +09:00
describe("PDFDataRangeTransport", function () {
2019-02-17 22:38:41 +09:00
let dataPromise;
2021-04-17 04:48:42 +09:00
beforeAll(function () {
2021-01-09 01:12:58 +09:00
dataPromise = DefaultFileReaderFactory.fetch({
2023-08-15 19:13:36 +09:00
path: TEST_PDFS_PATH + tracemonkeyFileName,
2021-01-09 01:12:58 +09:00
});
2019-02-17 22:38:41 +09:00
});
2020-04-14 19:28:14 +09:00
afterAll(function () {
2019-02-17 22:38:41 +09:00
dataPromise = null;
});
2018-03-18 03:56:39 +09:00
2021-04-17 04:48:42 +09:00
it("should fetch document info and page using ranges", async function () {
2019-02-17 22:38:41 +09:00
const initialDataLength = 4000;
[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases:
- TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
- TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).
*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.
Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus needs to have exactly the same size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only allow transferring of *safe* TypedArray data, and fall back to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.
---
[1] See https://github.com/mozilla/pdf.js/blob/e09ad99973b1dcb82a06c001da96d52fc5bcab9d/src/display/api.js#L2492-L2506 respectively https://github.com/mozilla/pdf.js/blob/e09ad99973b1dcb82a06c001da96d52fc5bcab9d/src/display/api.js#L2578-L2590
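The "safe to transfer" check described above can be sketched as follows. `prepareForTransfer` is a hypothetical helper, not a PDF.js API. The sketch assumes a runtime with `structuredClone` and its `transfer` option (for example a modern browser or Node.js 17+), which stands in for `postMessage` to the worker.

```javascript
// Hypothetical sketch of the "only transfer safe TypedArrays" check.
// prepareForTransfer is an illustrative helper, not a PDF.js API.
function prepareForTransfer(typedArray) {
  const { buffer, byteOffset, byteLength } = typedArray;
  // Only a TypedArray that covers its *entire* ArrayBuffer is safe to
  // transfer as-is; otherwise the receiver would get the whole buffer,
  // including bytes outside the view.
  if (byteOffset === 0 && byteLength === buffer.byteLength) {
    return { data: typedArray, transfer: [buffer] };
  }
  // Fall back to copying just the viewed bytes. The copy belongs to us,
  // so transferring its buffer leaves the caller's data untouched.
  const copy = typedArray.slice();
  return { data: copy, transfer: [copy.buffer] };
}
```

After a transfer, the caller's original array is detached and reports a `byteLength` of `0`. This is why the note above recommends passing a copy, e.g. `data.slice()`, if the data is still needed after calling `getDocument` or constructing a `PDFDataRangeTransport`.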
2023-01-13 19:16:28 +09:00
const subArrays = [];
2021-04-17 04:48:42 +09:00
let fetches = 0;

const data = await dataPromise;
2023-01-13 19:16:28 +09:00
const initialData = new Uint8Array(data.subarray(0, initialDataLength));
subArrays.push(initialData);
2021-04-17 04:48:42 +09:00
|
|
|
const transport = new PDFDataRangeTransport(data.length, initialData);
|
|
|
|
transport.requestDataRange = function (begin, end) {
|
|
|
|
fetches++;
|
|
|
|
waitSome(function () {
|
[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage; however, it will take ownership of the TypedArrays. Currently this only applies to the following cases:
- TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
- TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).
*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.
Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus needs to have exactly the same size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only allow transferring of *safe* TypedArray data, and fall back to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.
---
[1] See https://github.com/mozilla/pdf.js/blob/e09ad99973b1dcb82a06c001da96d52fc5bcab9d/src/display/api.js#L2492-L2506 respectively https://github.com/mozilla/pdf.js/blob/e09ad99973b1dcb82a06c001da96d52fc5bcab9d/src/display/api.js#L2578-L2590
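The following sketch shows what "taking ownership" of a TypedArray means in practice, and why the tests in this file expect `array.length` to be 0 afterwards. It uses `structuredClone(value, { transfer })`, which is available in modern browsers and Node.js 17+. This is an assumed stand-in for the worker `postMessage` transfer that PDF.js performs internally, and `transferAway` is a hypothetical helper.

```javascript
// Hypothetical helper: structuredClone with a transfer list stands in for the
// worker postMessage transfer done inside PDF.js (illustration only).
function transferAway(typedArray) {
  // Moves the underlying ArrayBuffer; the original view becomes detached.
  return structuredClone(typedArray, { transfer: [typedArray.buffer] });
}

const original = new Uint8Array([1, 2, 3, 4]);
// A caller that wants to keep its data copies it first, as the note advises.
const keptCopy = original.slice(); // independent buffer, unaffected by transfer

const received = transferAway(original);

console.log(original.length); // 0 (detached, like the arrays checked in the tests)
console.log(received.length); // 4
console.log(keptCopy.length); // 4
```

In terms of the API, this suggests that you pass something like `bytes.slice()` rather than `bytes` as the `data` parameter of `getDocument` whenever the caller still needs `bytes` afterwards.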
2023-01-13 19:16:28 +09:00
const chunk = new Uint8Array(data.subarray(begin, end));
subArrays.push(chunk);
transport.onDataProgress(initialDataLength);
transport.onDataRange(begin, chunk);
2021-04-17 04:48:42 +09:00
});
};
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
2023-01-19 20:40:09 +09:00
const loadingTask = getDocument({ range: transport });
2021-04-17 04:48:42 +09:00
const pdfDocument = await loadingTask.promise;
expect(pdfDocument.numPages).toEqual(14);
2019-02-17 22:38:41 +09:00
2021-04-17 04:48:42 +09:00
const pdfPage = await pdfDocument.getPage(10);
expect(pdfPage.rotate).toEqual(0);
expect(fetches).toBeGreaterThan(2);
2019-02-17 22:38:41 +09:00
2023-03-20 05:49:27 +09:00
// Check that the TypedArrays were transferred.
for (const array of subArrays) {
expect(array.length).toEqual(0);
2023-01-10 01:24:52 +09:00
}
2021-04-17 04:48:42 +09:00
await loadingTask.destroy();
2016-02-10 05:55:11 +09:00
});
[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream`
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.
Obviously it's not a good idea to set `disableRange=true`; however, it seems that this bug affects the PDF Viewer in Firefox even with default settings:
- In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
- (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*
In the cases outlined above, we're having to re-fetch already available data, thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
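The condition that this fix relies on can be sketched as a small predicate. Here `needsRangeRequests` and its parameters are hypothetical and are not the actual `PDFDataTransportStream` code. The "without range, using complete initialData" test in this file exercises the same case: it passes the full file as `initialData` together with `disableRange: true`, and then expects `fetches` to stay at 0.

```javascript
// Hypothetical helper (not PDF.js API) illustrating the condition described
// above: complete initialData should never trigger a range request.
function needsRangeRequests({ contentLength, initialDataLength, disableRange }) {
  if (initialDataLength >= contentLength) {
    // The whole file is already available; re-fetching it would only waste
    // time and bandwidth.
    return false;
  }
  // Data is still missing: use range requests unless they are disabled, in
  // which case the remaining bytes have to arrive via streaming instead.
  return !disableRange;
}

console.log(needsRangeRequests({ contentLength: 1000, initialDataLength: 1000, disableRange: true })); // false
console.log(needsRangeRequests({ contentLength: 1000, initialDataLength: 400, disableRange: false })); // true
console.log(needsRangeRequests({ contentLength: 1000, initialDataLength: 400, disableRange: true })); // false
```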
2019-03-27 00:05:30 +09:00
2021-04-17 04:48:42 +09:00
it("should fetch document info and page using range and streaming", async function () {
2019-02-17 22:38:41 +09:00
const initialDataLength = 4000;
2023-01-13 19:16:28 +09:00
const subArrays = [];
2021-04-17 04:48:42 +09:00
let fetches = 0;
const data = await dataPromise;
2023-01-13 19:16:28 +09:00
const initialData = new Uint8Array(data.subarray(0, initialDataLength));
subArrays.push(initialData);
2021-04-17 04:48:42 +09:00
const transport = new PDFDataRangeTransport(data.length, initialData);
transport.requestDataRange = function (begin, end) {
fetches++;
if (fetches === 1) {
2023-01-13 19:16:28 +09:00
const chunk = new Uint8Array(data.subarray(initialDataLength));
subArrays.push(chunk);
2021-04-17 04:48:42 +09:00
// Send rest of the data on first range request.
2023-01-13 19:16:28 +09:00
transport.onDataProgressiveRead(chunk);
2021-04-17 04:48:42 +09:00
}
waitSome(function () {
2023-01-13 19:16:28 +09:00
const chunk = new Uint8Array(data.subarray(begin, end));
subArrays.push(chunk);
transport.onDataRange(begin, chunk);
2021-04-17 04:48:42 +09:00
});
};
2019-12-25 23:59:37 +09:00
2023-01-19 20:40:09 +09:00
const loadingTask = getDocument({ range: transport });
2021-04-17 04:48:42 +09:00
const pdfDocument = await loadingTask.promise;
expect(pdfDocument.numPages).toEqual(14);
2019-03-27 00:05:30 +09:00
2021-04-17 04:48:42 +09:00
const pdfPage = await pdfDocument.getPage(10);
expect(pdfPage.rotate).toEqual(0);
expect(fetches).toEqual(1);
2019-03-27 00:05:30 +09:00
2021-04-17 04:48:42 +09:00
await new Promise(resolve => {
waitSome(resolve);
});
2023-01-13 19:16:28 +09:00
2023-03-20 05:49:27 +09:00
// Check that the TypedArrays were transferred.
for (const array of subArrays) {
expect(array.length).toEqual(0);
2023-01-13 19:16:28 +09:00
}
2021-04-17 04:48:42 +09:00
await loadingTask.destroy();
2019-12-25 23:59:37 +09:00
});
it(
2016-02-10 05:55:11 +09:00
"should fetch document info and page, without range, " +
2019-02-17 22:38:41 +09:00
"using complete initialData",
2021-04-17 04:48:42 +09:00
async function () {
2023-01-13 19:16:28 +09:00
const subArrays = [];
2021-04-17 04:48:42 +09:00
let fetches = 0;
const data = await dataPromise;
2023-01-13 19:16:28 +09:00
const initialData = new Uint8Array(data);
subArrays.push(initialData);
2021-04-17 04:48:42 +09:00
const transport = new PDFDataRangeTransport(
data.length,
2023-01-13 19:16:28 +09:00
initialData,
2021-04-17 04:48:42 +09:00
/* progressiveDone = */ true
);
transport.requestDataRange = function (begin, end) {
fetches++;
};
Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-25 23:59:37 +09:00
|
|
|
|
2021-04-17 04:48:42 +09:00
|
|
|
const loadingTask = getDocument({
|
|
|
|
disableRange: true,
|
|
|
|
range: transport,
|
|
|
|
});
|
|
|
|
const pdfDocument = await loadingTask.promise;
|
|
|
|
expect(pdfDocument.numPages).toEqual(14);
|
[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream`
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.
Obviously it's not a good idea to set `disableRange=true`; however, it seems that this bug affects the PDF Viewer in Firefox even with default settings:
- In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
- (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*
In the cases outlined above, we have to re-fetch already available data, thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
2019-03-27 00:05:30 +09:00
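The core of this fix is to never request bytes that are already in hand. The minimal sketch below illustrates that idea under assumptions: `createRangeLoader`, `isComplete` and `getRange` are hypothetical names and not part of PDF.js (in particular, this is not how `PDFDataTransportStream` is written); only the `requestDataRange(begin, end)` callback shape mirrors the `PDFDataRangeTransport` method used in the test code here.

```javascript
// Hypothetical sketch (not PDF.js code): serve byte ranges from data that is
// already available, and only ask the transport for bytes that are missing.
function createRangeLoader(length, initialData, requestDataRange) {
  return {
    // True once `initialData` covers the whole file, in which case loading
    // can complete without dispatching any range request.
    get isComplete() {
      return initialData.byteLength >= length;
    },
    // Returns the requested bytes when they are already available; otherwise
    // requests only the missing part of the range and returns null.
    getRange(begin, end) {
      if (end <= initialData.byteLength) {
        return initialData.subarray(begin, end);
      }
      requestDataRange(Math.max(begin, initialData.byteLength), end);
      return null;
    },
  };
}

// Mirrors the test: complete initialData should mean zero fetches.
let fetches = 0;
const file = new Uint8Array(1024); // stand-in for a complete PDF file
const loader = createRangeLoader(file.length, file, () => fetches++);
console.log(loader.isComplete, loader.getRange(0, 5).length, fetches);
// prints: true 5 0
```

When `initialData` already holds the entire file, as in the test's `expect(fetches).toEqual(0)`, no range request is ever dispatched.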
2021-04-17 04:48:42 +09:00
        const pdfPage = await pdfDocument.getPage(10);
        expect(pdfPage.rotate).toEqual(0);
        expect(fetches).toEqual(0);
2023-03-20 05:49:27 +09:00
        // Check that the TypedArrays were transferred.
        for (const array of subArrays) {
          expect(array.length).toEqual(0);
[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage; however, it means that the API takes ownership of the TypedArrays. Currently this only applies to the following cases:
- TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
- TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).
*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.
Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus needs to have exactly the same size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only transfer *safe* TypedArray data, falling back to simply copying the data, just as before, in all other cases. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.
---
[1] See https://github.com/mozilla/pdf.js/blob/e09ad99973b1dcb82a06c001da96d52fc5bcab9d/src/display/api.js#L2492-L2506 and https://github.com/mozilla/pdf.js/blob/e09ad99973b1dcb82a06c001da96d52fc5bcab9d/src/display/api.js#L2578-L2590 respectively
2023-01-13 19:16:28 +09:00
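The size check described in this commit can be sketched as follows, assuming Uint8Array input. `getTransferableBuffer` and `sendToWorker` are hypothetical names, not PDF.js API, and `structuredClone` with a `transfer` list stands in for posting the data to a real worker. Both detach the transferred ArrayBuffer, which is why the test expects every transferred `array.length` to be `0`.

```javascript
// Hypothetical helper (not PDF.js API): choose the ArrayBuffer to hand over to
// the worker. Transferring is only "safe" when the Uint8Array spans its whole
// underlying buffer; otherwise copy exactly the viewed bytes.
function getTransferableBuffer(data) {
  if (data.byteLength === data.buffer.byteLength) {
    return data.buffer; // Transferred as-is; `data` is detached afterwards.
  }
  return data.slice().buffer; // Exactly-sized copy; `data` stays usable.
}

// Stand-in for `worker.postMessage(message, [buffer])`: structuredClone with a
// transfer list detaches the original buffer in the same way.
function sendToWorker(data) {
  const buffer = getTransferableBuffer(data);
  return structuredClone(buffer, { transfer: [buffer] });
}

const whole = new Uint8Array([1, 2, 3, 4]);
sendToWorker(whole);
console.log(whole.length); // prints: 0 (the buffer was transferred)

const backing = new Uint8Array([1, 2, 3, 4, 5, 6]);
const part = backing.subarray(2, 4); // a view into a larger buffer
const received = sendToWorker(part);
console.log(part.length, received.byteLength); // prints: 2 2 (bytes were copied)
```

Callers who need to keep using their data can follow the advice in the commit message and pass a copy, e.g. `data.slice()`, which always owns an exactly-sized buffer and is therefore transferred without touching the original.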
        }
2021-04-17 04:48:42 +09:00
        await loadingTask.destroy();
2019-03-27 00:05:30 +09:00
      }
    );
2016-02-10 05:55:11 +09:00
  });
2022-03-10 21:37:21 +09:00
  describe("PDFWorkerUtil", function () {
    describe("isSameOrigin", function () {
      const { isSameOrigin } = PDFWorkerUtil;

      it("handles invalid base URLs", function () {
        // The base URL is not valid.
        expect(isSameOrigin("/foo", "/bar")).toEqual(false);

        // The base URL has no origin.
        expect(isSameOrigin("blob:foo", "/bar")).toEqual(false);
      });

      it("correctly checks if the origin of both URLs matches", function () {
        expect(
          isSameOrigin(
            "https://www.mozilla.org/foo",
            "https://www.mozilla.org/bar"
          )
        ).toEqual(true);
        expect(
          isSameOrigin(
            "https://www.mozilla.org/foo",
            "https://www.example.com/bar"
          )
        ).toEqual(false);
      });
    });
  });
2012-04-13 09:59:30 +09:00
});
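The two `isSameOrigin` test cases above pin down the expected behaviour fairly precisely: an unparsable base URL or a base URL without an origin yields `false`, and otherwise the two origins are compared. Below is a minimal sketch of a function satisfying them, built on the standard WHATWG `URL` class. `checkSameOrigin` is a hypothetical name, and this is not claimed to be the actual `PDFWorkerUtil.isSameOrigin` implementation.

```javascript
// Sketch (assumption, not PDFWorkerUtil's source): compare the origins of two
// URLs using the standard WHATWG URL class.
function checkSameOrigin(baseUrl, otherUrl) {
  let base;
  try {
    base = new URL(baseUrl);
  } catch {
    return false; // e.g. "/foo" is not an absolute URL, so there's no base.
  }
  // e.g. "blob:foo" parses, but its origin is opaque and serializes as "null".
  if (base.origin === "null") {
    return false;
  }
  try {
    // Resolve `otherUrl` against the base, so relative URLs also work.
    return new URL(otherUrl, base).origin === base.origin;
  } catch {
    return false;
  }
}

console.log(checkSameOrigin("https://www.mozilla.org/foo", "/bar")); // prints: true
```

Resolving the second URL against the first means a relative path such as `"/bar"` is treated as same-origin whenever the base itself has a real origin.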