pdf.js/web/generic_scripting.js

74 lines
2.1 KiB
JavaScript
Raw Normal View History

/* Copyright 2020 Mozilla Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
[api-major] Output JavaScript modules in the builds (issue 10317) At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1] In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2] One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting. Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3] This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result. One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4] --- [1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility [2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is. [3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general. [4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
2023-09-28 20:00:10 +09:00
import { getPdfFilenameFromUrl } from "pdfjs-lib";
async function docProperties(pdfDocument) {
[api-minor] Move the viewer scripting initialization/handling into a new `PDFScriptingManager` class The *main* purpose of this patch is to allow scripting to be used together with the viewer components, note the updated "simpleviewer"/"singlepageviewer" examples, in addition to the full default viewer. Given how the scripting functionality is currently implemented in the default viewer, trying to re-use this with the standalone viewer components would be *very* hard and ideally you'd want it to work out-of-the-box. For an initial implementation, in the default viewer, of the scripting functionality it probably made sense to simply dump all of the code in the `app.js` file, however that cannot be used with the viewer components. To address this, the functionality is moved into a new `PDFScriptingManager` class which can thus be handled in the same way as all other viewer components (and e.g. be passed to the `BaseViewer`-implementations). Obviously the scripting functionality needs quite a lot of data, during its initialization, and for the default viewer we want to maintain the current way of doing the lookups since that helps avoid a number of redundant API-calls. To that end, the `PDFScriptingManager` implementation accepts (optional) factories/functions such that we can maintain the current behaviour for the default viewer. For the viewer components specifically, fallback code-paths are provided to ensure that scripting will "just work"[1]. Besides moving the viewer handling of the scripting code to its own file/class, this patch also takes the opportunity to re-factor the functionality into a number of helper methods to improve overall readability[2]. Note that it's definitely possible that the `PDFScriptingManager` class could be improved even further (e.g. for general re-use), since it's still heavily tailored to the default viewer use-case, however I believe that this patch is still a good step forward overall. --- [1] Obviously *all* the relevant document properties might not be available in the viewer components use-case (e.g. the various URLs), but most things should work just fine. [2] The old `PDFViewerApplication._initializeJavaScript` method, where everything was simply inlined, have over time (in my opinion) become quite large and somewhat difficult to *easily* reason about.
2021-03-05 08:15:18 +09:00
const url = "",
baseUrl = url.split("#", 1)[0];
// eslint-disable-next-line prefer-const
let { info, metadata, contentDispositionFilename, contentLength } =
await pdfDocument.getMetadata();
[api-minor] Move the viewer scripting initialization/handling into a new `PDFScriptingManager` class The *main* purpose of this patch is to allow scripting to be used together with the viewer components, note the updated "simpleviewer"/"singlepageviewer" examples, in addition to the full default viewer. Given how the scripting functionality is currently implemented in the default viewer, trying to re-use this with the standalone viewer components would be *very* hard and ideally you'd want it to work out-of-the-box. For an initial implementation, in the default viewer, of the scripting functionality it probably made sense to simply dump all of the code in the `app.js` file, however that cannot be used with the viewer components. To address this, the functionality is moved into a new `PDFScriptingManager` class which can thus be handled in the same way as all other viewer components (and e.g. be passed to the `BaseViewer`-implementations). Obviously the scripting functionality needs quite a lot of data, during its initialization, and for the default viewer we want to maintain the current way of doing the lookups since that helps avoid a number of redundant API-calls. To that end, the `PDFScriptingManager` implementation accepts (optional) factories/functions such that we can maintain the current behaviour for the default viewer. For the viewer components specifically, fallback code-paths are provided to ensure that scripting will "just work"[1]. Besides moving the viewer handling of the scripting code to its own file/class, this patch also takes the opportunity to re-factor the functionality into a number of helper methods to improve overall readability[2]. Note that it's definitely possible that the `PDFScriptingManager` class could be improved even further (e.g. for general re-use), since it's still heavily tailored to the default viewer use-case, however I believe that this patch is still a good step forward overall. --- [1] Obviously *all* the relevant document properties might not be available in the viewer components use-case (e.g. the various URLs), but most things should work just fine. [2] The old `PDFViewerApplication._initializeJavaScript` method, where everything was simply inlined, have over time (in my opinion) become quite large and somewhat difficult to *easily* reason about.
2021-03-05 08:15:18 +09:00
if (!contentLength) {
const { length } = await pdfDocument.getDownloadInfo();
contentLength = length;
}
return {
...info,
baseURL: baseUrl,
filesize: contentLength,
filename: contentDispositionFilename || getPdfFilenameFromUrl(url),
[api-minor] Move the viewer scripting initialization/handling into a new `PDFScriptingManager` class The *main* purpose of this patch is to allow scripting to be used together with the viewer components, note the updated "simpleviewer"/"singlepageviewer" examples, in addition to the full default viewer. Given how the scripting functionality is currently implemented in the default viewer, trying to re-use this with the standalone viewer components would be *very* hard and ideally you'd want it to work out-of-the-box. For an initial implementation, in the default viewer, of the scripting functionality it probably made sense to simply dump all of the code in the `app.js` file, however that cannot be used with the viewer components. To address this, the functionality is moved into a new `PDFScriptingManager` class which can thus be handled in the same way as all other viewer components (and e.g. be passed to the `BaseViewer`-implementations). Obviously the scripting functionality needs quite a lot of data, during its initialization, and for the default viewer we want to maintain the current way of doing the lookups since that helps avoid a number of redundant API-calls. To that end, the `PDFScriptingManager` implementation accepts (optional) factories/functions such that we can maintain the current behaviour for the default viewer. For the viewer components specifically, fallback code-paths are provided to ensure that scripting will "just work"[1]. Besides moving the viewer handling of the scripting code to its own file/class, this patch also takes the opportunity to re-factor the functionality into a number of helper methods to improve overall readability[2]. Note that it's definitely possible that the `PDFScriptingManager` class could be improved even further (e.g. for general re-use), since it's still heavily tailored to the default viewer use-case, however I believe that this patch is still a good step forward overall. --- [1] Obviously *all* the relevant document properties might not be available in the viewer components use-case (e.g. the various URLs), but most things should work just fine. [2] The old `PDFViewerApplication._initializeJavaScript` method, where everything was simply inlined, have over time (in my opinion) become quite large and somewhat difficult to *easily* reason about.
2021-03-05 08:15:18 +09:00
metadata: metadata?.getRaw(),
authors: metadata?.get("dc:creator"),
numPages: pdfDocument.numPages,
URL: url,
};
}
class GenericScripting {
constructor(sandboxBundleSrc) {
[api-major] Output JavaScript modules in the builds (issue 10317) At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1] In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2] One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting. Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3] This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result. One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4] --- [1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility [2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is. [3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general. [4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
2023-09-28 20:00:10 +09:00
this._ready = new Promise((resolve, reject) => {
const sandbox =
typeof PDFJSDev === "undefined"
? import(sandboxBundleSrc) // eslint-disable-line no-unsanitized/method
: __non_webpack_import__(sandboxBundleSrc);
[api-major] Output JavaScript modules in the builds (issue 10317) At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1] In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2] One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting. Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3] This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result. One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4] --- [1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility [2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is. [3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general. [4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
2023-09-28 20:00:10 +09:00
sandbox
.then(pdfjsSandbox => {
resolve(pdfjsSandbox.QuickJSSandbox());
})
.catch(reject);
});
}
async createSandbox(data) {
const sandbox = await this._ready;
sandbox.create(data);
}
async dispatchEventInSandbox(event) {
const sandbox = await this._ready;
setTimeout(() => sandbox.dispatchEvent(event), 0);
}
async destroySandbox() {
const sandbox = await this._ready;
sandbox.nukeSandbox();
}
}
export { docProperties, GenericScripting };