707 Commits

Author SHA1 Message Date
Jonas Jenwald
3c78ff5fb0 [api-minor] Implement basic support for OptionalContent Usage dicts (issue 5764, bug 1826783)
The following are some highlights of this patch:
 - In the Worker we only extract a *subset* of the potential contents of the `Usage` dictionary, to avoid having to implement/test a bunch of code that'd be completely unused in the viewer.

 - In order to still allow the user to *manually* override the default visible layers in the viewer, the viewable/printable state is purposely *not* enforced during initialization in the `OptionalContentConfig` constructor.

 - Printing will now always use the *default* visible layers, rather than using the same state as the viewer (as was the case previously).
   This ensures that the printing-output will correctly take the `Usage` dictionary into account, and in practice toggling of visible layers rarely seem to be necessary except in the viewer itself (if at all).[1]

---
[1] In the unlikely case that it'd ever be deemed necessary to support fine-grained control of optional content visibility during printing, some new (additional) UI would likely be needed to support that case.
2024-03-12 13:18:15 +01:00
Jonas Jenwald
37e98e39f6 Skip any whitespace after the first object in linearized PDFs (issue 17665)
This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.
2024-02-12 22:05:36 +01:00
Jonas Jenwald
06cd278808 Simplify the signature of the PDFDataTransportStream constructor
Given that we need to pass in a `PDFDataRangeTransport`-instance a number of the needed parameters can be obtained from it, rather than having to specify them manually.
2024-02-03 13:10:42 +01:00
Jonas Jenwald
f9a384d711 Enable the arrow-body-style ESLint rule
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.

Please see https://eslint.org/docs/latest/rules/arrow-body-style
2024-01-21 16:20:55 +01:00
Jonas Jenwald
9dfe9c552c Use shorter arrow functions where possible
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.

For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilo-byte (which isn't a lot but still can't hurt).
2024-01-21 10:13:12 +01:00
Jonas Jenwald
b37536c38c Remove the isArrayBuffer helper function
This old helper function can now be replaced with `ArrayBuffer.isView()` and/or `instanceof ArrayBuffer` checks, as needed depending on the situation.
2024-01-19 14:10:52 +01:00
Calixte Denizet
f84f48b5d0 Avoid to have the text layer mismatching the rendered text with mismatching locales (bug 1869001)
The system locale (used in OffscreenCanvas) can be different from the one guessed by Fluent,
consequently, in order to avoid any mismatch, we just use an attached canvas element.
The original issue can easily be reproduced locally in adding a lang="ja" in viewer.html
(or with an other language for Japanese users).
2024-01-04 19:20:20 +01:00
Jonas Jenwald
9f02cc36d4 Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.

Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.

For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
 - With the `master`-branch it takes >600 ms to render.
 - With this patch that goes down to ~50 ms, which is one order of magnitude faster.

(Note that all other pages are, as expected, completely unaffected by these changes.)

This new main-thread copying is limited to "large" global images, since:
 - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
 - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
 - This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
e547b198a3 Compute the length of the final image-bitmap/data on the worker-thread
Currently this is done in the API, but moving it into the worker-thread will simplify upcoming changes.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
b09f238436 Add iteration support in the PDFObjects class
This (obviously) only includes "resolved" data, and will be used in an upcoming patch.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
ade692ff2e Set a type for the Blob used in createCDNWrapper (issue 17259)
Hopefully this is enough to address the problem of initializing the Worker in Chromium-based browsers.
Locally I've tried to *force* use of `createCDNWrapper` in development mode, by commenting out the `isSameOrigin` checks, and worker-loading fails against `master` and works with this patch.
2023-11-12 09:30:26 +01:00
Jonas Jenwald
d5acbbccd3 Update the ESLint globals list (PR 17055 follow-up)
Given that we only use standard `import`/`export` statements now, after recent PRs, the "exports" global is unused.
Instead we add "__non_webpack_import__" to the `globals` to avoid having to sprinkle disable statements throughout the code.

Finally, the way that `globals` are defined has changed in ESLint and we should thus explicitly specify them as "readonly"; please find additional details at https://eslint.org/docs/latest/use/configure/language-options#specifying-globals
2023-10-15 11:38:10 +02:00
Jonas Jenwald
af9a7b0003 Tweak PDFWorkerUtil.createCDNWrapper to account for JavaScript modules (PR 17055 follow-up) 2023-10-14 11:34:40 +02:00
Jonas Jenwald
0238cf134d Don't store page-level data, in the API, after cleanup has run (bug 1854145)
For large/complex images it's possible that the image-data arrives in the API *after* the page has been scrolled out-of-view and thus been cleaned-up. In this case we obviously shouldn't cache such page-level data, since it'll first of all be unused and secondly can increase memory usage *a lot*.
Also, ensure that we *immediately* release any `ImageBitmap` data in this case to help reclaim memory faster.
2023-10-11 11:51:42 +02:00
Jonas Jenwald
8bd3cc0313 [api-minor] Stop polyfilling structuredClone in legacy builds
Comparing the currently supported browsers/environments, see [the FAQ](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) and the [MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility), the `structuredClone` polyfill is *only* needed in Google Chrome versions < 98. Because of some limitations in the core-js polyfill we're currently forced to special-case the `transfer` handling to prevent bugs, and it'd be nice to avoid that.

Note that `structuredClone`, with transfers, is only used in two spots:
 - The `LoopbackPort` class, which is only used with fake workers. Given that fake workers should *never* be used in browsers, breaking that edge-case in older Google Chrome versions seem fine.
 - The `AnnotationStorage` class, when Stamp-annotations have been added to the document. Given that Google Chrome isn't the main focus of development, breaking *part* of the editing-functionality in older Google Chrome versions should hopefully be acceptable.
2023-10-07 16:52:47 +02:00
Jonas Jenwald
927e50f5d4 [api-major] Output JavaScript modules in the builds (issue 10317)
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]

In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.

Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.

One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]

---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility

[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.

[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.

[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
2023-10-07 09:31:08 +02:00
Jonas Jenwald
0a970ee443 [api-major] Remove the fallbackWorkerSrc functionality in browsers
The user should *always* provide a correct `GlobalWorkerOptions.workerSrc` value when using the PDF.js library in browser environments. Note that the fallback:
 - Has been deprecated ever since PR 11418, first released in version `2.4.456` over three years ago.
 - Was always a best-effort solution, with no guarantees that it'd actually work correctly.
 - With upcoming changes, w.r.t. outputting JavaScript modules, it'd now be more diffiult to determine the correct value.
2023-10-06 12:12:30 +02:00
Jonas Jenwald
426209c6e6
Merge pull request #16699 from Snuffleupagus/rm-svg
[api-major] Remove the SVG back-end (PR 15173 follow-up)
2023-10-03 15:13:14 +02:00
Jonas Jenwald
3ced0dec1b [api-major] Remove the SVG back-end (PR 15173 follow-up)
This has been deprecated since version `2.15.349`, which is a year ago.
Removing this will also simplify some upcoming changes, specifically outputting of JavaScript modules in the builds.
2023-10-01 23:14:29 +02:00
Jonas Jenwald
f87ec67ab1 [api-major] Remove various deprecated functionality and options 2023-09-23 17:44:09 +02:00
Jonas Jenwald
1b8441dacc Don't pass in unused pageColors to CanvasGraphics.endDrawing (PR 16380 follow-up)
This became unnecessary in PR 16380, however we forgot to update one of the API call-sites.
2023-08-28 16:14:22 +02:00
Jonas Jenwald
9b4efe2c2f Use WeakSet.prototype.delete() unconditionally in the InternalRenderTask class
It's not necessary to check if an object exists before trying remove it from a `WeakSet`; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakSet/delete#return_value
2023-08-28 16:10:33 +02:00
Jonas Jenwald
ec3d2be761 Introduce more optional chaining in the code-base
Also, use logical OR assignment a bit more.
2023-08-26 10:52:23 +02:00
Jonas Jenwald
988ce2820b Initialize the PDFWorker.#workerPorts WeakMap lazily
By default this WeakMap isn't needed, and it's simple enough to initialize it lazily instead.
2023-08-19 16:18:38 +02:00
Jonas Jenwald
2993c7725b [Firefox] Exclude more workerPort related code in MOZCENTRAL builds
Given that this code is (and has always been) unused in the Firefox PDF Viewer, we don't need to include it in that build-target.
2023-08-19 15:52:00 +02:00
Jonas Jenwald
66437917db Avoid using the global workerPort when destruction has started, but not yet finished (issue 16777)
Given that the `PDFDocumentLoadingTask.destroy()`-method is documented as being asynchronous, you thus need to await its completion before attempting to load a new PDF document when using the global `workerPort`.
If you don't await destruction as intended then a new `getDocument`-call can remain pending indefinitely, without any kind of indication of the problem, as shown in the issue.

In order to improve the current situation, without unnecessarily complicating the API-implementation, we'll now throw during the `getDocument`-call if the global `workerPort` is in the process of being destroyed.
This part of the code-base has apparently never been covered by any tests, hence the patch adds unit-tests for both the *correct* usage (awaiting destruction) as well as the specific case outlined in the issue.
2023-08-12 21:21:50 +02:00
Jonas Jenwald
ec7746350d Introduce even more optional chaining in the code-base
This replaces a few more small/simple if-statements with optional chaining.
2023-08-09 17:04:54 +02:00
Jonas Jenwald
e6728f94f4
Merge pull request #16779 from Snuffleupagus/deprecate-getJavaScript
[api-minor] Deprecate the `PDFDocumentProxy.getJavaScript` method
2023-08-01 20:58:36 +02:00
Calixte Denizet
8bd4a18190 [GeckoView] Allow to query pdf.js to know if we can avoid to print a pdf (bug 1846296) 2023-08-01 15:15:04 +02:00
Jonas Jenwald
64e8557fb5 [api-minor] Deprecate the PDFDocumentProxy.getJavaScript method
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.

Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
2023-08-01 09:02:05 +02:00
Jonas Jenwald
930cbc4d27 Make the passwordCapability field, in WorkerTransport, actually private as intended 2023-07-30 11:45:35 +02:00
Jonas Jenwald
c09bd5568c Tweak the useWorkerFetch default value checks (PR 15879 follow-up)
Currently we accidentally accept `cMapUrl` and `standardFontDataUrl` parameters that are empty strings or `null`, since e.g. `new URL(null, document.baseURI)` doesn't throw, when validating the `useWorkerFetch` parameter via the `isValidFetchUrl` helper function.
Please note that we are currently failing gracefully in this case, as intended, however the warning-messages printed in the console are perhaps less helpful without this patch.
2023-07-27 16:26:39 +02:00
Jonas Jenwald
d022912719 Remove most build-time require-calls from the src/display/-folder
By leveraging import maps we can get rid of *most* of the remaining `require`-calls in the `src/display/`-folder, since we should strive to use modern `import`-statements wherever possible.
The only remaining cases are Node.js-specific dependencies, since those seem very difficult to convert unless we start producing a bundle *specifically* for Node.js environments.
2023-07-17 19:47:13 +02:00
Jonas Jenwald
3a886e7264 Move the isNodeJS-helper into the src/shared/util.js file
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
2023-07-17 16:42:25 +02:00
Jonas Jenwald
25bac064d8 [api-minor] Stop "supporting" binary data provided as Buffer in Node.js environments (PR 16055 follow-up)
Given that the PDF.js library has never officially supported/documented that binary data can be provided as a `Buffer`, and that it's been explicitly deprecated in *four* releases, it seems reasonable that we outright reject such data instead (to reduce the amount of Node.js specific code-paths).
2023-07-01 10:34:18 +02:00
Jonas Jenwald
ffa9795ca9
Merge pull request #16620 from Snuffleupagus/AnnotationStorage-transfers
Move the `transfers` computation into the `AnnotationStorage` class
2023-06-30 14:28:55 +02:00
Jonas Jenwald
64aa28953d Fully remove the canvasFactory option from PDFPageProxy.render (PR 16242 follow-up)
We've now been throwing an Error in *three* releases if the `canvasFactory` option is provided, hence it ought to be fine to stop doing that and simply ignore the option instead.
2023-06-30 09:21:45 +02:00
Jonas Jenwald
39113baa33 Move the transfers computation into the AnnotationStorage class
Rather than having to *manually* determine the potential `transfers` at various spots in the API, we can let the `AnnotationStorage.serializable` getter include this.
To further simplify things, we can also let the `serializable` getter compute and include the `hash`-string as well.
2023-06-29 19:51:57 +02:00
calixteman
88c7c8b5bf
Merge pull request #16588 from calixteman/editor_stamp_2
[Editor] Add support for printing/saving newly added Stamp annotations
2023-06-28 22:42:54 +02:00
Jonas Jenwald
a024cd0127 Re-factor how HCM highlight-filters are handled in the viewer components (PR 16593 follow-up)
This is something that I completely overlooked during review of PR 16593, since the idea is (obviously) that the viewer-components should be usable as-is without the user needing to manually pass in any *additional* parameters.

To support this we can very easily expose the current `FilterFactory`-instance on the `PDFPageProxy`-class[1], and if needed initialize the highlight-filters when initializing the page (again limited to the viewer-components).
2023-06-26 23:37:39 +02:00
Calixte Denizet
599b9498f2 [Editor] Add support for printing/saving newly added Stamp annotations
In order to minimize the size the of a saved pdf, we generate only one
image and use a reference in each annotation using it.
When printing, it's slightly different since we have to render each page
independantly but we use the same image within a page.
2023-06-26 15:47:05 +02:00
Calixte Denizet
71479fdd21 [Editor] Avoid to have duplicated entries in the Annot array when saving an existing and modified annotation 2023-06-15 22:02:10 +02:00
Jonas Jenwald
a37f7d2477
Merge pull request #16543 from Snuffleupagus/limit-more-to-GENERIC
Limit more code to GENERIC builds
2023-06-15 13:58:52 +02:00
Jonas Jenwald
877884029d
Merge pull request #16551 from Snuffleupagus/page-destroyed-complete
Ensure that `cleanup` during rendering is actually ignored, to prevent a blank canvas
2023-06-15 12:26:57 +02:00
Jonas Jenwald
0650be4641
Merge pull request #16550 from Snuffleupagus/rm-RenderingCancelledException-type
[api-minor] Remove the `type` from `RenderingCancelledException` (PR 16226 follow-up)
2023-06-15 12:26:27 +02:00
Jonas Jenwald
a591c3de84 Ensure that cleanup during rendering is actually ignored, to prevent a blank canvas
The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is *not* allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting.

This is very old code, which has been slightly re-factored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seem highly dependent on "unlucky" timing.
Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls.

Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.
2023-06-15 11:39:26 +02:00
Jonas Jenwald
a8d4aad8b9 Limit PDFPageProxy.getOperatorList to development mode and GENERIC builds
Given that this API method isn't used anywhere within the PDF.js library itself, except for the unit-tests, we can avoid including what's effectively dead code in e.g. the *built-in* Firefox PDF Viewer.
2023-06-14 21:33:22 +02:00
Jonas Jenwald
225734dd00 [api-minor] Remove the type from RenderingCancelledException (PR 16226 follow-up)
After PR 16226 we're only using `RenderingCancelledException` together with canvas-rendering, hence the `type`-property is no longer necessary.
2023-06-14 15:40:25 +02:00
Jonas Jenwald
fee850737b Enable the unicorn/prefer-optional-catch-binding ESLint plugin rule
According to MDN this format is available in all browsers/environments that we currently support, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/try...catch#browser_compatibility

Please also see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-optional-catch-binding.md
2023-06-12 11:46:11 +02:00
Jonas Jenwald
5fad931a3f Enable more import ESLint plugin rules
This patch enables more `import` rules to help prevent bugs/inconsistencies, and most of these rules didn't require code changes; please find additional details here:
 - https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/export.md
 - https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/exports-last.md
 - https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/first.md
 - https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-empty-named-blocks.md
 - https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-mutable-exports.md
 - https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-self-import.md
2023-06-04 09:58:25 +02:00