Commit Graph

1163 Commits

Author SHA1 Message Date
Jonas Jenwald
9f02cc36d4 Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.

Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.

For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
 - With the `master`-branch it takes >600 ms to render.
 - With this patch that goes down to ~50 ms, which is one order of magnitude faster.

(Note that all other pages are, as expected, completely unaffected by these changes.)

This new main-thread copying is limited to "large" global images, since:
 - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
 - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
 - This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
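
The copying idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual PDF.js implementation: `copyToGlobalCache`, `pageCaches`, `globalCache`, and the `LARGE_IMAGE_BYTES` threshold are hypothetical names and values.

```javascript
// Hypothetical size threshold (in bytes) above which an image counts as "large".
const LARGE_IMAGE_BYTES = 1_000_000;

/**
 * Try to promote an image from a page-level cache to the global cache.
 * Returns true when the global cache holds the image afterwards, and false
 * when the caller has to fall back to re-parsing on the worker-thread.
 */
function copyToGlobalCache(objId, pageCaches, globalCache) {
  if (globalCache.has(objId)) {
    return true; // Already promoted earlier.
  }
  for (const pageCache of pageCaches) {
    const image = pageCache.get(objId);
    if (!image) {
      continue; // Not cached on this page (or cleanup already ran).
    }
    if (image.byteLength < LARGE_IMAGE_BYTES) {
      return false; // Small image: re-parsing is cheap enough.
    }
    globalCache.set(objId, image); // Share the same object, no re-parse.
    return true;
  }
  return false; // Not available in any page-level cache.
}
```

The `false` return value models the case where a page-level copy is not guaranteed to exist, so a caller always needs the re-parsing path as a fallback.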
2023-12-21 21:26:21 +01:00
Jonas Jenwald
988d3a188f
Merge pull request #17395 from Snuffleupagus/pypdf-2332
Support Annotations with corrupt /BS-entries
2023-12-09 14:18:29 +01:00
Jonas Jenwald
a1d859c082 Disable the "should compress and save text" unit-test in Node.js (PR 17202 follow-up)
It seems this unit-test now fails consistently in "all" up-to-date Node.js versions. We should probably try and understand why, but for now just disable it to get passing CI tests.
2023-12-09 14:13:11 +01:00
Tim van der Meij
c908f2d55c
Merge pull request #17372 from Snuffleupagus/fuzzing-VerbosityLevel-ERRORS
Limit the amount of console "spam" during fuzz tests (PR 17337 follow-up)
2023-12-09 13:57:23 +01:00
calixteman
8702e1bbb2
Merge pull request #17359 from calixteman/editor_highlight_color_picker
[Editor] Add a color picker with predefined colors for highlighting text (bug 1866434)
2023-12-06 11:06:55 +01:00
Calixte Denizet
098cc16c46 Set text field value as a string when it's for a date or a time (bug 1868503) 2023-12-06 09:44:30 +01:00
Calixte Denizet
ff23d37fa2 [Editor] Add a color picker with predefined colors for highlighting text (bug 1866434)
The doorhanger for highlighting has a basic color picker composed of 5 predefined colors
to set the default color to use.
These colors can be changed thanks to a preference for now but it's something which could
be changed in the Firefox settings in the future.
Each highlight has in its own toolbar a color picker to just change its color.
The different color pickers are so similar (modulo a few differences in their styles) that
this patch introduces a new class ColorPicker which provides a color picker component
which could be reused in future editors.
All in all, a large part of this patch is dedicated to the color picker itself and its style
and the rest is almost a matter of wiring the component.
2023-12-05 23:27:22 +01:00
Jonas Jenwald
d7bec1b527 Limit the amount of console "spam" during fuzz tests (PR 17337 follow-up)
Having just tested PR 17337 locally I noticed that especially the `JpxImage`-test causes a "ridiculous" amount of warning messages to be printed, which doesn't seem helpful.
Given that only actual `Error`s should be relevant here, we can easily disable this logging during the tests.
2023-12-04 16:39:45 +01:00
Jonas Jenwald
fe3bc575de Disable the "should compress and save text" unit-test in additional Node.js versions (PR 17202 follow-up)
It seems this unit-test started failing in Node.js version 20.10 as well. We should probably try and understand why, but for now just disable it to get passing CI tests.
2023-11-30 20:47:15 +01:00
Calixte Denizet
eb5f610d18 Remove language codes from text strings.
And take care to have an even number of bytes with utf16 strings.
2023-11-25 15:09:31 +01:00
Calixte Denizet
f8f4432961 [Editor] Add support for saving/printing a newly added Highlight annotation (bug 1865708) 2023-11-22 10:41:55 +01:00
Calixte Denizet
31d9b9f574 [Editor] Add a way to extract the outlines of a union of rectangles
The goal is to be able to get these outlines to fill the shape corresponding
to a text selection in order to highlight some text contents.
The outlines will be used to show selected/hovered highlights.
2023-11-20 18:45:19 +01:00
Jonas Jenwald
709d89420e Re-factor how the GenericL10n class fetches localization-data
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
   This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]

 - Expose the `fetchData` helper function in the API, such that the viewer is able to access it.

 - Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.

---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
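
A helper that dispatches on the three data types mentioned above might look roughly like this. It is a sketch only: it assumes the Fetch API is available and omits any fallback for environments without it, and `readResponse` is a hypothetical name introduced for illustration.

```javascript
// Convert a Response into the requested data type.
async function readResponse(response, type) {
  if (!response.ok) {
    throw new Error(response.statusText);
  }
  switch (type) {
    case "arraybuffer":
      return response.arrayBuffer();
    case "json":
      return response.json();
    case "text":
      return response.text();
  }
  throw new Error(`Unsupported type: ${type}`);
}

// Sketch of a helper that fetches `url` and returns its contents as `type`.
async function fetchData(url, type = "text") {
  return readResponse(await fetch(url), type);
}
```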
2023-11-14 13:45:14 +01:00
Tim van der Meij
71a6c749d0
Merge pull request #17202 from Snuffleupagus/node-ci-latest
Also test the latest Node.js version in GitHub Actions
2023-11-04 12:45:03 +01:00
Jonas Jenwald
99522c3201 Also test the latest Node.js version in GitHub Actions
Hopefully this will allow us to catch bugs in new Node.js versions earlier, rather than having to wait for bug reports.

Given that `CompressionStream` is (currently) only potentially used when saving a *modified* PDF document, which is unlikely to be a common use-case in Node.js environments, let's just disable the affected unit-test for now.
2023-11-02 16:58:03 +01:00
Jonas Jenwald
155a302e74 Use even more optional chaining in the code-base 2023-11-02 16:47:33 +01:00
Calixte Denizet
133ed96f8f Don't take into account the INVISIBLE flag for well-known annotations 2023-10-25 10:16:14 +02:00
Jonas Jenwald
e2af77fd6c Add a unit-test to ensure that NullL10n won't diverge from the L10n-class
To prevent the *standalone* viewer-components from breaking, we need to ensure that the `NullL10n`-interface won't accidentally diverge from the actual `L10n`-implementations.
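
One way such a divergence check could be written is sketched below. The `missingMethods` helper and the stand-in `L10n` class and `stub` object are hypothetical; the actual unit-test may be structured differently.

```javascript
// Return the names of methods defined on `Reference.prototype` that
// `candidate` does not provide as functions.
function missingMethods(Reference, candidate) {
  return Object.getOwnPropertyNames(Reference.prototype).filter(
    name => name !== "constructor" && typeof candidate[name] !== "function"
  );
}

// Hypothetical stand-ins for an L10n-class and a stub object.
class L10n {
  getLanguage() {}
  get(id) {}
  translate(element) {}
}
const stub = { getLanguage() {}, get(id) {} };

// The stub lacks `translate`, so the check reports it.
const missing = missingMethods(L10n, stub); // ["translate"]
```

A unit-test would then simply assert that the returned list is empty.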
2023-10-24 13:13:14 +02:00
Jonas Jenwald
f07675a6a8 [api-minor] Re-factor NullL10n and remove the hard-coded l10n strings (PR 17115 follow-up)
*Please note:* These changes only affect the GENERIC build, since `NullL10n` is only a stub elsewhere (see PR 17135).

After the changes in PR 17115, which modernized and improved l10n-handling, the `NullL10n`-implementation is no longer a good fallback for the "proper" `L10n`-classes.
To improve this situation, especially for the *standalone* viewer-components, this patch makes the following changes:
 - Let the `NullL10n`-implementation extend an actual `L10n`-class, which is constant and lazily initialized, to ensure that it works *exactly* like the "proper" ones.

 - Automatically bundle the "en-US" l10n-strings in the build, via the pre-processor, such that we don't need to remember to manually update them.

 - Ensure that the *standalone* viewer-components register their DOM-elements for translation, similar to the default viewer, since this will allow future code improvements by using "data-l10n-id"/"data-l10n-args" in most (if not all) parts of the viewer.

 - Remove the `NullL10n` from the `AnnotationLayer`, to avoid affecting bundle size too much.
   For third-party users that access the `AnnotationLayer`, as exposed in the main PDF.js library, they'll now need to *manually* register it for translation. (However, the *standalone* viewer-components still work given the point above.)
2023-10-20 21:49:33 +02:00
Jonas Jenwald
69ad0d9861 Only bundle NullL10n in GENERIC builds (bug 1859818) 2023-10-19 13:51:00 +02:00
calixteman
5d8be99782
Merge pull request #17115 from calixteman/mv_to_fluent
[api-minor] Move to Fluent for the localization (bug 1858715)
2023-10-19 13:40:50 +02:00
Calixte Denizet
66982a2a11 [api-minor] Move to Fluent for the localization (bug 1858715)
- For the generic viewer we use @fluent/dom and @fluent/bundle
- For the builtin pdf viewer in Firefox, we set a localization url
  and then we rely on document.l10n which is a DOMLocalization object.
2023-10-19 11:20:41 +02:00
Jonas Jenwald
674052d3fc Re-factor the blob-URL caching in DownloadManager.openOrDownloadData
Cache blob-URLs on the actual data, rather than DOM elements, to reduce potential duplicates (note the updated unit-test).
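
The caching approach can be illustrated with a small sketch. It assumes the data is an object such as a `Uint8Array`; `blobUrlCache` and `getBlobUrl` are hypothetical names rather than the actual `DownloadManager` internals.

```javascript
// Blob-URLs keyed on the data itself; with a WeakMap the cache entry can be
// dropped once the data is no longer reachable.
const blobUrlCache = new WeakMap();

function getBlobUrl(data, contentType = "application/octet-stream") {
  let blobUrl = blobUrlCache.get(data);
  if (!blobUrl) {
    blobUrl = URL.createObjectURL(new Blob([data], { type: contentType }));
    blobUrlCache.set(data, blobUrl);
  }
  return blobUrl;
}
```

Because the key is the data rather than a DOM element, two elements that refer to the same data end up sharing one blob-URL.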
2023-10-17 10:18:34 +02:00
Jonas Jenwald
d5acbbccd3 Update the ESLint globals list (PR 17055 follow-up)
Given that we only use standard `import`/`export` statements now, after recent PRs, the "exports" global is unused.
Instead we add "__non_webpack_import__" to the `globals` to avoid having to sprinkle disable statements throughout the code.

Finally, the way that `globals` are defined has changed in ESLint and we should thus explicitly specify them as "readonly"; please find additional details at https://eslint.org/docs/latest/use/configure/language-options#specifying-globals
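
For illustration, the relevant fragment of an eslintrc-style configuration might look like this. It is written as a JavaScript object, and every other setting the project uses is omitted.

```javascript
// Fragment of an ESLint (eslintrc-style) configuration object.
const eslintConfig = {
  globals: {
    // Declare the global as known to ESLint, but not writable.
    __non_webpack_import__: "readonly",
  },
};
```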
2023-10-15 11:38:10 +02:00
Jonas Jenwald
5e986cb514 Use native import maps in development mode
This patch seems to work fine locally now, and `mozregression` points to it being fixed in bug https://bugzilla.mozilla.org/show_bug.cgi?id=1803984 which landed in Firefox 116.

By using the native `import maps` functionality, we can remove a development dependency. Please find the specification at https://wicg.github.io/import-maps/
2023-10-13 20:35:34 +02:00
Jonas Jenwald
38245500fd Output JavaScript modules for the LIB build-target (PR 17055 follow-up)
This *finally* allows us to mark the entire PDF.js library as a "module", which should thus conclude the (multi-year) effort to re-factor and improve how we import files/resources in the code-base.

This also means that the `gulp ci-test` target, which is what's run in GitHub Actions, now uses JavaScript modules since that's supported in modern Node.js versions.
2023-10-13 18:54:33 +02:00
Jonas Jenwald
927e50f5d4 [api-major] Output JavaScript modules in the builds (issue 10317)
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]

In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.

Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.

One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]

---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility

[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.

[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.

[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
2023-10-07 09:31:08 +02:00
Jonas Jenwald
bf9c33e60f Add support for "GoToE" actions with destinations (issue 17056)
This shouldn't be very common in practice, since "GoToE" actions themselves seem quite uncommon; see PR 15537.
2023-10-04 11:14:23 +02:00
Jonas Jenwald
3ced0dec1b [api-major] Remove the SVG back-end (PR 15173 follow-up)
This has been deprecated since version `2.15.349`, which is a year ago.
Removing this will also simplify some upcoming changes, specifically outputting of JavaScript modules in the builds.
2023-10-01 23:14:29 +02:00
Jonas Jenwald
9624505f0f Use a standard export statement in the web/pdfjs.js file
This removes the only remaining old and non-standard handling of exports in the `web/`-folder, since some initial attempts at outputting JavaScript modules in the builds have identified this file as a potential problem.
While this uses a hard-coded list, for overall simplicity, I don't believe that that's a big problem since:
 - Generating this file automatically would require a bunch more parsing *every single time* that the library is built.
 - The official API-surface doesn't change often enough for this to really impede development in any significant way.
 - The added unit-test helps ensure that this list cannot accidentally become outdated.
2023-09-30 12:10:02 +02:00
Calixte Denizet
f2196f7803 StructParents entry isn't required on pages with no tagged contents (bug 1855641) 2023-09-28 14:23:10 +02:00
Calixte Denizet
3ee5268a23 [Editor] Don't try to add data to the struct tree when there is no accessibilityData (bug 1855157) 2023-09-26 11:02:14 +02:00
Jonas Jenwald
1df31c0284 Use one noContextMenu function in both the src/- and web/-folders
Currently we duplicate this event handler function in multiple places, which seems unnecessary.
2023-09-23 15:37:13 +02:00
Calixte Denizet
6545551e76 [Editor] Avoid darkening the current editor when opening the alt-text dialog 2023-09-21 20:44:53 +02:00
Jonas Jenwald
e2b7896826 [GeckoView] Avoid bundling the AltTextManager class, since it's unused 2023-09-21 12:51:34 +02:00
Calixte Denizet
a8573d4e1b [Editor] Add the ability to create/update the structure tree when saving a pdf containing newly added annotations (bug 1845087)
When there is no tree, the tags for the new annotations are just put under the root element.
When there is a tree, we insert the new tags at the right place using the value
of structTreeParentId (added in PR #16916).
2023-09-16 18:34:58 +02:00
Tim van der Meij
66507ccae8
Enable unit test "creates pdf doc from non-existent URL"
The unit test is re-enabled because it no longer seems to fail after 10
runs on Linux where this used to fail often. Code inspection also shows
that the code is correct and should not raise the previous exception
(anymore). Finally, a lot has changed since this test was disabled such
as new Jasmine versions, new Linux bot OS version and new browser
versions.
2023-09-10 15:47:04 +02:00
Jonas Jenwald
df9cce39c0 Slightly reduce asynchronicity when parsing Annotations
Over time the amount of "document level" data potentially needed during parsing of Annotations has increased a fair bit, which means that we currently need to ensure that a bunch of data is available for each individual Annotation.
Given that this data is "constant" for a PDF document we can instead create (and cache) it lazily, only when needed, *before* starting to parse the Annotations on a page. This way the parsing of individual Annotations should become slightly less asynchronous, which really cannot hurt.

An additional benefit of these changes is that we can reduce the number of parameters that need to be explicitly passed around in the annotation-code, which helps overall readability in my opinion.

One potential drawback of these changes is that the `AnnotationFactory.create` method no longer handles "everything" on its own, however given how few call-sites there are I don't think that's too much of a problem.
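
The lazy create-and-cache pattern described above could be sketched as follows. `LazyDocumentData`, `parseAnnotations`, and the `parseOne` callback are hypothetical names, and the loader is assumed to return a promise; this is not the actual annotation-code.

```javascript
// Hypothetical helper: computes document-level data at most once.
class LazyDocumentData {
  #promise = null;
  #load;

  constructor(load) {
    this.#load = load; // Assumed to return a promise.
  }

  // The first call starts loading; later calls reuse the same promise.
  get() {
    return (this.#promise ||= this.#load());
  }
}

// Fetch the shared data once, then parse every annotation with it.
async function parseAnnotations(annotations, documentData, parseOne) {
  const globals = await documentData.get();
  return annotations.map(annotation => parseOne(annotation, globals));
}
```

With the shared data resolved up front, the per-annotation step no longer has to wait on it individually.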
2023-09-08 13:27:27 +02:00
Calixte Denizet
a8a50c567a Construct the correct field name and strip out classes when searching
The classes were stripped out when creating the field name but
it led to a wrong name.
Since class components in a path are irrelevant, they're just ignored
when searching for a node in the datasets.
2023-09-07 15:56:47 +02:00
Calixte Denizet
ee3ac35e05 Revert fix for bug 1838855 (bug 1849876)
The issue described in the mentioned bug is really because
Acrobat is rendering the XFA instead of the Acroform.
The original patch just tried to work around the issue but it
induces some regressions.
2023-08-23 12:34:41 -04:00
Tim van der Meij
5828ac0ee3
Merge pull request #16834 from Snuffleupagus/globalWorkerPort-parallel-test
Add a unit-test for the "correct" way of using the global `workerPort` in parallel (PR 16830 follow-up)
2023-08-19 13:38:16 +02:00
Jonas Jenwald
4d19db0b19 Re-format the code to account for prettier and globals updates
The `prettier` update slightly changed the formatting of some await-expressions; please see https://github.com/prettier/prettier/blob/main/CHANGELOG.md#302

The `globals` update removed the need for some eslint-disable statements; please see https://github.com/sindresorhus/globals/releases/tag/v13.21.0
2023-08-19 09:30:34 +02:00
Jonas Jenwald
29b2050ac2 Improve the "write a new annotation, save the pdf and check that the text content is correct" unit-test (PR 16559 follow-up)
Currently this unit-test will pass just fine if compression is disabled, e.g. by commenting out the relevant code in the `src/core/writer.js` file.
While we don't have a simple way of *directly* checking that the Annotation text-content is compressed, we can however use the resulting file-size as a fairly good proxy. (Note that if compression is disabled the file-size is more than doubled.)
2023-08-15 15:12:17 +02:00
Jonas Jenwald
2422492ee3 Add a unit-test for the "correct" way of using the global workerPort in parallel (PR 16830 follow-up)
Please note that for performance reasons it's not really advised to use the same worker-thread *in parallel* for parsing multiple PDF documents, since they will then unnecessarily compete for resources.
However, given that it's still possible to do that e.g. when using the global `workerPort` it probably won't hurt to add a unit-test for this particular situation.
2023-08-15 12:45:54 +02:00
Jonas Jenwald
66437917db Avoid using the global workerPort when destruction has started, but not yet finished (issue 16777)
Given that the `PDFDocumentLoadingTask.destroy()`-method is documented as being asynchronous, you thus need to await its completion before attempting to load a new PDF document when using the global `workerPort`.
If you don't await destruction as intended then a new `getDocument`-call can remain pending indefinitely, without any kind of indication of the problem, as shown in the issue.

In order to improve the current situation, without unnecessarily complicating the API-implementation, we'll now throw during the `getDocument`-call if the global `workerPort` is in the process of being destroyed.
This part of the code-base has apparently never been covered by any tests, hence the patch adds unit-tests for both the *correct* usage (awaiting destruction) as well as the specific case outlined in the issue.
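
The behaviour can be modelled with a toy example. The `SharedPortLoader` class below is entirely hypothetical and is not the PDF.js API; it only illustrates throwing while an asynchronous destruction is still in progress.

```javascript
// Toy model of a loader that shares one worker port; not the PDF.js API.
class SharedPortLoader {
  #destroying = false;

  load(name) {
    if (this.#destroying) {
      // Fail loudly instead of leaving the caller with a pending promise.
      throw new Error("Cannot load: the shared port is being destroyed.");
    }
    return { name, destroy: () => this.#destroy() };
  }

  async #destroy() {
    this.#destroying = true;
    await Promise.resolve(); // Stand-in for asynchronous tear-down.
    this.#destroying = false;
  }
}
```

In this model the correct usage is to `await task.destroy()` before calling `load` again; calling `load` without awaiting throws immediately.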
2023-08-12 21:21:50 +02:00
Jonas Jenwald
389a26c115 Fallback to check all pages when getting the pageIndex of FieldObjects
Given that the FieldObjects are parsed in parallel, in combination with the existing caching in the `getPage`-method and `annotations`-getter, adding additional caches for this fallback code-path doesn't seem entirely necessary.
2023-08-10 17:10:04 +02:00
Jonas Jenwald
64e8557fb5 [api-minor] Deprecate the PDFDocumentProxy.getJavaScript method
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.

Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
2023-08-01 09:02:05 +02:00
Jonas Jenwald
d022912719 Remove most build-time require-calls from the src/display/-folder
By leveraging import maps we can get rid of *most* of the remaining `require`-calls in the `src/display/`-folder, since we should strive to use modern `import`-statements wherever possible.
The only remaining cases are Node.js-specific dependencies, since those seem very difficult to convert unless we start producing a bundle *specifically* for Node.js environments.
2023-07-17 19:47:13 +02:00
Jonas Jenwald
3a886e7264 Move the isNodeJS-helper into the src/shared/util.js file
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
2023-07-17 16:42:25 +02:00
Jonas Jenwald
86a868189c Re-factor the PDFScriptingManager-class for the viewer-components
Currently this class contains a few "special" code-paths for the COMPONENTS build-target, which normally wouldn't be a problem. However, in this particular case that means accessing code that we don't want to include unconditionally in all builds.
This is currently implemented using build-time `require`-calls which we nowadays want to avoid, and we should strive to remove all such cases from the code-base. (Generally speaking `import` is the future, and build-tools may not always play well with a mix of both formats.)

We can easily improve things here by using sub-classing for the COMPONENTS build-target, and then use the ability to re-name when exporting (to avoid breaking existing code).
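
The sub-classing approach might be sketched as follows. Only the `PDFScriptingManager` name comes from the text above; the constructor shape, `createSandbox`, `defaultCreateSandbox`, and `PDFScriptingManagerComponents` are hypothetical and chosen purely for illustration.

```javascript
// Base class: expects the caller to supply everything it needs.
class PDFScriptingManager {
  constructor({ createSandbox }) {
    if (typeof createSandbox !== "function") {
      throw new Error("createSandbox is required.");
    }
    this.createSandbox = createSandbox;
  }
}

// Hypothetical default, standing in for code that only one build bundles.
const defaultCreateSandbox = () => ({ kind: "default-sandbox" });

// Subclass for the COMPONENTS build-target: fills in the default.
class PDFScriptingManagerComponents extends PDFScriptingManager {
  constructor(options = {}) {
    super({ createSandbox: defaultCreateSandbox, ...options });
  }
}

// In a module, the subclass would then be exported under the original name:
//   export { PDFScriptingManagerComponents as PDFScriptingManager };
```

Re-naming on export means existing code that imports `PDFScriptingManager` keeps working, while the extra default stays out of builds that only use the base class.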
2023-07-16 08:51:46 +02:00