Commit Graph

566 Commits

Author SHA1 Message Date
Jonas Jenwald
0c31320c12 [api-minor] Improve thumbnail handling in documents that contain interactive forms
To improve performance of the sidebar we use the page-canvases to generate the thumbnails whenever possible, since that avoids unnecessary re-rendering when the sidebar is open. This works generally well, however there's an old problem in PDF documents that contain interactive forms (when those are enabled): Note how the thumbnails become partially (or fully) blank, since those Annotations are not included in the OperatorList.[1]

We obviously want to keep using the `PDFThumbnailView.setImage`-method for most documents, however we need a way to skip it only for those pages that contain interactive forms.
As it turns out it's unfortunately not all that simple to tell, after the fact, from looking only at the OperatorList that some Annotations were skipped. While it might have been possible to try and infer that in the viewer, it'd not have been pretty considering that at the time when rendering finishes the annotationLayer has not yet been built.
The overall simplest solution that I could come up with, was instead to include a *summary* of the interactive form-state when doing the final "flushing" of the OperatorList and expose that information in the API.

---
[1] Some examples from our test-suite: `annotation-tx2.pdf` where the thumbnail is completely blank, and `bug1737260.pdf` where the thumbnail is missing the "buttons" found on the page.
2022-07-30 16:53:32 +02:00
Jonas Jenwald
1cc7cecc7b [api-minor] Introduce a PrintAnnotationStorage with *frozen* serializable data
Given that printing is triggered *synchronously* in browsers, it's thus possible for scripting (in PDF documents) to modify the Annotation-data while printing is currently ongoing.
To work-around that we add a new printing-specific `AnnotationStorage`, where the serializable data is *frozen* upon initialization, which the viewer can thus create/utilize during printing.
2022-06-23 17:06:46 +02:00
Jonas Jenwald
7e852851fd A small memory-usage improvement for PDF documents opened from TypedArray-data
This patch contains a small optimization specifically for the case when `getDocument` is called with TypedArray-data. In that case we'll still hold onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread.

In practice this will most likely not affect memory usage in any noticeable way, since the application calling `getDocument` will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API *itself* won't unnecessarily keep this data alive.
2022-05-29 16:37:18 +02:00
calixteman
cfac6fa511
Merge pull request #14874 from calixteman/colors
[api-minor] Improve pdf reading in high contrast mode
2022-05-05 21:48:19 +02:00
Calixte Denizet
c8afd6ce8c [api-minor] Improve pdf reading in high contrast mode
- Use Canvas & CanvasText color when they don't have their default value
  as background and foreground colors.
- The colors used to draw (stroke/fill) in a pdf are replaced by the bg/fg
  ones according to their luminance.
2022-05-05 16:34:51 +02:00
Tim van der Meij
899e4d58d6
Merge pull request #14870 from Snuffleupagus/isNodeJS-cleanup
Only bundle the `src/display/node_utils.js` file in GENERIC-builds
2022-05-04 22:38:21 +02:00
Jonas Jenwald
8267fd8a52 Replace the AnnotationStorage.lastModified-getter with a proper hash-method
The current `lastModified`-getter, which only contains a time-stamp, is a fairly crude way of detecting if the stored data has actually been changed. In particular, when the `getRawValue`-method is used, the `lastModified`-getter doesn't cope with data being modified from the "outside".

To fix these issues[1], and to prevent any future bugs in this code, this patch introduces a new `AnnotationStorage.hash`-getter which computes a hash of the currently stored data. To simplify things this re-uses the existing `MurmurHash3_64`-implementation, which required moving that file into the `src/shared/`-folder, since its performance should be good enough here.

---
[1] Given how the `AnnotationStorage.lastModified`-getter was used, this would have been limited to *printing* of forms.
2022-05-04 15:21:30 +02:00
Jonas Jenwald
d4fe4fd97b Simplify a couple of isNodeJS-dependent getDocument default values
Given that the `isNodeJS`-constant will, after PR 14858, *always* be `false` in non-GENERIC builds we can simplify a couple of `getDocument`-parameter default values slightly.
The old format, with inline `PDFJSDev`-checks, wasn't exactly a wonder of readability; which was my fault.
2022-05-03 11:36:10 +02:00
Jonas Jenwald
7df47c289f Only bundle the src/display/node_utils.js file in GENERIC-builds
This first of all simplifies the file, since we no longer need dummy-classes and can instead *directly* define the actual classes.
Furthermore, and more importantly, this means that we no longer need to bundle this code in e.g. MOZCENTRAL-builds which reduces the size of *built* `pdf.js` file slightly.
2022-05-03 11:34:35 +02:00
Jonas Jenwald
b996e107c3 Update core-js to allow removing a structuredClone work-around
Because of a bug in previous `core-js` versions, which caused an Error to be thrown if its `structuredClone` polyfill was called with an *explicit* `null`/`undefined` transfer-parameter, the `LoopbackPort`-class contained a work-around.
In the latest `core-js` version this has been fixed, and we can thus simplify our code ever so slightly; please see https://github.com/zloirock/core-js/releases/tag/v3.22.0
2022-04-15 22:12:02 +02:00
Calixte Denizet
040fcae5ab Improve performance with image masks (bug 857031)
- it aims to partially fix performance issue reported: https://bugzilla.mozilla.org/show_bug.cgi?id=857031;
- the idea is too avoid to use byte arrays but use ImageBitmap which are a way faster to draw:
  * an ImageBitmap is Transferable which means that it can be built in the worker instead of in the main thread:
    - this is achieved in using an OffscreenCanvas when it's available, there is a bug to enable them
      for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330;
    - or in using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap so
      it's slightly slower than using an OffscreenCanvas.
  * it's transfered from the worker to the main thread by "reference";
  * the byte buffers used to create the image data have a very short lifetime and ergo the memory used is globally
    less than before.
- Use the localImageCache for the mask;
- Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image;
- Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set
  as defined in operator_list.
2022-04-09 18:26:26 +02:00
Jonas Jenwald
849de5a508 Slightly improve validation of (some) parameters in getDocument
There's a couple of `getDocument` parameters that should be numbers, but which are currently not *fully* validated to prevent issues elsewhere in the code-base.
Also, improves validation of the `ownerDocument` parameter since we currently accept more-or-less anything here.
2022-03-21 13:32:17 +01:00
Jonas Jenwald
be2b1d5d2a [src/display/api.js] Simplify the sendTest function, used with Worker initialization (PR 14291 follow-up)
Given that we now only use Workers when `postMessage` transfers are supported, there's really no point in trying to send a "test" message *without* transfers present.
Hence, if `postMessage` transfers are not supported by the browser, we'll now fallback to "fake" Workers immediately instead. The comment about Opera is also removed, since it was originally added back in PR 983 and mentions Opera `11.60` [which was released in 2011](https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Version_11).
2022-03-16 13:25:41 +01:00
Jonas Jenwald
d5c9be341d [src/display/api.js] Use private static class fields, rather than shadowed getter work-arounds (PR 13813, 13882 follow-up)
At the time private static class fields were to new, however that's no longer an issue and we can thus (ever so slightly) simplify the code.
2022-03-16 13:02:34 +01:00
Tim van der Meij
790735eaf1
Merge pull request #14658 from Snuffleupagus/api-validate-cMapUrl-standardFontDataUrl
Validate the `cMapUrl`/`standardFontDataUrl` parameters in `getDocument`
2022-03-11 21:09:58 +01:00
Jonas Jenwald
a60b98412f Validate the cMapUrl/standardFontDataUrl parameters in getDocument
These changes make sense for two reasons:
 - Given that the parameters are potentially passed to the worker-thread, depending on the `useWorkerFetch` parameter, we need to prevent errors if the user provides values that aren't clonable.
 - By ensuring that the default values are indeed `null`, we'll trigger main-thread fetching (of CMaps and Standard fonts) as intended in the `PartialEvaluator` and thus potentially provide better Error messages.
2022-03-10 16:33:10 +01:00
Jonas Jenwald
537ed37835 Move the isSameOrigin helper function
This function is currently placed in the `src/shared/util.js` file, which means that the code is duplicated in both of the *built* `pdf.js` and `pdf.worker.js` files. Furthermore, it only has a single call-site which is also specific to the `GENERIC`-build of the PDF.js library.

Hence this helper function is instead moved into the `src/display/api.js` file, in such a way that it's conditionally defined but still can be unit-tested.
2022-03-10 13:51:09 +01:00
Jonas Jenwald
172d007598 [api-minor] Add validation for the PDFDocumentProxy.getPageIndex method
Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation.
That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately.

In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.
2022-02-24 12:01:51 +01:00
Jonas Jenwald
2be8036eb7 [api-minor] Reduce duplication in the "gets non-existent page" unit-test 2022-02-24 11:25:21 +01:00
Jonas Jenwald
bad15894fc Improve the JSDocs for the PDFObjects class
Given that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances this ought to help provide slightly better TypeScript definitions.
2022-02-20 13:02:14 +01:00
Jonas Jenwald
f4712bc0ad Simplify the data stored on PDFObjects-instances
The manually tracked `resolved`-property is no longer necessary, since the same information is now directly available on all `PromiseCapability`-instances.
Furthermore, since the `PDFObjects.resolve` method is not documented as accepting e.g. only Object-data, we probably shouldn't resolve the `PromiseCapability` with the `data` and instead only store it on the `PDFObjects`-instance.[1]

---
[1] While Objects are passed by reference in JavaScript, other primitives such as e.g. strings are passed by value and the current implementation *could* thus lead to increased memory usage. Given how we're using `PDFObjects` in the PDF.js code-base none of this should be an issue, but it still cannot hurt to change this.
2022-02-20 12:33:33 +01:00
Jonas Jenwald
beecde3229 Introduce (some) private properties/methods in the PDFObjects class
This ensures that the underlying data cannot be accessed directly, from the outside, since that's definately not intended here.
Note that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances hence these changes really cannot hurt.
2022-02-20 12:23:30 +01:00
Jonas Jenwald
1f0fb270b1 [api-minor] Ensure that the PDFDocumentLoadingTask-promise is rejected when cancelling the PasswordPrompt (bug 1754421)
This is essentially a *continuation* of PR 7926, where we added support for rejecting the current `PDFDocumentLoadingTask`-promise by throwing inside of the `onPassword`-callback.
Hence the naive way to address [bug 1754421](https://bugzilla.mozilla.org/show_bug.cgi?id=1754421) would be to simply throw in the `onPassword`-callback used in the default viewer. However it unfortunately turns out to not work, since the password input/validation is asynchronous, and we thus need another approach.

The simplest solution that I can come up with here, is thus to *extend* the `onPassword`-callback to also reject the current `PDFDocumentLoadingTask`-instance if an `Error` is explicitly passed as the input to the callback function. (This doesn't feel great, but I cannot see a better solution that isn't really complicated.)
2022-02-09 15:09:20 +01:00
Jonas Jenwald
403baa7bba [api-minor] Remove the normalizeWhitespace option in the PDFPageProxy.{getTextContent, streamTextContent} methods (issue 14519, PR 14428 follow-up)
With these changes, we'll now *always* replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.
2022-02-03 09:17:22 +01:00
Jonas Jenwald
7cc761a8c0 Polyfill structuredClone with core-js (PR 13948 follow-up)
This allows us to remove the manually implemented `structuredClone` polyfill, thus reducing the maintenance burden for the `LoopbackPort` class; refer to https://github.com/zloirock/core-js#structuredclone

*Please note:* While `structuredClone` support landed already in Firefox 94, Google Chrome only added it in version 98 (currently in Beta). However, given that the `LoopbackPort` will only be used together with *fake workers* in browsers this shouldn't be too much of a problem.[1]
For Node.js environments, where *fake workers* are unfortunately necessary, using a `legacy/`-build is already required which thus guarantees that the `structuredClone` polyfill is available.

Also, the patch updates core-js to the latest version since that one includes `structuredClone` improvements; please see https://github.com/zloirock/core-js/releases/tag/v3.20.3

---
[1] Given that we only support browsers with proper worker support, if *fake workers* are being used that essentially indicates a configuration problem/error.
2022-01-27 21:11:42 +01:00
Jonas Jenwald
e0dba504d2 Fix broken/missing JSDocs and typedefs, to allow updating TypeScript to the latest version (issue 14342)
This patch circumvents the issues seen when trying to update TypeScript to version `4.5`, by "simply" fixing the broken/missing JSDocs and `typedef`s such that `gulp typestest` now passes.
As always, given that I don't really know anything about TypeScript, I cannot tell if this is a "correct" and/or proper way of doing things; we'll need TypeScript users to help out with testing!

*Please note:* I'm sorry about the size of this patch, but given how intertwined all of this unfortunately is it just didn't seem easy to split this into smaller parts.
However, one good thing about this TypeScript update is that it helped uncover a number of pre-existing bugs in our JSDocs comments.
2021-12-15 23:14:25 +01:00
Jonas Jenwald
760f765e56 Move the /Lang handling into the BaseViewer (PR 14114 follow-up)
In PR 14114 this was only added to the default viewer, which means that in the viewer components the user would need to *manually* implement /Lang handling. This was (obviously) a bad choice, since the viewer components already support e.g. structTrees by default; sorry about overlooking this!

To avoid having to make *two* `getMetadata` API-calls[1] very early during initialization, in the default viewer, the API will now cache its result. This will also come in handy elsewhere in the default viewer, e.g. by reducing parsing when opening the "document properties" dialog.

---
[1] This not only includes a round-trip to the worker-thread, but also having to re-parse the /Metadata-entry when it exists.
2021-12-14 13:19:05 +01:00
Jonas Jenwald
f39536a30b Change WorkerTransport.pagePromises from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.

Finally, also changes the `pagePromises` to a *private* property since it's not supposed to be accessed from the "outside".
2021-12-09 15:30:10 +01:00
Jonas Jenwald
c5525dcb69 Change WorkerTransport.pageCache from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
For one thing, this simplifies iteration since we no longer have to worry about/check if `pageCache`-entries are undefined (which will happen for *sparse* `Array`s).

Of particular note is that we're no longer attempting to "null" the `pageCache`-entry from within the `PDFPageProxy._destroy`-method. Given that *synchronous* JavaScript will always run to completion[1] and that we're looping through all pages in `WorkerTransport.destroy` and immediately clear the cache afterwards, that code did/does not really make a lot of sense (as far as I can tell).

Finally, also changes the `pageCache` to a *private* property since it's not supposed to be accessed from the "outside".

---
[1] Unless there are errors, of course.
2021-12-09 15:29:47 +01:00
Jonas Jenwald
6da0944fc7 [api-minor] Replace PDFDocumentProxy.getStats with a synchronous PDFDocumentProxy.stats getter
*Please note:* These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents.

The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for *every rendered* page.
This patch proposes replacing that method with a *synchronous* `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and *only send* them to the main-thread *the first time* that a type is encountered.

Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1]
This re-factoring will obviously benefit longer document the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes.

Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return *the same identical* object.
This is something that we can easily take advantage of in the default viewer, by now *only* reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents).

---
[1] Furthermore, the maximium number of `StreamType`/`FontType` are `10` respectively `12`, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see 41ac3f0c07/src/shared/util.js (L206-L232)

[2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only result in a total of seven "DocStats" messages being sent from the worker-thread.

[3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549
2021-11-20 12:20:55 +01:00
Jonas Jenwald
6f22327e61 [api-minor] Only use Workers when postMessage transfers are supported (PR 11123 follow-up)
Given that all modern browsers now support `postMessage` transfers, and have for years, it no longer seems necessary for the PDF.js library to support using Workers unless the `postMessage` transfers functionality is available.
This patch is a follow-up to PR 11123, which made it impossible to *manually* disable `postMessage` transfers for performance reasons (since it increases memory usage), which hasn't caused any bug reports as far as I know.[1]

Hence we'll now only support *proper* Worker implementations, with fully working `postMessage` transfers, and fallback to using "fake" Workers otherwise.

---
[1] At the time of that PR we still "supported" IE, which is why this code was left intact.
2021-11-19 16:47:58 +01:00
Calixte Denizet
33ea817b20 [api-minor] Render pushbuttons on their own canvas (bug 1737260)
- First step to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1737260;
 - several interactive pdfs use the possibility to hide/show buttons to show different icons;
 - render pushbuttons on their own canvas and then insert it the annotation_layer;
 - update test/driver.js in order to convert canvases for pushbuttons into images.
2021-11-12 15:37:33 +01:00
Jonas Jenwald
8fc9c7e41c Use even more optional chaining in the src/display/api.js file
This patch (slightly) simplifies a couple of `onProgress` and `onUnsupportedFeature` call-sites.
Finally, while unrelated, also removes some unnecessary `return undefined;` statements (PR 11601 follow-up).
2021-10-12 12:05:59 +02:00
Jonas Jenwald
d49b1bf2ee Use the native structuredClone implementation when it's available
With a recent addition to the HTML specification, the internal structured clone algorithm used in browsers is (or will be, once it's implemented) *directly* accessible to JavaScript; please see https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/structuredClone

Hence we'll *eventually* not need to maintain our own structured clone functionality in the `LoopbackPort`-class in the API, however for the time being we'll feature detect `structuredClone` and fallback to the existing PDF.js implementation.

Given that https://bugzilla.mozilla.org/show_bug.cgi?id=1722576 has landed in Firefox 94, we should no longer need the manually implemented `cloneValue`-functionality in MOZCENTRAL builds. Note also that in the Firefox built-in PDF Viewer it's not possible for users to *easily* disable workers, which should further reduce the risk of these changes.
2021-10-03 10:55:33 +02:00
Jonas Jenwald
1dcd2f0cd3 [api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046)
In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases.
To avoid having to add *two new* properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`).

*Please note:* In order avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.
2021-09-25 09:18:58 +02:00
Jonas Jenwald
fd1f0f647f Print a special warning message, in the viewer, for XFA Foreground documents
Currently XFAF documents use the same warning message as in the XFA *disabled* case, which is neither helpful nor correct.
2021-09-23 15:02:24 +02:00
Jonas Jenwald
6cba5509f2 Re-factor document.getElementsByName lookups in the AnnotationLayer (issue 14003)
This replaces direct `document.getElementsByName` lookups with a helper method which:
 - Lets the AnnotationLayer use the data returned by the `PDFDocumentProxy.getFieldObjects` API-method, such that we can directly lookup only the necessary DOM elements.
 - Fallback to using `document.getElementsByName` as before, such that e.g. the standalone viewer components still work.

Finally, to fix the problems reported in issue 14003, regardless of the code-path we now also enforce that the DOM elements found were actually created by the AnnotationLayer code.
With these changes we'll thus be able to update form elements on all visible pages just as before, but we'll additionally update the AnnotationStorage for not-yet-rendered elements thus fixing a pre-existing bug.
2021-09-23 13:05:18 +02:00
Jonas Jenwald
d854352cd5 Improve the API unit-tests by checking that PDFPageProxy.render returns a RenderTask-instance
This is similar to existing unit-tests, which checks for `PDFDocumentProxy`- and `PDFPageProxy`-instances.
2021-09-13 13:34:37 +02:00
Jonas Jenwald
fa7a607d33 Improve the API unit-tests by checking that getDocument returns a PDFDocumentLoadingTask-instance
This is similar to existing unit-tests, which checks for `PDFDocumentProxy`- and `PDFPageProxy`-instances.
2021-09-13 13:34:28 +02:00
Jonas Jenwald
ce3f5ea2bf Use async a bit more in the API
This patch changes the `PDFDocumentLoadingTask.destroy`-method and the `_fetchDocument`-function to be `async`, which slightly simplifies the relevant code.

Furthermore, remove the catch-handler from the `WorkerTransport.getPageIndex`-method since it's no longer needed. Given that the `MessageHandler` is nowadays wrapping every possible Exception, it's no longer necessary to try and re-wrap the reason here.
2021-08-29 12:31:28 +02:00
Jonas Jenwald
2a0ad8e696 Add deprecation warnings for the renderInteractiveForms and includeAnnotationStorage options, in PDFPageProxy.render
*This is done separately from the previous patch, to make it easier to revert these changes once they've been included in a couple of releases.*

Please note that because these two options are mutually exclusive, which is a large part of the reason for the previous patch, it's not guaranteed that the fallback-values will always be correct in every situation (but it's the best that we can do).
2021-08-24 01:40:12 +02:00
Jonas Jenwald
41efa3c071 [api-minor] Introduce a new annotationMode-option, in PDFPageProxy.{render, getOperatorList}
*This is a follow-up to PRs 13867 and 13899.*

This patch is tagged `api-minor` for the following reasons:
 - It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour.

 - For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method.

 - It's now also possible to disable *all* annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282.

---
[1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of *all* annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.
2021-08-24 01:13:02 +02:00
Brendan Dahl
bf5a45ce6d
Merge pull request #13908 from brendandahl/xfa-find
[api-minor] XFA - Support text search in XFA documents.
2021-08-23 08:53:02 -07:00
Brendan Dahl
bb47128864 XFA - Support text search in XFA documents.
Moves the logic out of TextLayerBuilder to handle
highlighting matches into a new separate class `TextHighlighter`
that can be used with regular PDFs and XFA PDFs.

To mimic the current find functionality in XFA, two arrays
from the XFA rendering are created to get the text content
and map those to DOM nodes.

Fixes #13878
2021-08-23 08:44:20 -07:00
Jonas Jenwald
a7f0301f21 [Regression] Re-factor the *internal* includeAnnotationStorage handling, since it's currently subtly wrong
*This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867.*
As far as I can tell, this *subtle* bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago).

The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus *wrongly* return a cached operatorList.

In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API.
However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in *subtle* (and difficult to understand) rendering bugs.

In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead.
Given that operatorList caching in the API doesn't have to be perfect[1], but only have to avoid *false* cache-hits, we can simplify things significantly be only keeping track of the last time that the `AnnotationStorage`-data was modified.

*Please note:* While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2]
Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3]

---
[1] Note how we're already not caching operatorLists for pages with *huge* images, in order to save memory, hence there's no guarantee that operatorLists will always be cached.

[2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation.

[3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.
2021-08-18 10:09:03 +02:00
Jonas Jenwald
1465b1670f [src/display/api.js] Move the getRenderingIntent helper function into WorkerTransport
By doing this re-factoring *separately*, since it's mostly a mechanical change, the size/scope of the next patch will be reduced somewhat.
2021-08-18 09:58:26 +02:00
Jonas Jenwald
6167566f1b Re-factor the BaseException.name handling, and clean-up some code
Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead.

At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been *multiple* issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended.

While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.
2021-08-10 11:27:47 +02:00
Jonas Jenwald
7f2d524df5 Improve caching of Annotations-data, by using a Map, in the API
Rather than caching only the *last* `PDFPageProxy.getAnnotations` call, and having to handle the intent separately, we can instead implement the caching in exactly the same way as done in the `PDFPageProxy.{render, getOperatorList}` methods.
2021-08-08 08:14:51 +02:00
Tim van der Meij
036b81496e
Merge pull request #13882 from Snuffleupagus/PDFWorker-rm-closure
[api-minor] Remove the closure from the `PDFWorker` class, in the `src/display/api.js` file
2021-08-07 19:52:39 +02:00
Jonas Jenwald
1cf9405281 [api-minor] Remove the closure from the PDFWorker class, in the src/display/api.js file
This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the *build* `pdf.js` file.

Given that `PDFWorker` is exposed through the *public* API, this complicates things somewhat since there's a couple of worker-related properties that really should stay *private*. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure, however I've managed to come up with what's hopefully deemed an acceptable work-around here.
Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method).

Finally, as part of this re-factoring a number of missing JSDoc-comments were added which *together* with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class.

*Please note:* This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.
2021-08-07 10:43:39 +02:00