Sakurai/pdf.js - pdf.js - Gitea on kemo

Author	SHA1	Message	Date
Jonas Jenwald	0c31320c12	[api-minor] Improve `thumbnail` handling in documents that contain interactive forms To improve performance of the sidebar we use the page-canvases to generate the thumbnails whenever possible, since that avoids unnecessary re-rendering when the sidebar is open. This generally works well; however, there's an old problem in PDF documents that contain interactive forms (when those are enabled): Note how the thumbnails become partially (or fully) blank, since those Annotations are not included in the OperatorList.[1] We obviously want to keep using the `PDFThumbnailView.setImage`-method for most documents, however we need a way to skip it only for those pages that contain interactive forms. As it turns out it's unfortunately not all that simple to tell, after the fact, from looking only at the OperatorList that some Annotations were skipped. While it might have been possible to try and infer that in the viewer, it'd not have been pretty considering that at the time when rendering finishes the annotationLayer has not yet been built. The overall simplest solution that I could come up with was instead to include a summary of the interactive form-state when doing the final "flushing" of the OperatorList and expose that information in the API. --- [1] Some examples from our test-suite: `annotation-tx2.pdf` where the thumbnail is completely blank, and `bug1737260.pdf` where the thumbnail is missing the "buttons" found on the page.	2022-07-30 16:53:32 +02:00
Jonas Jenwald	1cc7cecc7b	[api-minor] Introduce a `PrintAnnotationStorage` with frozen serializable data Given that printing is triggered synchronously in browsers, it's possible for scripting (in PDF documents) to modify the Annotation-data while printing is currently ongoing. To work around that we add a new printing-specific `AnnotationStorage`, where the serializable data is frozen upon initialization, which the viewer can thus create/utilize during printing.	2022-06-23 17:06:46 +02:00
Jonas Jenwald	7e852851fd	A small memory-usage improvement for PDF documents opened from TypedArray-data This patch contains a small optimization specifically for the case when `getDocument` is called with TypedArray-data. In that case we'll still hold onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread. In practice this will most likely not affect memory usage in any noticeable way, since the application calling `getDocument` will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API itself won't unnecessarily keep this data alive.	2022-05-29 16:37:18 +02:00
calixteman	cfac6fa511	Merge pull request #14874 from calixteman/colors [api-minor] Improve pdf reading in high contrast mode	2022-05-05 21:48:19 +02:00
Calixte Denizet	c8afd6ce8c	[api-minor] Improve pdf reading in high contrast mode - Use Canvas & CanvasText color when they don't have their default value as background and foreground colors. - The colors used to draw (stroke/fill) in a pdf are replaced by the bg/fg ones according to their luminance.	2022-05-05 16:34:51 +02:00
Tim van der Meij	899e4d58d6	Merge pull request #14870 from Snuffleupagus/isNodeJS-cleanup Only bundle the `src/display/node_utils.js` file in GENERIC-builds	2022-05-04 22:38:21 +02:00
Jonas Jenwald	8267fd8a52	Replace the `AnnotationStorage.lastModified`-getter with a proper hash-method The current `lastModified`-getter, which only contains a time-stamp, is a fairly crude way of detecting if the stored data has actually been changed. In particular, when the `getRawValue`-method is used, the `lastModified`-getter doesn't cope with data being modified from the "outside". To fix these issues[1], and to prevent any future bugs in this code, this patch introduces a new `AnnotationStorage.hash`-getter which computes a hash of the currently stored data. To simplify things, and since its performance should be good enough here, this re-uses the existing `MurmurHash3_64`-implementation, which required moving that file into the `src/shared/`-folder. --- [1] Given how the `AnnotationStorage.lastModified`-getter was used, this would have been limited to printing of forms.	2022-05-04 15:21:30 +02:00
Jonas Jenwald	d4fe4fd97b	Simplify a couple of `isNodeJS`-dependent `getDocument` default values Given that the `isNodeJS`-constant will, after PR 14858, always be `false` in non-GENERIC builds we can simplify a couple of `getDocument`-parameter default values slightly. The old format, with inline `PDFJSDev`-checks, wasn't exactly a wonder of readability; which was my fault.	2022-05-03 11:36:10 +02:00
Jonas Jenwald	7df47c289f	Only bundle the `src/display/node_utils.js` file in GENERIC-builds This first of all simplifies the file, since we no longer need dummy-classes and can instead directly define the actual classes. Furthermore, and more importantly, this means that we no longer need to bundle this code in e.g. MOZCENTRAL-builds, which reduces the size of the built `pdf.js` file slightly.	2022-05-03 11:34:35 +02:00
Jonas Jenwald	b996e107c3	Update `core-js` to allow removing a `structuredClone` work-around Because of a bug in previous `core-js` versions, which caused an Error to be thrown if its `structuredClone` polyfill was called with an explicit `null`/`undefined` transfer-parameter, the `LoopbackPort`-class contained a work-around. In the latest `core-js` version this has been fixed, and we can thus simplify our code ever so slightly; please see https://github.com/zloirock/core-js/releases/tag/v3.22.0	2022-04-15 22:12:02 +02:00
Calixte Denizet	040fcae5ab	Improve performance with image masks (bug 857031) - it aims to partially fix the performance issue reported in https://bugzilla.mozilla.org/show_bug.cgi?id=857031; - the idea is to avoid using byte arrays and instead use ImageBitmaps, which are much faster to draw: * an ImageBitmap is Transferable, which means that it can be built in the worker instead of in the main thread: - this is achieved by using an OffscreenCanvas when it's available (there is a bug to enable them for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330); - or by using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap, so it's slightly slower than using an OffscreenCanvas. * it's transferred from the worker to the main thread by "reference"; * the byte buffers used to create the image data have a very short lifetime, and ergo the memory used is globally less than before. - Use the localImageCache for the mask; - Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image; - Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set as defined in operator_list.	2022-04-09 18:26:26 +02:00
Jonas Jenwald	849de5a508	Slightly improve validation of (some) parameters in `getDocument` There's a couple of `getDocument` parameters that should be numbers, but which are currently not fully validated to prevent issues elsewhere in the code-base. Also, improves validation of the `ownerDocument` parameter since we currently accept more-or-less anything here.	2022-03-21 13:32:17 +01:00
Jonas Jenwald	be2b1d5d2a	[src/display/api.js] Simplify the `sendTest` function, used with Worker initialization (PR 14291 follow-up) Given that we now only use Workers when `postMessage` transfers are supported, there's really no point in trying to send a "test" message without transfers present. Hence, if `postMessage` transfers are not supported by the browser, we'll now fall back to "fake" Workers immediately instead. The comment about Opera is also removed, since it was originally added back in PR 983 and mentions Opera `11.60` [which was released in 2011](https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Version_11).	2022-03-16 13:25:41 +01:00
Jonas Jenwald	d5c9be341d	[src/display/api.js] Use private static class fields, rather than `shadow`ed getter work-arounds (PR 13813, 13882 follow-up) At the time, private static class fields were too new; however, that's no longer an issue and we can thus (ever so slightly) simplify the code.	2022-03-16 13:02:34 +01:00
Tim van der Meij	790735eaf1	Merge pull request #14658 from Snuffleupagus/api-validate-cMapUrl-standardFontDataUrl Validate the `cMapUrl`/`standardFontDataUrl` parameters in `getDocument`	2022-03-11 21:09:58 +01:00
Jonas Jenwald	a60b98412f	Validate the `cMapUrl`/`standardFontDataUrl` parameters in `getDocument` These changes make sense for two reasons: - Given that the parameters are potentially passed to the worker-thread, depending on the `useWorkerFetch` parameter, we need to prevent errors if the user provides values that aren't clonable. - By ensuring that the default values are indeed `null`, we'll trigger main-thread fetching (of CMaps and Standard fonts) as intended in the `PartialEvaluator` and thus potentially provide better Error messages.	2022-03-10 16:33:10 +01:00
Jonas Jenwald	537ed37835	Move the `isSameOrigin` helper function This function is currently placed in the `src/shared/util.js` file, which means that the code is duplicated in both of the built `pdf.js` and `pdf.worker.js` files. Furthermore, it only has a single call-site which is also specific to the `GENERIC`-build of the PDF.js library. Hence this helper function is instead moved into the `src/display/api.js` file, in such a way that it's conditionally defined but still can be unit-tested.	2022-03-10 13:51:09 +01:00
Jonas Jenwald	172d007598	[api-minor] Add validation for the `PDFDocumentProxy.getPageIndex` method Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation. That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately. In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.	2022-02-24 12:01:51 +01:00
Jonas Jenwald	2be8036eb7	[api-minor] Reduce duplication in the "gets non-existent page" unit-test	2022-02-24 11:25:21 +01:00
Jonas Jenwald	bad15894fc	Improve the JSDocs for the `PDFObjects` class Given that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances this ought to help provide slightly better TypeScript definitions.	2022-02-20 13:02:14 +01:00
Jonas Jenwald	f4712bc0ad	Simplify the data stored on `PDFObjects`-instances The manually tracked `resolved`-property is no longer necessary, since the same information is now directly available on all `PromiseCapability`-instances. Furthermore, since the `PDFObjects.resolve` method is not documented as accepting e.g. only Object-data, we probably shouldn't resolve the `PromiseCapability` with the `data` and instead only store it on the `PDFObjects`-instance.[1] --- [1] While Objects are passed by reference in JavaScript, other primitives such as e.g. strings are passed by value and the current implementation could thus lead to increased memory usage. Given how we're using `PDFObjects` in the PDF.js code-base none of this should be an issue, but it still cannot hurt to change this.	2022-02-20 12:33:33 +01:00
Jonas Jenwald	beecde3229	Introduce (some) private properties/methods in the `PDFObjects` class This ensures that the underlying data cannot be accessed directly, from the outside, since that's definitely not intended here. Note that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances hence these changes really cannot hurt.	2022-02-20 12:23:30 +01:00
Jonas Jenwald	1f0fb270b1	[api-minor] Ensure that the `PDFDocumentLoadingTask`-promise is rejected when cancelling the PasswordPrompt (bug 1754421) This is essentially a continuation of PR 7926, where we added support for rejecting the current `PDFDocumentLoadingTask`-promise by throwing inside of the `onPassword`-callback. Hence the naive way to address [bug 1754421](https://bugzilla.mozilla.org/show_bug.cgi?id=1754421) would be to simply throw in the `onPassword`-callback used in the default viewer. However, it unfortunately turns out to not work, since the password input/validation is asynchronous, and we thus need another approach. The simplest solution that I can come up with here is thus to extend the `onPassword`-callback to also reject the current `PDFDocumentLoadingTask`-instance if an `Error` is explicitly passed as the input to the callback function. (This doesn't feel great, but I cannot see a better solution that isn't really complicated.)	2022-02-09 15:09:20 +01:00
Jonas Jenwald	403baa7bba	[api-minor] Remove the `normalizeWhitespace` option in the `PDFPageProxy.{getTextContent, streamTextContent}` methods (issue 14519, PR 14428 follow-up) With these changes, we'll now always replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.	2022-02-03 09:17:22 +01:00
Jonas Jenwald	7cc761a8c0	Polyfill `structuredClone` with core-js (PR 13948 follow-up) This allows us to remove the manually implemented `structuredClone` polyfill, thus reducing the maintenance burden for the `LoopbackPort` class; refer to https://github.com/zloirock/core-js#structuredclone Please note: While `structuredClone` support landed already in Firefox 94, Google Chrome only added it in version 98 (currently in Beta). However, given that the `LoopbackPort` will only be used together with fake workers in browsers this shouldn't be too much of a problem.[1] For Node.js environments, where fake workers are unfortunately necessary, using a `legacy/`-build is already required which thus guarantees that the `structuredClone` polyfill is available. Also, the patch updates core-js to the latest version since that one includes `structuredClone` improvements; please see https://github.com/zloirock/core-js/releases/tag/v3.20.3 --- [1] Given that we only support browsers with proper worker support, if fake workers are being used that essentially indicates a configuration problem/error.	2022-01-27 21:11:42 +01:00
Jonas Jenwald	e0dba504d2	Fix broken/missing JSDocs and `typedef`s, to allow updating TypeScript to the latest version (issue 14342) This patch circumvents the issues seen when trying to update TypeScript to version `4.5`, by "simply" fixing the broken/missing JSDocs and `typedef`s such that `gulp typestest` now passes. As always, given that I don't really know anything about TypeScript, I cannot tell if this is a "correct" and/or proper way of doing things; we'll need TypeScript users to help out with testing! Please note: I'm sorry about the size of this patch, but given how intertwined all of this unfortunately is it just didn't seem easy to split this into smaller parts. However, one good thing about this TypeScript update is that it helped uncover a number of pre-existing bugs in our JSDocs comments.	2021-12-15 23:14:25 +01:00
Jonas Jenwald	760f765e56	Move the /Lang handling into the `BaseViewer` (PR 14114 follow-up) In PR 14114 this was only added to the default viewer, which means that in the viewer components the user would need to manually implement /Lang handling. This was (obviously) a bad choice, since the viewer components already support e.g. structTrees by default; sorry about overlooking this! To avoid having to make two `getMetadata` API-calls[1] very early during initialization, in the default viewer, the API will now cache its result. This will also come in handy elsewhere in the default viewer, e.g. by reducing parsing when opening the "document properties" dialog. --- [1] This not only includes a round-trip to the worker-thread, but also having to re-parse the /Metadata-entry when it exists.	2021-12-14 13:19:05 +01:00
Jonas Jenwald	f39536a30b	Change `WorkerTransport.pagePromises` from an Array to a Map Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here. Finally, also changes the `pagePromises` to a private property since it's not supposed to be accessed from the "outside".	2021-12-09 15:30:10 +01:00
Jonas Jenwald	c5525dcb69	Change `WorkerTransport.pageCache` from an Array to a Map Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here. For one thing, this simplifies iteration since we no longer have to worry about/check if `pageCache`-entries are undefined (which will happen for sparse `Array`s). Of particular note is that we're no longer attempting to "null" the `pageCache`-entry from within the `PDFPageProxy._destroy`-method. Given that synchronous JavaScript will always run to completion[1] and that we're looping through all pages in `WorkerTransport.destroy` and immediately clear the cache afterwards, that code did/does not really make a lot of sense (as far as I can tell). Finally, also changes the `pageCache` to a private property since it's not supposed to be accessed from the "outside". --- [1] Unless there are errors, of course.	2021-12-09 15:29:47 +01:00
Jonas Jenwald	6da0944fc7	[api-minor] Replace `PDFDocumentProxy.getStats` with a synchronous `PDFDocumentProxy.stats` getter Please note: These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents. The existing `PDFDocumentProxy.getStats` function requires a round-trip to the worker-thread in order to obtain the current document stats, and in the default viewer we currently make one such API-call for every rendered page. This patch proposes replacing that method with a synchronous `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and only send them to the main-thread the first time that a type is encountered. Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1] This re-factoring will obviously benefit longer documents the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes. Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return the same identical object. This is something that we can easily take advantage of in the default viewer, by now only reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents). --- [1] Furthermore, the maximum numbers of `StreamType`/`FontType` values are `10` and `12` respectively, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see `41ac3f0c07/src/shared/util.js (L206-L232)` [2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only results in a total of seven "DocStats" messages being sent from the worker-thread. [3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code. In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549	2021-11-20 12:20:55 +01:00
Jonas Jenwald	6f22327e61	[api-minor] Only use Workers when `postMessage` transfers are supported (PR 11123 follow-up) Given that all modern browsers now support `postMessage` transfers, and have for years, it no longer seems necessary for the PDF.js library to support using Workers unless the `postMessage` transfers functionality is available. This patch is a follow-up to PR 11123, which, for performance reasons, made it impossible to manually disable `postMessage` transfers (since doing so increases memory usage); as far as I know, that change hasn't caused any bug reports.[1] Hence we'll now only support proper Worker implementations, with fully working `postMessage` transfers, and fall back to using "fake" Workers otherwise. --- [1] At the time of that PR we still "supported" IE, which is why this code was left intact.	2021-11-19 16:47:58 +01:00
Calixte Denizet	33ea817b20	[api-minor] Render pushbuttons on their own canvas (bug 1737260) - First step to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1737260; - several interactive pdfs use the possibility to hide/show buttons to show different icons; - render pushbuttons on their own canvas and then insert it into the annotation_layer; - update test/driver.js in order to convert canvases for pushbuttons into images.	2021-11-12 15:37:33 +01:00
Jonas Jenwald	8fc9c7e41c	Use even more optional chaining in the `src/display/api.js` file This patch (slightly) simplifies a couple of `onProgress` and `onUnsupportedFeature` call-sites. Finally, while unrelated, also removes some unnecessary `return undefined;` statements (PR 11601 follow-up).	2021-10-12 12:05:59 +02:00
Jonas Jenwald	d49b1bf2ee	Use the native `structuredClone` implementation when it's available With a recent addition to the HTML specification, the internal structured clone algorithm used in browsers is (or will be, once it's implemented) directly accessible to JavaScript; please see https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/structuredClone Hence we'll eventually not need to maintain our own structured clone functionality in the `LoopbackPort`-class in the API, however for the time being we'll feature detect `structuredClone` and fallback to the existing PDF.js implementation. Given that https://bugzilla.mozilla.org/show_bug.cgi?id=1722576 has landed in Firefox 94, we should no longer need the manually implemented `cloneValue`-functionality in MOZCENTRAL builds. Note also that in the Firefox built-in PDF Viewer it's not possible for users to easily disable workers, which should further reduce the risk of these changes.	2021-10-03 10:55:33 +02:00
Jonas Jenwald	1dcd2f0cd3	[api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046) In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases. To avoid having to add two new properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`). Please note: In order to avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.	2021-09-25 09:18:58 +02:00
Jonas Jenwald	fd1f0f647f	Print a special warning message, in the viewer, for XFA Foreground documents Currently XFAF documents use the same warning message as in the XFA disabled case, which is neither helpful nor correct.	2021-09-23 15:02:24 +02:00
Jonas Jenwald	6cba5509f2	Re-factor `document.getElementsByName` lookups in the AnnotationLayer (issue 14003) This replaces direct `document.getElementsByName` lookups with a helper method which: - Lets the AnnotationLayer use the data returned by the `PDFDocumentProxy.getFieldObjects` API-method, such that we can directly lookup only the necessary DOM elements. - Falls back to using `document.getElementsByName` as before, such that e.g. the standalone viewer components still work. Finally, to fix the problems reported in issue 14003, regardless of the code-path we now also enforce that the DOM elements found were actually created by the AnnotationLayer code. With these changes we'll thus be able to update form elements on all visible pages just as before, but we'll additionally update the AnnotationStorage for not-yet-rendered elements thus fixing a pre-existing bug.	2021-09-23 13:05:18 +02:00
Jonas Jenwald	d854352cd5	Improve the API unit-tests by checking that `PDFPageProxy.render` returns a `RenderTask`-instance This is similar to existing unit-tests, which check for `PDFDocumentProxy`- and `PDFPageProxy`-instances.	2021-09-13 13:34:37 +02:00
Jonas Jenwald	fa7a607d33	Improve the API unit-tests by checking that `getDocument` returns a `PDFDocumentLoadingTask`-instance This is similar to existing unit-tests, which check for `PDFDocumentProxy`- and `PDFPageProxy`-instances.	2021-09-13 13:34:28 +02:00
Jonas Jenwald	ce3f5ea2bf	Use `async` a bit more in the API This patch changes the `PDFDocumentLoadingTask.destroy`-method and the `_fetchDocument`-function to be `async`, which slightly simplifies the relevant code. Furthermore, remove the catch-handler from the `WorkerTransport.getPageIndex`-method since it's no longer needed. Given that the `MessageHandler` is nowadays wrapping every possible Exception, it's no longer necessary to try and re-wrap the reason here.	2021-08-29 12:31:28 +02:00
Jonas Jenwald	2a0ad8e696	Add deprecation warnings for the `renderInteractiveForms` and `includeAnnotationStorage` options, in `PDFPageProxy.render` This is done separately from the previous patch, to make it easier to revert these changes once they've been included in a couple of releases. Please note that because these two options are mutually exclusive, which is a large part of the reason for the previous patch, it's not guaranteed that the fallback-values will always be correct in every situation (but it's the best that we can do).	2021-08-24 01:40:12 +02:00
Jonas Jenwald	41efa3c071	[api-minor] Introduce a new `annotationMode`-option, in `PDFPageProxy.{render, getOperatorList}` This is a follow-up to PRs 13867 and 13899. This patch is tagged `api-minor` for the following reasons: - It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour. - For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method. - It's now also possible to disable all annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282. --- [1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of all annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.	2021-08-24 01:13:02 +02:00
Brendan Dahl	bf5a45ce6d	Merge pull request #13908 from brendandahl/xfa-find [api-minor] XFA - Support text search in XFA documents.	2021-08-23 08:53:02 -07:00
Brendan Dahl	bb47128864	XFA - Support text search in XFA documents. Moves the logic out of TextLayerBuilder to handle highlighting matches into a new separate class `TextHighlighter` that can be used with regular PDFs and XFA PDFs. To mimic the current find functionality in XFA, two arrays from the XFA rendering are created to get the text content and map those to DOM nodes. Fixes #13878	2021-08-23 08:44:20 -07:00
Jonas Jenwald	a7f0301f21	[Regression] Re-factor the internal `includeAnnotationStorage` handling, since it's currently subtly wrong This patch is very similar to the recently fixed `renderInteractiveForms`-option, see PR 13867. As far as I can tell, this subtle bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago). The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms). Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between can thus wrongly return a cached operatorList. In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API. However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in subtle (and difficult to understand) rendering bugs. In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents; however, that would require adding a bunch of code, complexity, and runtime overhead. Given that operatorList caching in the API doesn't have to be perfect[1], but only has to avoid false cache-hits, we can simplify things significantly by only keeping track of the last time that the `AnnotationStorage`-data was modified. Please note: While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2] Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3] --- [1] Note how we're already not caching operatorLists for pages with huge images, in order to save memory, hence there's no guarantee that operatorLists will always be cached. [2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation. [3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally; however, it will result in a number of API-observable changes.	2021-08-18 10:09:03 +02:00
Jonas Jenwald	1465b1670f	[src/display/api.js] Move the `getRenderingIntent` helper function into `WorkerTransport` By doing this re-factoring separately, since it's mostly a mechanical change, the size/scope of the next patch will be reduced somewhat.	2021-08-18 09:58:26 +02:00
Jonas Jenwald	6167566f1b	Re-factor the `BaseException.name` handling, and clean-up some code Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead. At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been multiple issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended. While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.	2021-08-10 11:27:47 +02:00
Jonas Jenwald	7f2d524df5	Improve caching of Annotations-data, by using a `Map`, in the API Rather than caching only the last `PDFPageProxy.getAnnotations` call, and having to handle the intent separately, we can instead implement the caching in exactly the same way as done in the `PDFPageProxy.{render, getOperatorList}` methods.	2021-08-08 08:14:51 +02:00
Tim van der Meij	036b81496e	Merge pull request #13882 from Snuffleupagus/PDFWorker-rm-closure [api-minor] Remove the closure from the `PDFWorker` class, in the `src/display/api.js` file	2021-08-07 19:52:39 +02:00
Jonas Jenwald	1cf9405281	[api-minor] Remove the closure from the `PDFWorker` class, in the `src/display/api.js` file This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the built `pdf.js` file. Given that `PDFWorker` is exposed through the public API, this complicates things somewhat since there are a couple of worker-related properties that really should stay private. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure, however I've managed to come up with what's hopefully deemed an acceptable work-around here. Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method). Finally, as part of this re-factoring a number of missing JSDoc-comments were added which together with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class. Please note: This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.	2021-08-07 10:43:39 +02:00
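
Commits c5525dcb69 and f39536a30b change `WorkerTransport.pageCache`/`pagePromises` from an Array to a Map. They argue that pages are not necessarily all accessed, or may be accessed out of order. Below is a minimal, self-contained JavaScript sketch of that difference. It does not use any PDF.js code. The names `pageCacheArray`, `pageCacheMap`, `collectFromArray` and `collectFromMap` are hypothetical and exist only for illustration.

```javascript
// Hypothetical illustration (not PDF.js code): only pages 10 and 3
// (indices 9 and 2) have been accessed, and in that order.
const pageCacheArray = [];
pageCacheArray[9] = { pageIndex: 9 };
pageCacheArray[2] = { pageIndex: 2 };

const pageCacheMap = new Map();
pageCacheMap.set(9, { pageIndex: 9 });
pageCacheMap.set(2, { pageIndex: 2 });

// Iterating a sparse Array with for...of visits every index below `length`
// and yields `undefined` for the holes, so each entry has to be checked.
function collectFromArray(cache) {
  let visited = 0;
  const indices = [];
  for (const page of cache) {
    visited++;
    if (page === undefined) {
      continue; // A hole: this page was never accessed.
    }
    indices.push(page.pageIndex);
  }
  return { visited, indices };
}

// A Map only holds the entries that were actually set, in insertion order.
function collectFromMap(cache) {
  let visited = 0;
  const indices = [];
  for (const page of cache.values()) {
    visited++;
    indices.push(page.pageIndex);
  }
  return { visited, indices };
}

console.log(collectFromArray(pageCacheArray).visited); // 10
console.log(collectFromMap(pageCacheMap).visited); // 2
```

The Map version needs no `undefined` checks and can be emptied with `clear()`, which matches the simplification described in c5525dcb69.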

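Commit a7f0301f21 introduced time-stamp tracking (`lastModified`) for `AnnotationStorage`. Commit 8267fd8a52 later replaced it with a hash of the stored data, because values obtained via `getRawValue` can be modified from the "outside" without updating the time-stamp. Below is a minimal JavaScript sketch of that failure mode, under stated assumptions. The `SketchAnnotationStorage` class and `fnv1a32` helper are hypothetical. FNV-1a is swapped in for the `MurmurHash3_64` implementation that PDF.js actually re-uses, purely to keep the example short.

```javascript
// Hypothetical sketch (not the PDF.js implementation): FNV-1a (32-bit, over
// UTF-16 code units) stands in for PDF.js' MurmurHash3_64 to keep this short.
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

class SketchAnnotationStorage {
  #storage = new Map();
  #lastModified = 0;

  setValue(key, value) {
    this.#storage.set(key, value);
    this.#lastModified = Date.now(); // Only updated through setValue.
  }

  getRawValue(key) {
    return this.#storage.get(key); // The stored object itself, not a copy.
  }

  get lastModified() {
    return this.#lastModified;
  }

  // Hash the current contents; sorting by key keeps insertion order out of it.
  get hash() {
    const entries = [...this.#storage].sort(([a], [b]) =>
      a < b ? -1 : a > b ? 1 : 0
    );
    return fnv1a32(JSON.stringify(entries));
  }
}

const storage = new SketchAnnotationStorage();
storage.setValue("field1", { value: "Hello" });
const timeBefore = storage.lastModified;
const hashBefore = storage.hash;

// Modify the data from the "outside", bypassing setValue.
storage.getRawValue("field1").value = "Changed";

console.log(storage.lastModified === timeBefore); // true: the change is missed
console.log(storage.hash === hashBefore); // false: the content hash changed
```

Hashing costs a pass over the stored data on every check. The time-stamp is free to read but misses exactly this kind of in-place mutation.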