Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	1753e321cd	Remove the compatibility checks in `WorkerMessageHandler.createDocumentHandler` For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used). Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain basic functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments. Please note: For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.	2023-05-07 13:43:19 +02:00
Jonas Jenwald	ed8be6f882	[api-minor] Update the minimum supported Node.js version to 18 This patch updates the minimum supported environments as follows: - Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.	2023-05-07 13:43:19 +02:00
Jonas Jenwald	d950b91c4e	Introduce some logical assignment in the `src/core/` folder	2023-04-29 13:49:37 +02:00
Jonas Jenwald	317abd6d07	Change the `createPromiseCapability` helper function into a `PromiseCapability` class This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.	2023-04-29 13:43:24 +02:00
Tim van der Meij	c9359957e6	Merge pull request #16305 from Snuffleupagus/PDFJSDev-skip-PRODUCTION Remove the `PRODUCTION` build-target	2023-04-22 14:53:30 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized when creating the search query. So to avoid to duplicate the normalization code, everything is moved in the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized some chars can be replaced by several ones and it induced to have some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, a copied string is normalized (NFKC) before being put in the clipboard (it works like this in either Acrobat or Chrome).	2023-04-17 14:31:23 +02:00
Jonas Jenwald	804aa896a7	Stop using the `PRODUCTION` build-target in the JavaScript code This special build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code. When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing. This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases. Please note: There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.	2023-04-17 12:04:34 +02:00
Jonas Jenwald	edd13895dd	Limit the `Path2D`-checks in the worker-thread to Node.js (PR 16238 follow-up, issue 16289) The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well. Please note: In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.	2023-04-14 11:51:11 +02:00
Tim van der Meij	13f2426aab	Merge pull request #16238 from Snuffleupagus/update-Node-compat-check Update the Node.js compatibility-check in the worker-thread	2023-04-01 14:20:33 +02:00
Jonas Jenwald	57a307d0cd	Update the Node.js compatibility-check in the worker-thread Please note: In Node.js environments a `legacy`-build must be used since only those versions include any polyfills. Previously we'd only check if `ReadableStream` is natively supported, however since Node.js version 18 that's now been implemented; please see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility Hence we'll also check for the availability of `Path2D`, since that's browser-specific functionality not expected to be available in Node.js environments; please see https://developer.mozilla.org/en-US/docs/Web/API/Path2D#browser_compatibility	2023-03-30 18:36:15 +02:00
Jonas Jenwald	5063a6f2a9	[api-minor] Remove the `disableCombineTextItems` option Please note: This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons. This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous default text-extraction behaviour. However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in more (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together. Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.	2023-03-30 14:23:38 +02:00
Calixte Denizet	2d0f30a67c	Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 19:27:24 +01:00
Calixte Denizet	3a21423386	[Acroform] Use the full path to find the node in the XFA datasets where to store the value I noticed several 'Path not found' errors because of a field called #subform[2]. From the XFA specs, the hash is used for a class of elements in the template tree. When we're looking for a node in the datasets tree, it doesn't make sense to search for a class. Hence the path element starting with a hash are just skipped.	2023-02-23 12:09:39 +01:00
Tim van der Meij	22618213c7	Merge pull request #16040 from Snuffleupagus/arrayBuffersToBytes Re-factor the `arraysToBytes` helper function (PR 16032 follow-up)	2023-02-12 11:47:57 +01:00
Jonas Jenwald	6d4d402a78	Move the `arrayBuffersToBytes` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the built `pdf.js` and `pdf.worker.js` files.	2023-02-11 21:34:37 +01:00
Jonas Jenwald	18042163ce	Improve the consistency between the `LocalPdfManager`/`NetworkPdfManager` constructor Currently these classes take a bunch of parameters (somewhat randomly ordered), probably because this is very old code that's been extended over the years. Hence this patch changes the constructors to use parameter-objects instead, which improves consistency and (slightly) reduces the amount of code as well. Please note: Also removes the `msgHandler`-property on these classes, since I cannot find a single call-site that accesses it.	2023-02-11 13:39:52 +01:00
Jonas Jenwald	14b0e8c0b6	Ensure that "GetAnnotations" errors are propagated to the main-thread (PR 15267 follow-up) With the changes in PR 15267 we're now accidentally swallowing "GetAnnotations" errors, rather than propagating them to the main-thread as intended.	2023-02-10 12:18:35 +01:00
Jonas Jenwald	c56f25409d	Re-factor the `arraysToBytes` helper function (PR 16032 follow-up) Currently this helper function only has two call-sites, and both of them only pass in `ArrayBuffer` data. Given how it's implemented there's a couple of code-paths that are completely unused (e.g. the "string" one), and in particular the intended fast-paths don't actually work. This patch re-factors and simplifies the helper function, and it'll no longer accept anything except `ArrayBuffer` data (hence why it's also re-named). Note that at the time when `arraysToBytes` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.	2023-02-10 10:26:35 +01:00
Jonas Jenwald	5ba596786c	Change `WorkerTasks`, in `WorkerMessageHandler.createDocumentHandler`, to a use a Set This is a tiny bit more compact, thanks to the `Set.prototype.delete` method.	2023-02-09 22:01:16 +01:00
Jonas Jenwald	96d338e437	Reduce usage of the `arrayByteLength` helper function We're using this helper function when reading data from the [`PDFWorkerStreamReader.read`](`a49d1d1615/src/core/worker_stream.js (L90-L98)`) and [`PDFWorkerStreamRangeReader.read`](`a49d1d1615/src/core/worker_stream.js (L122-L128)`) methods, and as can be seen they always return `ArrayBuffer` data. Hence we can simply get the `byteLength` directly, and don't need to use the helper function. Note that at the time when `arrayByteLength` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.	2023-02-09 15:50:38 +01:00
Jonas Jenwald	70d362f22c	Remove an unnecessary variable in `getPdfManager`, in the `src/core/worker.js` file Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.	2022-11-17 15:31:41 +01:00
Jonas Jenwald	a2a200175f	Remove unnecessary function names in the `src/core/worker.js` file Currently some functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate. For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.	2022-11-17 15:12:48 +01:00
Calixte Denizet	3ca03603c2	[Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824) - For text fields * when printing, we generate a fake font which contains some widths computed thanks to an OffscreenCanvas and its method measureText. In order to avoid to have to layout the glyphs ourselves, we just render all of them in one call in the showText method in using the system sans-serif/monospace fonts. * when saving, we continue to create the appearance streams if the fonts contain the char but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances to true and remove the appearance stream. This way, we let the different readers handle the rendering of the strings. - For FreeText annotations * when printing, we use the same trick as for text fields. * there is no need to save an appearance since Acrobat is able to infer one from the Content entry.	2022-11-10 19:05:39 +01:00
Jonas Jenwald	caef47a0cf	Remove the `PdfManager.onLoadedStream` method (PR 15616 follow-up) After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site. Hence why this patch suggests that we remove this method and replace it with an optional parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.	2022-10-29 14:42:17 +02:00
Jonas Jenwald	bcffbf74f3	Let the `PdfManager.requestLoadedStream` method return the stream This is very old code, and it could thus do with some simplification. Note how in the `src/core/worker.js` file we're combining both the `PdfManager.requestLoadedStream` and `PdfManager.onLoadedStream` methods in order to access the stream-data. This seems unnecessary, and it's simple enough to always let the `PdfManager.requestLoadedStream` method return the stream-data as well.	2022-10-24 17:00:48 +02:00
Jonas Jenwald	f2f0a1e871	[api-minor] Stop sending "UnsupportedFeature" from the worker-thread GetOperatorList-handling This code was added all the way back in PR 6698, almost seven years ago, for backwards compatibility reasons. At this point in time, it seems that we can remove that since: - We have more fine-grained "UnsupportedFeature" reporting elsewhere in the worker-thread code nowadays. - The GetOperatorList-handling is now using `ReadableStream`s, which means that errors are being forwarded to the main-thread anyway. - We're also no longer displaying a notification-bar, in the built-in Firefox PDF Viewer, for any of these "UnsupportedFeature" messages.	2022-10-13 11:46:17 +02:00
Jonas Jenwald	8a4f6aca97	Stop using the `source`-object when sending "GetDocRequest" Looking at the code on the worker-thread, there doesn't appear to be any particular reason for placing some of the properties in a `source`-object when sending them with "GetDocRequest". As is often the case the explanation for this structure is rather "for historical reasons", since originally we simply sent the `source`-object as-is. Doing that was obviously a bad idea, for a couple of reasons: - It makes it less clear what is/isn't actually needed on the worker-thread. - Sending unused properties will unnecessarily increase memory usage. - The `source`-object may contain unclonable data, which would break the library.	2022-10-09 12:45:24 +02:00
Jonas Jenwald	c84b717773	Group the `evaluatorOptions` on the main-thread, when sending "GetDocRequest" Rather than sending all of these parameters individually and then grouping them together on the worker-thread, we can simply handle that in the API instead.	2022-10-09 12:31:03 +02:00
Jonas Jenwald	1ea4c4b519	[api-minor] Make `isOffscreenCanvasSupported` configurable via the API (issue 14952) This patch first of all makes `isOffscreenCanvasSupported` configurable, defaulting to `true` in browsers and `false` in Node.js environments, with a new `getDocument` parameter. While you normally want to use this, in order to improve performance, it should still be possible for users to control it (similar to e.g. `isEvalSupported`). The specific problem, as reported in issue 14952, is that the SVG back-end doesn't support the new ImageMask data-format that's introduced in PR 14754. In particular: - When the SVG back-end is used in Node.js environments, this patch will "just work" without the user needing to make any code changes. - If the SVG back-end is used in browsers, this patch will require that `isOffscreenCanvasSupported: false` is added to the `getDocument`-call.	2022-10-07 00:10:46 +02:00
Calixte Denizet	31155740c3	[Annotation] Add a div containing the text of a FreeText annotation (bug 1780375) An annotation doesn't have to be in the text flow, hence it's likely a bad idea to insert its text in the text layer. But the text must be visible from a screen reader point of view so it must somewhere in the DOM. So with this patch, the text from a FreeText annotation is extracted and added in a div in its HTML counterpart, and with the patch #15237 the text should be visible and positioned relatively to the text flow.	2022-08-04 11:14:05 +02:00
Jonas Jenwald	37ebc28756	Use more `for...of` loops in the code-base Note that these cases, which are all in older code, were found using the [`unicorn/no-for-loop`](https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-for-loop.md) ESLint plugin rule. However, note that I've opted not to enable this rule by default since there's still some cases where I do think that it makes sense to allow "regular" for-loops.	2022-07-17 16:18:54 +02:00
Calixte Denizet	f27c8c4471	[Editor] Add support for printing newly added Ink annotations	2022-06-21 18:21:49 +02:00
Jonas Jenwald	57c10ac213	Simplify the `newRefs` computation in the "SaveDocument"-handler in the worker-thread - Let the `Page.save`-method filter out "empty" entries, similar to the `Page._parsedAnnotations`-getter, since that on its own already simplifies the "SaveDocument"-handler a tiny bit. - The existing `reduce` and `concat` construction isn't exactly a wonder of readability :-) Thanks to modern JavaScript features it should be possible to replace all of this with `Array.prototype.flat()` instead, which at least to me feels a lot easier to understand.	2022-06-19 18:21:51 +02:00
Calixte Denizet	7773b3f5be	[edition] Add support for saving a newly added FreeText	2022-06-08 14:34:09 +02:00
apeltop	a97dd26389	Correct typos	2022-04-09 09:43:18 +09:00
Jonas Jenwald	be2b1d5d2a	[src/display/api.js] Simplify the `sendTest` function, used with Worker initialization (PR 14291 follow-up) Given that we now only use Workers when `postMessage` transfers are supported, there's really no point in trying to send a "test" message without transfers present. Hence, if `postMessage` transfers are not supported by the browser, we'll now fallback to "fake" Workers immediately instead. The comment about Opera is also removed, since it was originally added back in PR 983 and mentions Opera `11.60` [which was released in 2011](https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Version_11).	2022-03-16 13:25:41 +01:00
Jonas Jenwald	172d007598	[api-minor] Add validation for the `PDFDocumentProxy.getPageIndex` method Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation. That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately. In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.	2022-02-24 12:01:51 +01:00
Jonas Jenwald	a2f9031e9a	Ensure that `Dict.set` only accepts string `key`s Trying to use a non-string `key` in a `Dict` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, complementing the existing `value` check added in PR 11672.	2022-02-22 16:35:20 +01:00
Jonas Jenwald	b89595fd20	[api-minor] Remove the, in `legacy` builds, bundled `ReadableStream` polyfill According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility, all browsers that we support have native `ReadableStream` implementations (since quite some time too). Hence only Node.js is now lagging behind w.r.t. `ReadableStream` support, and its experimental implementation doesn't really help us given the life-span of the LTS releases (see https://en.wikipedia.org/wiki/Node.js#Releases). It seems quite unfortunate to bundle a `ReadableStream` polyfill in the `legacy` builds when it's unnecessary in browsers, given its overall size, but fortunately we can avoid that by simply listing `web-streams-polyfill` as a dependency for the `pdfjs-dist` library.	2022-02-13 10:15:58 +01:00
Jonas Jenwald	403baa7bba	[api-minor] Remove the `normalizeWhitespace` option in the `PDFPageProxy.{getTextContent, streamTextContent}` methods (issue 14519, PR 14428 follow-up) With these changes, we'll now always replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.	2022-02-03 09:17:22 +01:00
Jonas Jenwald	8836593b9e	Add a (global) cache to the `getCharUnicodeCategory` function Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing. Obviously the `Glyph`-instances are being cached per font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1]. Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularily large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` unique unicode-chars being checked. Please note: In practice I suppose that this won't have a huge effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review. --- [1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.	2022-01-25 09:59:34 +01:00
Jonas Jenwald	d0c4bbd828	[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) This patch basically extends the approach from PR 10392, by also checking the last page. Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an integer /Count entry it must also be correct/valid. As can be seen in the referenced PDF documents, that entry may be completely bogus which causes general parsing to breaking down elsewhere in the worker-thread (and hanging the browser). Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the entire /Pages-tree and essentially counting the pages. To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the entire /Pages-tree to determine the number of pages. Unfortunately these changes will have a number of somewhat negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug. - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the last page of the PDF documents. - For poorly generated PDF documents, where the entire /Pages-tree only has one level, we'll unfortunately need to fetch/parse the entire /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of some long PDF documents, - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost. As one small additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value). Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.	2021-11-27 21:57:35 +01:00
Jonas Jenwald	6da0944fc7	[api-minor] Replace `PDFDocumentProxy.getStats` with a synchronous `PDFDocumentProxy.stats` getter Please note: These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents. The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for every rendered page. This patch proposes replacing that method with a synchronous `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and only send them to the main-thread the first time that a type is encountered. Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1] This re-factoring will obviously benefit longer document the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes. Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return the same identical object. This is something that we can easily take advantage of in the default viewer, by now only reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents). --- [1] Furthermore, the maximium number of `StreamType`/`FontType` are `10` respectively `12`, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see `41ac3f0c07/src/shared/util.js (L206-L232)` [2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only result in a total of seven "DocStats" messages being sent from the worker-thread. [3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code. In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549	2021-11-20 12:20:55 +01:00
Jonas Jenwald	6f22327e61	[api-minor] Only use Workers when `postMessage` transfers are supported (PR 11123 follow-up) Given that all modern browsers now support `postMessage` transfers, and have for years, it no longer seems necessary for the PDF.js library to support using Workers unless the `postMessage` transfers functionality is available. This patch is a follow-up to PR 11123, which made it impossible to manually disable `postMessage` transfers for performance reasons (since it increases memory usage), which hasn't caused any bug reports as far as I know.[1] Hence we'll now only support proper Worker implementations, with fully working `postMessage` transfers, and fallback to using "fake" Workers otherwise. --- [1] At the time of that PR we still "supported" IE, which is why this code was left intact.	2021-11-19 16:47:58 +01:00
Calixte Denizet	4b0538d07a	Don't save anything in XFA entry if no XFA! (bug 1732344) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1732344 - rename some variables to have a more clear code; - and last but no least, add a unit test to test saving.	2021-09-23 19:51:23 +02:00
Calixte Denizet	2fc10727c5	XFA - Only warn about the wrong xfa type when there is an xfa thing	2021-09-18 15:44:05 +02:00
Jonas Jenwald	45ddb12f61	Remove no-op `onPull`/`onCancel` streamSink callbacks from the "GetTextContent"-handler The `MessageHandler`-implementation already handles either of these callbacks being undefined, hence there's no particular reason (as far as I can tell) to add no-op functions here. Also, in a couple of `MessageHandler`-methods, utilize an already existing local variable more.	2021-09-09 00:01:10 +02:00
Calixte Denizet	77b9657e57	XFA - Overwrite AcroForm dictionary when saving if no datasets in XFA (bug 1720179) - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179 - in some pdfs the XFA array in AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows to overwrite the AcroForm dictionary with an updated XFA array when doing an incremental update.	2021-09-03 17:04:03 +02:00
Jonas Jenwald	a7f0301f21	[Regression] Re-factor the internal `includeAnnotationStorage` handling, since it's currently subtly wrong This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867. As far as I can tell, this subtle bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago). The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms). Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus wrongly return a cached operatorList. In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API. However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in subtle (and difficult to understand) rendering bugs. In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead. Given that operatorList caching in the API doesn't have to be perfect[1], but only have to avoid false cache-hits, we can simplify things significantly be only keeping track of the last time that the `AnnotationStorage`-data was modified. Please note: While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2] Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3] --- [1] Note how we're already not caching operatorLists for pages with huge images, in order to save memory, hence there's no guarantee that operatorLists will always be cached. [2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation. [3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.	2021-08-18 10:09:03 +02:00
Jonas Jenwald	107efdb178	[Regression] Re-factor the internal `renderInteractiveForms` handling, since it's currently subtly wrong The value of the `renderInteractiveForms` parameter, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms). Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls that only change the `renderInteractiveForms` parameter can thus return an incorrect operatorList. As far as I can tell, this subtle bug has existed ever since `renderInteractiveForms`-support was first added in PR 7633 (which is almost five years ago). With the previous patch, fixing this is now really simple by "encoding" the `renderInteractiveForms` parameter in the internal renderingIntent handling.	2021-08-06 00:40:43 +02:00

1 2 3 4 5 ...