Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	674052d3fc	Re-factor the blob-URL caching in `DownloadManager.openOrDownloadData` Cache blob-URLs on the actual data, rather than DOM elements, to reduce potential duplicates (note the updated unit-test).	2023-10-17 10:18:34 +02:00
Jonas Jenwald	927e50f5d4	[api-major] Output JavaScript modules in the builds (issue 10317) At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1] In order for this to work we can only use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2] One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting. Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3] This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result. One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4] --- [1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility [2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is. [3] Having partially "broken" patches, that fail tests, as part of the commit history is really not a good idea in general. [4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there should be much better support for JavaScript modules in various tools.	2023-10-07 09:31:08 +02:00
Jonas Jenwald	bf9c33e60f	Add support for "GoToE" actions with destinations (issue 17056) This shouldn't be very common in practice, since "GoToE" actions themselves seem quite uncommon; see PR 15537.	2023-10-04 11:14:23 +02:00
Calixte Denizet	f2196f7803	StructParents entry isn't required on pages with no tagged contents (bug 1855641)	2023-09-28 14:23:10 +02:00
Calixte Denizet	3ee5268a23	[Editor] Don't try to add data to the struct tree when there is no accessibilityData (bug 1855157)	2023-09-26 11:02:14 +02:00
Calixte Denizet	a8573d4e1b	[Editor] Add the ability to create/update the structure tree when saving a pdf containing newly added annotations (bug 1845087) When there is no tree, the tags for the new annotations are just put under the root element. When there is a tree, we insert the new tags at the right place using the value of structTreeParentId (added in PR #16916).	2023-09-16 18:34:58 +02:00
Tim van der Meij	66507ccae8	Enable unit test "creates pdf doc from non-existent URL" The unit test is re-enabled because it no longer seems to fail after 10 runs on Linux where this used to fail often. Code inspection also shows that the code is correct and should not raise the previous exception (anymore). Finally, a lot has changed since this test was disabled such as new Jasmine versions, new Linux bot OS version and new browser versions.	2023-09-10 15:47:04 +02:00
Calixte Denizet	a8a50c567a	Construct the correct field name and strip out classes when searching The classes were stripped out when creating the field name, but that led to a wrong name. Since class components in a path are irrelevant, they're just ignored when searching for a node in the datasets.	2023-09-07 15:56:47 +02:00
Calixte Denizet	ee3ac35e05	Revert fix for bug 1838855 (bug 1849876) The issue described in the mentioned bug is really because Acrobat is rendering the XFA instead of the Acroform. The original patch just tried to work around the issue but it induces some regressions.	2023-08-23 12:34:41 -04:00
Tim van der Meij	5828ac0ee3	Merge pull request #16834 from Snuffleupagus/globalWorkerPort-parallel-test Add a unit-test for the "correct" way of using the global `workerPort` in parallel (PR 16830 follow-up)	2023-08-19 13:38:16 +02:00
Jonas Jenwald	29b2050ac2	Improve the "write a new annotation, save the pdf and check that the text content is correct" unit-test (PR 16559 follow-up) Currently this unit-test will pass just fine if compression is disabled, e.g. by commenting out the relevant code in the `src/core/writer.js` file. While we don't have a simple way of directly checking that the Annotation text-content is compressed, we can however use the resulting file-size as a fairly good proxy. (Note that if compression is disabled the file-size is more than doubled.)	2023-08-15 15:12:17 +02:00
Jonas Jenwald	2422492ee3	Add a unit-test for the "correct" way of using the global `workerPort` in parallel (PR 16830 follow-up) Please note that for performance reasons it's not really advised to use the same worker-thread in parallel for parsing multiple PDF documents, since they will then unnecessarily compete for resources. However, given that it's still possible to do that e.g. when using the global `workerPort` it probably won't hurt to add a unit-test for this particular situation.	2023-08-15 12:45:54 +02:00
Jonas Jenwald	66437917db	Avoid using the global `workerPort` when destruction has started, but not yet finished (issue 16777) Given that the `PDFDocumentLoadingTask.destroy()`-method is documented as being asynchronous, you thus need to await its completion before attempting to load a new PDF document when using the global `workerPort`. If you don't await destruction as intended then a new `getDocument`-call can remain pending indefinitely, without any kind of indication of the problem, as shown in the issue. In order to improve the current situation, without unnecessarily complicating the API-implementation, we'll now throw during the `getDocument`-call if the global `workerPort` is in the process of being destroyed. This part of the code-base has apparently never been covered by any tests, hence the patch adds unit-tests for both the correct usage (awaiting destruction) as well as the specific case outlined in the issue.	2023-08-12 21:21:50 +02:00
Jonas Jenwald	389a26c115	Fallback to check all pages when getting the pageIndex of FieldObjects Given that the FieldObjects are parsed in parallel, in combination with the existing caching in the `getPage`-method and `annotations`-getter, adding additional caches for this fallback code-path doesn't seem entirely necessary.	2023-08-10 17:10:04 +02:00
Jonas Jenwald	64e8557fb5	[api-minor] Deprecate the `PDFDocumentProxy.getJavaScript` method This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used. Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.	2023-08-01 09:02:05 +02:00
Jonas Jenwald	3a886e7264	Move the `isNodeJS`-helper into the `src/shared/util.js` file With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the built files.	2023-07-17 16:42:25 +02:00
Jonas Jenwald	f84657d837	Address formatting changes from Prettier version 3	2023-07-15 10:44:39 +02:00
Jonas Jenwald	39113baa33	Move the `transfers` computation into the `AnnotationStorage` class Rather than having to manually determine the potential `transfers` at various spots in the API, we can let the `AnnotationStorage.serializable` getter include this. To further simplify things, we can also let the `serializable` getter compute and include the `hash`-string as well.	2023-06-29 19:51:57 +02:00
Calixte Denizet	599b9498f2	[Editor] Add support for printing/saving newly added Stamp annotations In order to minimize the size of a saved pdf, we generate only one image and use a reference in each annotation using it. When printing, it's slightly different since we have to render each page independently but we use the same image within a page.	2023-06-26 15:47:05 +02:00
Calixte Denizet	5c0054d58d	Guess that a checkbox belongs to a group by using its T value (bug 1838855)	2023-06-16 18:45:09 +02:00
Jonas Jenwald	2cb113b545	Improve handling of /Filter-entries in `writeStream` Fix handling of /Filter-entries, since the current implementation could potentially corrupt the data if there's multiple filters present. Please note that filters are applied sequentially during decoding, starting from the first one in the Array, hence the first Array-entry needs to be /FlateDecode in order for things to actually work correctly. To prevent a future bug, if we want to save more "complex" data such as images, also ensure that we include any existing /DecodeParms-entries when updating the /Filter-entry.	2023-06-16 10:27:23 +02:00
Calixte Denizet	85b38fc247	Add a test to check that the compression is ok when saving an annotation	2023-06-16 10:05:42 +02:00
Calixte Denizet	71479fdd21	[Editor] Avoid having duplicated entries in the Annot array when saving an existing and modified annotation	2023-06-15 22:02:10 +02:00
Jonas Jenwald	877884029d	Merge pull request #16551 from Snuffleupagus/page-destroyed-complete Ensure that `cleanup` during rendering is actually ignored, to prevent a blank canvas	2023-06-15 12:26:57 +02:00
Jonas Jenwald	0650be4641	Merge pull request #16550 from Snuffleupagus/rm-RenderingCancelledException-type [api-minor] Remove the `type` from `RenderingCancelledException` (PR 16226 follow-up)	2023-06-15 12:26:27 +02:00
Jonas Jenwald	a591c3de84	Ensure that `cleanup` during rendering is actually ignored, to prevent a blank canvas The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is not allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting. This is very old code, which has been slightly re-factored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seems highly dependent on "unlucky" timing. Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls. Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.	2023-06-15 11:39:26 +02:00
Jonas Jenwald	225734dd00	[api-minor] Remove the `type` from `RenderingCancelledException` (PR 16226 follow-up) After PR 16226 we're only using `RenderingCancelledException` together with canvas-rendering, hence the `type`-property is no longer necessary.	2023-06-14 15:40:25 +02:00
Jonas Jenwald	fee850737b	Enable the `unicorn/prefer-optional-catch-binding` ESLint plugin rule According to MDN this format is available in all browsers/environments that we currently support, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/try...catch#browser_compatibility Please also see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-optional-catch-binding.md	2023-06-12 11:46:11 +02:00
Jonas Jenwald	8c4821ceda	[api-minor] Slightly shorten the marked-content ids used in the textLayer Generally we try to keep the ids that we create short, hence we can slightly shorten the "static" parts of them.	2023-05-18 22:32:10 +02:00
Calixte Denizet	3091e70aad	Flush the current chunk when the font changed because of a restore op (issue #14755)	2023-05-18 19:37:16 +02:00
Jonas Jenwald	8fbd6755eb	Enable the `unicorn/no-useless-promise-resolve-reject` ESLint plugin rule Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-useless-promise-resolve-reject.md Note that this patch also re-sorts the existing `unicorn`-rules in proper alphabetical order.	2023-05-13 11:30:25 +02:00
Jonas Jenwald	317abd6d07	Change the `createPromiseCapability` helper function into a `PromiseCapability` class This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.	2023-04-29 13:43:24 +02:00
Calixte Denizet	19ca41896e	Correctly clip the text in the text layer (fixes #16316)	2023-04-18 17:00:42 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some Arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized when creating the search query. So to avoid duplicating the normalization code, everything is moved into the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). While playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping for this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized, some chars can be replaced by several ones, which led to some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, a copied string is normalized (NFKC) before being put in the clipboard (it works like this in either Acrobat or Chrome).	2023-04-17 14:31:23 +02:00
Jonas Jenwald	5063a6f2a9	[api-minor] Remove the `disableCombineTextItems` option Please note: This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons. This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous default text-extraction behaviour. However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in more (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together. Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.	2023-03-30 14:23:38 +02:00
Calixte Denizet	a96f10e55d	Create a new chunk when the char is raised too far compared to the previous one	2023-03-28 13:56:46 +02:00
Jonas Jenwald	137a2d6e30	Add even more non-standard ligatures (PR 15517 follow-up) Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.	2023-03-22 10:42:52 +01:00
Jonas Jenwald	9321758d91	Merge pull request #16186 from Snuffleupagus/issue-16176 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)	2023-03-21 22:17:18 +01:00
Jonas Jenwald	d4bcfe8c16	Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176) Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).	2023-03-21 21:35:57 +01:00
calixteman	8bfebf1c24	Merge pull request #16188 from calixteman/bug1823296 Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 21:21:49 +01:00
Calixte Denizet	2d0f30a67c	Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 19:27:24 +01:00
Jonas Jenwald	c4a725fe98	Fix the `transfer` parameter, for `structuredClone`, in the `LoopbackPort` The way that we handle the `transfer` parameter is unfortunately wrong, ever since PR 14392 which introduced the code, given that the MDN article originally contained incorrect information; please see https://github.com/mdn/content/pull/23164 By updating the `structuredClone` call such that it works correctly, we can enable more unit-tests in Node.js environments; please refer to https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#parameters	2023-03-19 22:04:01 +01:00
Calixte Denizet	b8dda089e2	Slightly modify the max width of a tracking space	2023-03-07 19:38:49 +01:00
Calixte Denizet	fd03cd5493	[api-minor] Generate images in the worker instead of the main thread. We introduced the use of OffscreenCanvas in #14754 and this patch aims to use them for all kinds of images. It'll slightly improve performance (and maybe slightly decrease memory use). An image can be rendered using some transfer maps, but because of OffscreenCanvas we don't have the underlying pixels array, so the transfer maps stuff is re-implemented using the SVG filter feComponentTransfer.	2023-03-01 17:40:12 +01:00
Jonas Jenwald	f42a2e8451	[api-minor] Move the `canvasFactory` option into `getDocument` Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead. Please note: This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.	2023-03-01 09:07:16 +01:00
Calixte Denizet	3a21423386	[Acroform] Use the full path to find the node in the XFA datasets where to store the value I noticed several 'Path not found' errors because of a field called #subform[2]. From the XFA specs, the hash is used for a class of elements in the template tree. When we're looking for a node in the datasets tree, it doesn't make sense to search for a class. Hence path elements starting with a hash are just skipped.	2023-02-23 12:09:39 +01:00
Jonas Jenwald	7976fc7851	[api-minor] Deprecate calling `getDocument` directly with a `PDFDataRangeTransport`-instance In general it's recommended to pass a parameter object when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons. However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary. Please note: The `PDFDataRangeTransport`-implementation was added specifically for the built-in Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading). Furthermore, the default-viewer always provides a parameter object when calling the `getDocument`-function and it's thus completely unaffected by these changes.	2023-01-19 14:25:55 +01:00
Jonas Jenwald	397f943ca3	[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up) This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data by default instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases: - TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data. - TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer). PLEASE NOTE: To avoid being affected by this, please simply copy any TypedArray data before passing it to either of the functions/methods mentioned above. Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues. Hence we'll check for this and only allow transferring of safe TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here. --- [1] See `e09ad99973/src/display/api.js (L2492-L2506)` respectively `e09ad99973/src/display/api.js (L2578-L2590)`	2023-01-14 10:39:36 +01:00
Jonas Jenwald	bbe629018d	[api-minor] Add a new `transferPdfData` option to allow transferring more data to the worker-thread (bug 1809164) Also, removes the `initialData`-parameter JSDocs for the `getDocument`-function given that this parameter has been completely unused since PR 8982 (over five years ago). Note that the `initialData`-parameter is, and always was, intended to be provided when initializing a `PDFDataRangeTransport`-instance.	2023-01-10 21:03:44 +01:00
Jonas Jenwald	0c1fb4e740	[api-minor] Remove the `PDFDocumentProxy.stats` getter (PR 15758 follow-up) This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly. These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.	2023-01-01 17:06:47 +01:00
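
Commit 397f943ca3 in the log above states that TypedArray data is only transferred when it has exactly the same size as its underlying ArrayBuffer, and that callers who want to keep their data should copy it before passing it in. The following is a minimal sketch of those two points, not the PDF.js implementation: `coversWholeBuffer` is a hypothetical helper name, and the standard `structuredClone` call merely stands in for the transfer to the worker-thread.

```javascript
// Hypothetical helper (not part of PDF.js): true when the view spans its
// entire underlying ArrayBuffer, i.e. the size condition from the commit.
function coversWholeBuffer(view) {
  return view.byteLength === view.buffer.byteLength;
}

const full = new Uint8Array([1, 2, 3, 4]);
const partial = new Uint8Array(full.buffer, 1, 2); // view of two bytes only

const fullIsSafe = coversWholeBuffer(full);
const partialIsSafe = coversWholeBuffer(partial);

// Copy before handing the data over, so the caller keeps a usable array.
const keep = full.slice();

// Stand-in for the transfer to a worker; this detaches `full.buffer`,
// so `full` (and `partial`, which shares the buffer) become empty.
const received = structuredClone(full, { transfer: [full.buffer] });
```

After the transfer only `keep` and `received` still hold the four bytes, which is why the commit message recommends copying when the original array is needed afterwards.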

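Commit 117bbf7cd9 in the log above replaces a hard-coded map with the built-in `normalize("NFKC")` and normalizes the search query. A small sketch of why that matters follows; the `matches` function and the variable names are illustrative assumptions, and normalizing both sides here is a simplification rather than the find controller's actual logic.

```javascript
// NFKC folds compatibility characters into their standard equivalents.
const ligature = "\ufb01"; // LATIN SMALL LIGATURE FI
const normalizedLigature = ligature.normalize("NFKC"); // "f" followed by "i"

// The Arabic presentation form mentioned in the commit message.
const presentationForm = "\ufe94";
const normalizedForm = presentationForm.normalize("NFKC");

// Illustrative only: normalizing both sides lets a query written in one
// form match text that uses the other form.
function matches(text, query) {
  return text.normalize("NFKC").includes(query.normalize("NFKC"));
}

const found = matches("of\ufb01ce", "office");
```

Because one character can normalize to several, the normalized string may be longer than the original, which is the source of the "extra chars" the commit message mentions.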