Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Calixte Denizet	71479fdd21	[Editor] Avoid to have duplicated entries in the Annot array when saving an existing and modified annotation	2023-06-15 22:02:10 +02:00
Jonas Jenwald	877884029d	Merge pull request #16551 from Snuffleupagus/page-destroyed-complete Ensure that `cleanup` during rendering is actually ignored, to prevent a blank canvas	2023-06-15 12:26:57 +02:00
Jonas Jenwald	0650be4641	Merge pull request #16550 from Snuffleupagus/rm-RenderingCancelledException-type [api-minor] Remove the `type` from `RenderingCancelledException` (PR 16226 follow-up)	2023-06-15 12:26:27 +02:00
Jonas Jenwald	a591c3de84	Ensure that `cleanup` during rendering is actually ignored, to prevent a blank canvas The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is not allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting. This is very old code, which has been slightly re-factored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seem highly dependent on "unlucky" timing. Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls. Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.	2023-06-15 11:39:26 +02:00
Jonas Jenwald	225734dd00	[api-minor] Remove the `type` from `RenderingCancelledException` (PR 16226 follow-up) After PR 16226 we're only using `RenderingCancelledException` together with canvas-rendering, hence the `type`-property is no longer necessary.	2023-06-14 15:40:25 +02:00
Jonas Jenwald	fee850737b	Enable the `unicorn/prefer-optional-catch-binding` ESLint plugin rule According to MDN this format is available in all browsers/environments that we currently support, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/try...catch#browser_compatibility Please also see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-optional-catch-binding.md	2023-06-12 11:46:11 +02:00
Jonas Jenwald	8c4821ceda	[api-minor] Slightly shorten the marked-content ids used in the textLayer Generally we try to keep the ids that we create short, hence we can slightly shorten the "static" parts of them.	2023-05-18 22:32:10 +02:00
Calixte Denizet	3091e70aad	Flush the current chunk when the font changed because of a restore op (issue #14755 )	2023-05-18 19:37:16 +02:00
Jonas Jenwald	8fbd6755eb	Enable the `unicorn/no-useless-promise-resolve-reject` ESLint plugin rule Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-useless-promise-resolve-reject.md Note that this patch also re-sorts the existing `unicorn`-rules in proper alphabetical order.	2023-05-13 11:30:25 +02:00
Jonas Jenwald	317abd6d07	Change the `createPromiseCapability` helper function into a `PromiseCapability` class This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.	2023-04-29 13:43:24 +02:00
Calixte Denizet	19ca41896e	Correctly clip the text in the text layer (fixes #16316 )	2023-04-18 17:00:42 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized when creating the search query. So to avoid to duplicate the normalization code, everything is moved in the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized some chars can be replaced by several ones and it induced to have some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, a copied string is normalized (NFKC) before being put in the clipboard (it works like this in either Acrobat or Chrome).	2023-04-17 14:31:23 +02:00
Jonas Jenwald	5063a6f2a9	[api-minor] Remove the `disableCombineTextItems` option Please note: This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons. This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous default text-extraction behaviour. However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in more (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together. Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.	2023-03-30 14:23:38 +02:00
Calixte Denizet	a96f10e55d	Create a new chunk when the char is too rised compared to the previouse one	2023-03-28 13:56:46 +02:00
Jonas Jenwald	137a2d6e30	Add even more non-standard ligatures (PR 15517 follow-up) Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.	2023-03-22 10:42:52 +01:00
Jonas Jenwald	9321758d91	Merge pull request #16186 from Snuffleupagus/issue-16176 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)	2023-03-21 22:17:18 +01:00
Jonas Jenwald	d4bcfe8c16	Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176) Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).	2023-03-21 21:35:57 +01:00
calixteman	8bfebf1c24	Merge pull request #16188 from calixteman/bug1823296 Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 21:21:49 +01:00
Calixte Denizet	2d0f30a67c	Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 19:27:24 +01:00
Jonas Jenwald	c4a725fe98	Fix the `transfer` parameter, for `structuredClone`, in the `LoopbackPort` The way that we handle the `transfer` parameter is unfortunately wrong, ever since PR 14392 which introduced the code, given that the MDN article originally contained incorrect information; please see https://github.com/mdn/content/pull/23164 By updating the `structuredClone` call such that it works correctly, we can enable more unit-tests in Node.js environments; please refer to https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#parameters	2023-03-19 22:04:01 +01:00
Calixte Denizet	b8dda089e2	Slightly modify the max width of a tracking space	2023-03-07 19:38:49 +01:00
Calixte Denizet	fd03cd5493	[api-minor] Generate images in the worker instead of the main thread. We introduced the use of OffscreenCanvas in #14754 and this patch aims to use them for all kind of images. It'll slightly improve performances (and maybe slightly decrease memory use). Since an image can be rendered in using some transfer maps but because of OffscreenCanvas we don't have the underlying pixels array the transfer maps stuff is re-implemented in using the SVG filter feComponentTransfer.	2023-03-01 17:40:12 +01:00
Jonas Jenwald	f42a2e8451	[api-minor] Move the `canvasFactory` option into `getDocument` Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead. Please note: This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.	2023-03-01 09:07:16 +01:00
Calixte Denizet	3a21423386	[Acroform] Use the full path to find the node in the XFA datasets where to store the value I noticed several 'Path not found' errors because of a field called #subform[2]. From the XFA specs, the hash is used for a class of elements in the template tree. When we're looking for a node in the datasets tree, it doesn't make sense to search for a class. Hence the path element starting with a hash are just skipped.	2023-02-23 12:09:39 +01:00
Jonas Jenwald	7976fc7851	[api-minor] Deprecate calling `getDocument` directly with a `PDFDataRangeTransport`-instance In general it's recommended to pass a parameter object when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons. However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary. Please note: The `PDFDataRangeTransport`-implementation was added specifically for the built-in Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading). Furthermore, the default-viewer always provides a parameter object when calling the `getDocument`-function and it's thus completely unaffected by these changes.	2023-01-19 14:25:55 +01:00
Jonas Jenwald	397f943ca3	[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up) This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data by default instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases: - TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data. - TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer). PLEASE NOTE: To avoid being affected by this, please simply copy any TypedArray data before passing it to either of the functions/methods mentioned above. Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues. Hence we'll check for this and only allow transferring of safe TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here. --- [1] See `e09ad99973/src/display/api.js (L2492-L2506)` respectively `e09ad99973/src/display/api.js (L2578-L2590)`	2023-01-14 10:39:36 +01:00
Jonas Jenwald	bbe629018d	[api-minor] Add a new `transferPdfData` option to allow transferring more data to the worker-thread (bug 1809164) Also, removes the `initialData`-parameter JSDocs for the `getDocument`-function given that this parameter has been completely unused since PR 8982 (over five years ago). Note that the `initialData`-parameter is, and always was, intended to be provided when initializing a `PDFDataRangeTransport`-instance.	2023-01-10 21:03:44 +01:00
Jonas Jenwald	0c1fb4e740	[api-minor] Remove the `PDFDocumentProxy.stats` getter (PR 15758 follow-up) This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly. These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.	2023-01-01 17:06:47 +01:00
Jonas Jenwald	91524d1a60	[api-minor] Allow specifying an extra-delay, in `RenderTask.cancel`, for worker-thread aborting of operatorList parsing This is done to support upcoming viewer-changes, and in order to prevent third-party users from outright breaking things we'll simply ignore too large values.	2022-12-14 12:34:16 +01:00
Samuel Yuan	36fb5c1e2b	Propagate the translated font name to TextContentItems. This allows font data for system fonts to be looked up in the PDFObjects.	2022-11-08 11:16:21 -08:00
Jonas Jenwald	23930a249e	[api-minor] Let `Catalog.getAllPageDicts` return an empty dictionary when loading the first /Page fails (issue 15590) In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an empty dictionary to replace (only) the first /Page of the document. Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a blank page for certain cases of broken/corrupt PDF documents which may look weird. Please note: This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable. --- [1] Currently we still require that a /Pages-dictionary is found though, however it may be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.	2022-11-03 12:51:48 +01:00
Jonas Jenwald	2516ffa78e	Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590) Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead. This partially fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.	2022-11-03 12:46:30 +01:00
Jonas Jenwald	ce66fefbff	[api-minor] Add partial support for the "GoToE" action (issue 8844) Please note: The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions. Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document. See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909 --- [1] Usually I always prefer having real-world test-cases to work with, whenever I'm implementing new features.	2022-10-06 10:33:07 +02:00
Jonas Jenwald	c87f90102c	Add more non-standard ligatures in the `glyphlist.js` file (issue 15516) Note that this PR only adds the "underscore"-variant of actually existing ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries). Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the second time this sort of bug has been reported.	2022-09-27 16:31:51 +02:00
Jonas Jenwald	cc4baa2fe9	[api-minor] Add basic support for the `SetOCGState` action (issue 15372) Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents. The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`. --- [1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.	2022-09-01 17:34:24 +02:00
Jonas Jenwald	216b86a082	[api-minor] Support Named-actions in the outline (issue 15367) Apparently this is implemented in e.g. Adobe Reader, and the specification does support it, however it cannot be commonly used in real-world PDF documents since it took over ten years for this feature to be requested.	2022-08-30 18:47:45 +02:00
Calixte Denizet	c06c5f7cbd	[Annotations] charLimit === 0 means unlimited (bug 1782564) Changing the charLimit in JS had no impact, so this patch aims to fix that and add an integration test for it.	2022-08-19 11:28:28 +02:00
Jonas Jenwald	dd95e4f851	Add official support for passing `ArrayBuffer`-data to `getDocument` (issue 15269) While this has always worked, as a consequence of the implementation, it's never been officially supported. In addition to adding basic unit-tests, this patch also introduces a couple of new JSDoc `@typedef`s in the API to avoid overly long lines.	2022-08-10 14:13:01 +02:00
Jonas Jenwald	0c31320c12	[api-minor] Improve `thumbnail` handling in documents that contain interactive forms To improve performance of the sidebar we use the page-canvases to generate the thumbnails whenever possible, since that avoids unnecessary re-rendering when the sidebar is open. This works generally well, however there's an old problem in PDF documents that contain interactive forms (when those are enabled): Note how the thumbnails become partially (or fully) blank, since those Annotations are not included in the OperatorList.[1] We obviously want to keep using the `PDFThumbnailView.setImage`-method for most documents, however we need a way to skip it only for those pages that contain interactive forms. As it turns out it's unfortunately not all that simple to tell, after the fact, from looking only at the OperatorList that some Annotations were skipped. While it might have been possible to try and infer that in the viewer, it'd not have been pretty considering that at the time when rendering finishes the annotationLayer has not yet been built. The overall simplest solution that I could come up with, was instead to include a summary of the interactive form-state when doing the final "flushing" of the OperatorList and expose that information in the API. --- [1] Some examples from our test-suite: `annotation-tx2.pdf` where the thumbnail is completely blank, and `bug1737260.pdf` where the thumbnail is missing the "buttons" found on the page.	2022-07-30 16:53:32 +02:00
Jonas Jenwald	2fb083f3e2	Ensure that the `isUsingOwnCanvas`-parameter is consistently included in operatorLists (PR 14247 follow-up) Currently some `OPS.beginAnnotation` arguments will contain a `Number` value for the `isUsingOwnCanvas`-parameter, or in some cases an `undefined` value, which is inconsistent from an API perspective.	2022-07-28 13:37:37 +02:00
Jonas Jenwald	c2f7942aea	Ensure that the /Resources-entry is actually a dictionary (issue 15150) Prevent issues in corrupt PDF documents, if the /Resources-entry is not of the correct and expected type.	2022-07-08 12:43:43 +02:00
Calixte Denizet	3789dab307	Always flush the current item with MarkedContent stuff when getting text (#15094 )	2022-06-25 17:19:57 +02:00
Jonas Jenwald	1cc7cecc7b	[api-minor] Introduce a `PrintAnnotationStorage` with frozen serializable data Given that printing is triggered synchronously in browsers, it's thus possible for scripting (in PDF documents) to modify the Annotation-data while printing is currently ongoing. To work-around that we add a new printing-specific `AnnotationStorage`, where the serializable data is frozen upon initialization, which the viewer can thus create/utilize during printing.	2022-06-23 17:06:46 +02:00
Calixte Denizet	cdc58b7a52	Rotate annotations based on the MK::R value (bug 1675139) - it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1675139; - An annotation can be rotated (counterclockwise); - the rotation can be set in using JS.	2022-06-21 17:57:26 +02:00
Jonas Jenwald	bbf857d635	[api-minor] Stop using the `beginAnnotations`/`endAnnotations` operators (PR 14998 follow-up) After the changes in PR 14998, these operators are now no-ops in the `src/display/canvas.js` code and should no longer be necessary. Given that `beginAnnotations`/`endAnnotations` are not in the PDF specification, but are rather custom PDF.js operators, it seems reasonable to stop using them now that they've become no-ops.	2022-06-11 14:21:26 +02:00
Jonas Jenwald	8135d7ccf6	Merge pull request #14869 from calixteman/14862 [JS] Fix few bugs present in the pdf for issue #14862	2022-05-03 18:31:31 +02:00
Calixte Denizet	094ff38da0	[JS] Fix few bugs present in the pdf for issue #14862 - since resetForm function reset a field value a calculateNow is consequently triggered. But the calculate callback can itself call resetForm, hence an infinite recursive loop. So basically, prevent calculeNow to be triggered by itself. - in Firefox, the letters entered in some fields were duplicated: "AaBb" instead of "AB". It was mainly because beforeInput was triggering a Keystroke which was itself triggering an input value update and then the input event was triggered. So in order to avoid that, beforeInput calls preventDefault and then it's up to the JS to handle the event. - fields have a property valueAsString which returns the value as a string. In the implementation it was wrongly used to store the formatted value of a field (2€ when the user entered 2). So this patch implements correctly valueAsString. - non-rendered fields can be updated in using JS but when they're, they must take some properties in the annotationStorage. It was implemented for field values, but it wasn't for display, colors, ... - it fixes #14862 and #14705.	2022-05-03 15:48:44 +02:00
Jonas Jenwald	df5a4fd0a7	Support encoded dest-strings in /GoTo destination dictionaries (issue 14864) Interestingly enough this appears to be the very first case of encoded dest-strings, in /GoTo destination dictionaries, that we've actually come across. What's really fascinating is that it's less than a week after issue 14847, given that these issues are somewhat similar.	2022-05-02 10:14:32 +02:00
Jonas Jenwald	71370d012b	Support destinations in NameTrees with encoded keys (issue 14847) Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees using string-keys. Hence the easiest solution, as far as I'm concerned, was thus to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fallback to fetch all destinations, which should be fine since this is the very first case of encoded keys that we've seen. Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error, although we've so far not seen such a case, for any non-Array Kids-entries found in a NameTree/NumberTree. Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)	2022-04-27 11:19:55 +02:00
Calixte Denizet	040fcae5ab	Improve performance with image masks (bug 857031) - it aims to partially fix performance issue reported: https://bugzilla.mozilla.org/show_bug.cgi?id=857031; - the idea is too avoid to use byte arrays but use ImageBitmap which are a way faster to draw: * an ImageBitmap is Transferable which means that it can be built in the worker instead of in the main thread: - this is achieved in using an OffscreenCanvas when it's available, there is a bug to enable them for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330; - or in using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap so it's slightly slower than using an OffscreenCanvas. * it's transfered from the worker to the main thread by "reference"; * the byte buffers used to create the image data have a very short lifetime and ergo the memory used is globally less than before. - Use the localImageCache for the mask; - Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image; - Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set as defined in operator_list.	2022-04-09 18:26:26 +02:00

1 2 3 4 5 ...