Sakurai/pdf.js - pdf.js - Gitea on kemo


Author	SHA1	Message	Date
Jonas Jenwald	107efdb178	[Regression] Re-factor the internal `renderInteractiveForms` handling, since it's currently subtly wrong The value of the `renderInteractiveForms` parameter, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms). Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls that only change the `renderInteractiveForms` parameter can thus return an incorrect operatorList. As far as I can tell, this subtle bug has existed ever since `renderInteractiveForms`-support was first added in PR 7633 (which is almost five years ago). With the previous patch, fixing this is now really simple by "encoding" the `renderInteractiveForms` parameter in the internal renderingIntent handling.	2021-08-06 00:40:43 +02:00
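This is a minimal sketch of why the cache key matters. It assumes a hypothetical `OperatorListCache` keyed by a numeric renderingIntent, and the flag values `INTENT_DISPLAY`/`INTENT_ANNOTATIONS_FORMS` are made up; they are not the actual pdf.js constants. Because `renderInteractiveForms` is folded into the key as its own bit, requests with and without it get separate cache entries instead of sharing one operatorList.

```javascript
// Hypothetical flag values; the real internal constants may differ.
const INTENT_DISPLAY = 0x02;
const INTENT_ANNOTATIONS_FORMS = 0x10;

// Encode every option that changes the operatorList into the cache key.
function getRenderingIntent({ renderInteractiveForms = false } = {}) {
  let intent = INTENT_DISPLAY;
  if (renderInteractiveForms) {
    intent |= INTENT_ANNOTATIONS_FORMS;
  }
  return intent;
}

class OperatorListCache {
  constructor(buildOperatorList) {
    this.buildOperatorList = buildOperatorList; // (intent) => operatorList
    this.entries = new Map(); // renderingIntent -> operatorList
  }

  get(renderingIntent) {
    if (!this.entries.has(renderingIntent)) {
      this.entries.set(renderingIntent, this.buildOperatorList(renderingIntent));
    }
    return this.entries.get(renderingIntent);
  }
}

// Usage: the two calls no longer share a cache entry.
const opListCache = new OperatorListCache(intent => ({
  includesFormWidgets: (intent & INTENT_ANNOTATIONS_FORMS) !== 0,
}));
const plain = opListCache.get(getRenderingIntent());
const withForms = opListCache.get(getRenderingIntent({ renderInteractiveForms: true }));
console.log(plain.includesFormWidgets, withForms.includesFormWidgets); // false true
```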
Jonas Jenwald	47f94235ab	[api-minor] Re-factor the internal renderingIntent, and change the default `intent` value in the `PDFPageProxy.getAnnotations` method With the changes made in PR 13746 the internal renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly. Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an internal renderingIntent that's implemented using a bit-field instead. Please note: This part of the patch, in itself, does not change the public API (but see below). This patch is tagged `api-minor` for the following reasons: 1. It changes the default value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API. 2. In order to get all annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter. 3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one). 4. Finally, for consistency across the API, the `PDFPageProxy.render` method also supports the new "any" intent (although I'm not sure how useful that'll be). Points 1 and 2 above are the significant, and thus breaking, changes in default behaviour here. However, unfortunately I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.	2021-08-06 00:39:42 +02:00
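The string-to-bit-field conversion described in this commit could look roughly like the following. It is a sketch only. The `RenderingIntentFlag` values, `toRenderingIntent`, and the `viewable`/`printable` annotation fields are assumptions for illustration, not the library's internal code. The default intent is "display", matching point 1, and the operatorList case becomes a bit test instead of string-matching.

```javascript
// Hypothetical bit values, for illustration only.
const RenderingIntentFlag = Object.freeze({
  ANY: 0x01,
  DISPLAY: 0x02,
  PRINT: 0x04,
  OPLIST: 0x100,
});

// Convert a public `intent` string into an internal bit-field.
function toRenderingIntent(intent = "display", { isOpList = false } = {}) {
  let flags;
  switch (intent) {
    case "any":
      flags = RenderingIntentFlag.ANY;
      break;
    case "display":
      flags = RenderingIntentFlag.DISPLAY;
      break;
    case "print":
      flags = RenderingIntentFlag.PRINT;
      break;
    default:
      throw new Error(`Unknown intent: ${intent}`);
  }
  if (isOpList) {
    flags |= RenderingIntentFlag.OPLIST;
  }
  return flags;
}

// A bit test replaces string-matching on "oplist"-prefixed intents.
function isOpListIntent(flags) {
  return (flags & RenderingIntentFlag.OPLIST) !== 0;
}

// Decide whether an annotation belongs in the result for a given intent.
function matchesIntent(annotation, flags) {
  if (flags & RenderingIntentFlag.ANY) return true;
  if (flags & RenderingIntentFlag.DISPLAY) return annotation.viewable;
  if (flags & RenderingIntentFlag.PRINT) return annotation.printable;
  return false;
}

const annots = [
  { id: "1R", viewable: true, printable: false },
  { id: "2R", viewable: false, printable: true },
];
const shown = annots.filter(a => matchesIntent(a, toRenderingIntent()));
const all = annots.filter(a => matchesIntent(a, toRenderingIntent("any")));
console.log(shown.map(a => a.id), all.map(a => a.id)); // [ '1R' ] [ '1R', '2R' ]
```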
Calixte Denizet	4a4591bd2c	XFA - Fix font scale factors (bug 1720888) - All the scale factors for the substitution font were wrong because of different glyph positions between Liberation and the other fonts: - regenerate all the factors - Text may contain Polish characters, for example, and in this case the glyph widths were wrong: - treat the substitution font as a composite one - add a glyphIndex-to-Unicode map for Liberation in order to generate the width array for the CID font	2021-07-28 19:10:42 +02:00
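Here is a sketch of deriving a Unicode-keyed width table from Liberation widths plus per-glyph scale factors, as the commit describes. The function name, the input shapes, and all numbers are hypothetical and purely illustrative; they are not real Liberation metrics.

```javascript
// Hypothetical inputs:
//  - liberationWidths[glyphIndex]: advance widths of the Liberation glyphs
//  - scaleFactors[glyphIndex]: factor that rescales Liberation to the target font
//  - glyphToUnicode: Map from glyph index to Unicode code point
function buildWidthMap(liberationWidths, scaleFactors, glyphToUnicode) {
  const widths = new Map(); // Unicode code point -> scaled width
  for (const [glyphIndex, unicode] of glyphToUnicode) {
    const base = liberationWidths[glyphIndex];
    if (base === undefined) {
      continue; // no Liberation glyph to rescale
    }
    const factor = scaleFactors[glyphIndex] ?? 1;
    widths.set(unicode, Math.round(base * factor));
  }
  return widths;
}

// Made-up sample data.
const widthMap = buildWidthMap(
  [278, 556, 556],
  [1, 1.02, 0.98],
  new Map([
    [1, 0x41], // "A"
    [2, 0x105], // "ą", e.g. in Polish text
  ])
);
console.log(widthMap.get(0x41), widthMap.get(0x105)); // 567 545
```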
Jonas Jenwald	885e7a8aa4	Allow `StreamsSequenceStream.readBlock` to skip sub-streams with errors (issue 13794) This patch makes use of the existing `ignoreErrors` option, thus allowing a page to continue parsing/rendering even if (some of) its sub-streams are corrupt. Obviously this may cause part of a page to be broken/missing, however it should be better than (potentially) rendering nothing. Also, to the best of my knowledge, this is the first bug of its kind that we've encountered. To avoid having to pass in a bunch of, for a `BaseStream`-instance, mostly unrelated parameters when initializing a `StreamsSequenceStream`-instance, I settled on utilizing a callback function instead to allow conditional Error-suppression. Note that the `StreamsSequenceStream`-class is a special stream-implementation that we only use when the `/Contents`-entry, in the `/Page`-dictionary, consists of an Array with streams.	2021-07-26 16:42:50 +02:00
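The following is a simplified model of the callback-based error suppression. It assumes a hypothetical `SequenceReader` in place of `StreamsSequenceStream`, and sub-streams that expose only `getBytes()`. The reader itself knows nothing about `ignoreErrors`. Instead, the caller-supplied `onError` callback either rethrows or records the error, and a failing sub-stream then contributes no bytes.

```javascript
// Hypothetical stand-in for a stream that concatenates several sub-streams.
class SequenceReader {
  constructor(streams, onError = null) {
    this.streams = streams.slice(); // each item only needs a getBytes() method
    this.onError = onError;
  }

  // Read the next sub-stream; a suppressed error yields an empty block.
  readBlock() {
    const stream = this.streams.shift();
    try {
      return stream.getBytes();
    } catch (reason) {
      if (this.onError) {
        this.onError(reason); // the callback may rethrow to abort
        return new Uint8Array(0);
      }
      throw reason;
    }
  }

  readAll() {
    const blocks = [];
    while (this.streams.length > 0) {
      blocks.push(this.readBlock());
    }
    const out = new Uint8Array(blocks.reduce((n, b) => n + b.length, 0));
    let offset = 0;
    for (const block of blocks) {
      out.set(block, offset);
      offset += block.length;
    }
    return out;
  }
}

// The caller decides, via `ignoreErrors`, whether errors are fatal.
function makeOnError(ignoreErrors, warnings) {
  return reason => {
    if (!ignoreErrors) {
      throw reason;
    }
    warnings.push(`Skipping sub-stream: ${reason.message}`);
  };
}

const fakeStream = bytes => ({ getBytes: () => Uint8Array.from(bytes) });
const corrupt = { getBytes: () => { throw new Error("bad data"); } };
const warnings = [];
const reader = new SequenceReader(
  [fakeStream([1, 2]), corrupt, fakeStream([3])],
  makeOnError(true, warnings)
);
console.log(Array.from(reader.readAll()), warnings.length); // [ 1, 2, 3 ] 1
```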
Jonas Jenwald	03cf28bf17	[api-minor] Add `intent` support to the `PDFPageProxy.getOperatorList` method (issue 13704) With this patch, the `PDFPageProxy.getOperatorList` method will now return `PDFOperatorList`-instances that also include Annotation-operatorLists (when those exist). Hence this closes a small, but potentially confusing, gap between the `render` and `getOperatorList` methods. Previously we've been somewhat reluctant to do this, as explained below, but the fact that there are actual use-cases where it's required probably means that we'll have to implement it now. Since we still need the ability to separate "normal" rendering operations from direct `getOperatorList` calls in the worker-thread, this API-change unfortunately causes the internal renderingIntent to become a bit "messy" which is indeed unfortunate (note the `"oplist-"` strings in various spots). As-is I suppose that it's not all that bad, but we may want to consider changing the internal renderingIntent to e.g. a bitfield in the future. Besides fixing issue 13704, this patch would also be necessary if someone ever tries to implement e.g. issue 10165 (since currently `PDFPageProxy.getOperatorList` doesn't include Annotation-operatorLists). Please note: This patch is also tagged "api-minor" for a second reason, which is that we're now including the Annotation-id in the `beginAnnotation` argument. The reason for this is to allow correlating the Annotation-data returned by `PDFPageProxy.getAnnotations`, with its corresponding operatorList-data (for those Annotations that have it).	2021-07-16 17:16:30 +02:00
Calixte Denizet	690b5d1941	XFA - Use fake MyriadPro as a fallback for missing fonts - aims to fix #13597.	2021-07-11 13:52:13 +02:00
Calixte Denizet	5cdee80c8e	XFA - An image can be a stream in the pdf (bug 1718521) - hrefs can be found in catalog > Names > XFAImages	2021-07-05 14:06:23 +02:00
calixteman	783cbc1793	Revert "XFA - An image can be a stream in the pdf (bug 1718521)"	2021-07-05 12:47:14 +02:00
calixteman	b370d4714f	Merge pull request #13654 from calixteman/images XFA - An image can be a stream in the pdf (bug 1718521)	2021-07-05 12:04:34 +02:00
Jonas Jenwald	901b24e8af	Enable the ESLint `operator-assignment` rule This patch was generated automatically, using the `gulp lint --fix` command. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/operator-assignment	2021-07-04 12:57:45 +02:00
Jonas Jenwald	661c60ecc9	[api-minor] Support accessing both the original and modified PDF fingerprint The PDF.js API has only ever supported accessing the original file ID; however, the second one, which (should) exist in modified documents, has thus far been completely inaccessible through the API. That seems like a simple oversight, caused e.g. by the viewer not needing it, since it really shouldn't hurt to provide API-users with the ability to check if a PDF document has been modified since its creation.[1] Please refer to https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G13.2261661 for additional information. For an example of how to update existing code to use the new API, please see the changes in the `web/app.js` file included in this patch. Please note: While I'm not sure if we'll ever be able to remove the old `PDFDocumentProxy.fingerprint` getter, given that it's existed since "forever", that probably isn't a big deal given that it's now limited to only `GENERIC`-builds. --- [1] Although this obviously depends on the PDF software following the specification, by updating the second file ID as intended.	2021-07-03 13:56:33 +02:00
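For context, the PDF specification stores two byte strings in the trailer's /ID array. The first is set when the file is created, and the second is meant to change whenever the file is updated. The sketch below uses a hypothetical `fingerprintsFromId` helper to turn that array into an original/modified pair of hex strings. A real getter also needs a fallback (e.g. hashing file data) when /ID is missing, and that fallback is omitted here.

```javascript
// Hex-encode a PDF byte string (one char per byte).
function toHex(byteString) {
  let hex = "";
  for (let i = 0; i < byteString.length; i++) {
    hex += (byteString.charCodeAt(i) & 0xff).toString(16).padStart(2, "0");
  }
  return hex;
}

// Hypothetical helper: idArray is the trailer /ID value, as two byte strings.
function fingerprintsFromId(idArray) {
  const valid = s => typeof s === "string" && s.length > 0;
  if (!Array.isArray(idArray) || !valid(idArray[0])) {
    return [null, null]; // a real implementation needs a fallback here
  }
  const original = toHex(idArray[0]);
  // An unchanged second ID means the document has not been modified.
  const modified =
    valid(idArray[1]) && idArray[1] !== idArray[0] ? toHex(idArray[1]) : null;
  return [original, modified];
}

console.log(fingerprintsFromId(["\x01\xab", "\x01\xab"])); // [ '01ab', null ]
console.log(fingerprintsFromId(["\x01\xab", "\xff\x00"])); // [ '01ab', 'ff00' ]
```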
Calixte Denizet	f16828be49	XFA - An image can be a stream in the pdf (bug 1718521) - hrefs can be found in catalog > Names > XFAImages	2021-07-02 20:34:10 +02:00
Calixte Denizet	70bb672dcd	XFA - Support non-embedded fonts without a Widths entry - some pdf use fonts which are not embedded, or which don't have any width array or any css info (e.g. for standard fonts or Arial). - so add width arrays for Liberation fonts in order to compute the ones for other fonts using a scale-factors array.	2021-06-28 23:05:08 +02:00
Calixte Denizet	429ffdcd2f	XFA - Save filled data in the pdf when downloading the file (Bug 1716288) - when binding (after parsing) we get a map between some template nodes and some data nodes; - so set user data in input handlers using data node uids in the annotation storage; - to save the form, just put the value we have in the storage in the correct data nodes, serialize the xml as a string and then write the string at the end of the pdf using src/core/writer.js; - fix a few bugs around data bindings: - the "Off" issue in Bug 1716980.	2021-06-25 18:57:01 +02:00
Calixte Denizet	7cdbc98716	XFA - Match font family correctly - partial fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1716980; - some pdf can contain an invalid font family (e.g. 'Windings 3') so in this case remove the space; - the font family in typeface attribute doesn't always match the one defined in the FontDescriptor dictionary.	2021-06-20 15:16:28 +02:00
Calixte Denizet	8eeb7ab4a3	XFA - Add the possibility to lay out and measure text - some containers don't always have both of their dimensions, and those dimensions are based on their contents; - so in order to measure text, we must get the glyph widths (for the xfa fonts) before starting the layout; - implement a word-wrap algorithm; - handle font changes during text layout.	2021-06-17 14:17:02 +02:00
Jonas Jenwald	a01c599247	Cache the "raw" standard font data in the worker-thread (PR 12726 follow-up) This implementation is basically a copy of the pre-existing `builtInCMapCache` implementation. For some, badly generated, PDF documents it's possible that we'll end up having to fetch the same standard font data over and over (which is obviously inefficient). While not common, it's certainly possible that a PDF document uses custom font names where the actual font then references one of the standard fonts; see e.g. issue 11399 for one such example. Note that I did suggest adding worker-thread caching of standard font data in PR 12726, however it wasn't deemed necessary at the time. Now that we have a real-world example that benefits from caching, I think that we should simply implement this now.	2021-06-09 18:27:51 +02:00
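A greedy word-wrap of the kind mentioned in this commit can be sketched as follows. It assumes a hypothetical `charWidth` callback that returns the advance width of a character at the current font size. The function fills each line with as many whole words as fit. A single word wider than `maxWidth` is left overflowing on its own line, and font changes within a run are not modelled.

```javascript
// Greedy word-wrap; charWidth(ch) is a hypothetical callback returning a glyph advance.
function wrapText(text, maxWidth, charWidth) {
  const measure = word => [...word].reduce((w, ch) => w + charWidth(ch), 0);
  const spaceWidth = charWidth(" ");
  const lines = [];
  let line = "";
  let lineWidth = 0;
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const wordWidth = measure(word);
    if (line === "") {
      line = word; // always place at least one word per line
      lineWidth = wordWidth;
    } else if (lineWidth + spaceWidth + wordWidth <= maxWidth) {
      line += " " + word;
      lineWidth += spaceWidth + wordWidth;
    } else {
      lines.push(line); // word does not fit: start a new line
      line = word;
      lineWidth = wordWidth;
    }
  }
  if (line !== "") {
    lines.push(line);
  }
  return lines;
}

// With every character one unit wide and a 10-unit box:
console.log(wrapText("the quick brown fox jumps", 10, () => 1));
// [ 'the quick', 'brown fox', 'jumps' ]
```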
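Here is a minimal sketch of the caching idea. It assumes a hypothetical `fetchFontData(name)` function that returns a Promise. This version stores the Promise rather than the resolved bytes. That design choice also deduplicates concurrent requests for the same font, and failed fetches are evicted so they can be retried.

```javascript
class StandardFontDataCache {
  constructor(fetchFontData) {
    this.fetchFontData = fetchFontData; // hypothetical: (name) => Promise<Uint8Array>
    this.entries = new Map(); // font name -> Promise<Uint8Array>
  }

  get(name) {
    let promise = this.entries.get(name);
    if (!promise) {
      promise = this.fetchFontData(name);
      this.entries.set(name, promise);
      // Forget failed fetches, so that a later request can try again.
      promise.catch(() => {
        if (this.entries.get(name) === promise) {
          this.entries.delete(name);
        }
      });
    }
    return promise;
  }
}

let fetches = 0;
const fontCache = new StandardFontDataCache(async name => {
  fetches++;
  return new Uint8Array([name.length]);
});
const p1 = fontCache.get("Helvetica");
const p2 = fontCache.get("Helvetica");
console.log(p1 === p2, fetches); // true 1
```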
Calixte Denizet	34a2fa72c7	XFA - Add Liberation-Sans font as a substitution for some missing fonts - Some js files contain scale factors for each glyph in order to rescale Liberation to have a final font with the correct width. - A lot of XFA have some containers where their dimensions are based on their text content, so using default font from browser can lead to an almost unreadable pdf.	2021-06-09 16:55:45 +02:00
calixteman	8c53bf8647	Merge pull request #13437 from calixteman/xfa_mv_root XFA - Move the fake HTML representation of XFA from the worker to the main thread	2021-05-31 10:14:15 +02:00
Calixte Denizet	1b0006093d	Italic angle is defined clockwise in CSS when it's counterclockwise in PDF	2021-05-28 11:06:11 +02:00
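The following sketch shows the sign flip, using a hypothetical `italicAngleToCss` helper. Per the PDF specification, /ItalicAngle is measured counterclockwise from the vertical and is negative for fonts that slant to the right. A positive CSS `oblique` angle slants to the right, so the value is negated.

```javascript
// Hypothetical helper: map a FontDescriptor /ItalicAngle to a CSS font-style value.
function italicAngleToCss(italicAngle) {
  if (!italicAngle) {
    return "normal";
  }
  // PDF measures counterclockwise from vertical (right-slanting fonts are negative),
  // whereas a positive CSS oblique angle slants the glyphs to the right.
  return `oblique ${-italicAngle}deg`;
}

console.log(italicAngleToCss(-12)); // oblique 12deg
```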
Calixte Denizet	45c3f00a27	XFA - Move the fake HTML representation of XFA from the worker to the main thread - the only goal of this patch is to be able to synchronously get the fake html when printing from firefox: - in order to print we need to inject some html in the beforeprint callback, but we cannot block waiting for all the pages. - from a memory point of view: it doesn't change anything since the fake HTML is deleted in the worker; - this way we don't break any assumptions.	2021-05-25 19:33:07 +02:00
Jonas Jenwald	1a8d05fdcf	Remove some, with Prettier `2.3.0`, unnecessary `// prettier-ignore` comments To get the maximum benefit from something like Prettier, you obviously don't want to disable the automatic formatting unless absolutely necessary. When we added Prettier there were a number of cases, mostly involving larger Arrays, which required disabling of the automatic formatting for overall readability and/or to not break inline comments. With changes in Prettier version `2.3.0`, see [the release notes](https://prettier.io/blog/2021/05/09/2.3.0.html#concise-formatting-of-number-only-arrays-10106httpsgithubcomprettierprettierpull10106-10160httpsgithubcomprettierprettierpull10160-by-thorn0httpsgithubcomthorn0), there's now better formatting support for Arrays containing only numbers. Hence we can now remove a number of `// prettier-ignore` comments, and thus get the benefit of automatic formatting in (slightly) more of the code-base.	2021-05-19 11:36:03 +02:00
Brendan Dahl	17e9cfcd2a	Merge pull request #13328 from calixteman/js_display1 JS - Add support for display property	2021-05-17 08:47:13 -07:00
Jonas Jenwald	4248f0745c	Improve the `Page.content` and `Page.getContentStream` methods First of all, by using `Dict.getArray` in the `Page.content` getter we remove the need to manually iterate through and fetch the sub-streams (when they exist) in the `Page.getContentStream` method. Secondly, we can simplify the code in `Page.{getOperatorList, extractTextContent}` by letting `Page.getContentStream` ensure that `content` is available and returning a Promise instead.	2021-05-14 11:47:34 +02:00
Calixte Denizet	af125cd299	JS - Add support for display property - in annotation_layer, move common properties treatment in a common method instead having duplicated code in each widget.	2021-05-06 11:15:38 +02:00
Jonas Jenwald	3624f9eac7	Add a new `BaseStream.getString(...)` method to replace manual `bytesToString(BaseStream.getBytes(...))` calls Given that the `bytesToString(BaseStream.getBytes(...))` pattern is somewhat common throughout the `src/core/` code, it cannot hurt to add a new `BaseStream`-method which handles that case internally.	2021-05-01 19:20:36 +02:00
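This sketch shows the helper pattern. A hypothetical `MiniStream` stands in for `BaseStream`, alongside a local `bytesToString`. The conversion is chunked because passing a very large array to `String.fromCharCode.apply` can exceed engine argument limits.

```javascript
// Convert bytes to a "binary" string, one char per byte, in chunks.
function bytesToString(bytes) {
  const CHUNK_SIZE = 8192;
  let str = "";
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    str += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK_SIZE));
  }
  return str;
}

// Hypothetical minimal stream over a Uint8Array.
class MiniStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Return up to `length` bytes (or all remaining bytes) and advance.
  getBytes(length) {
    const end =
      length === undefined
        ? this.bytes.length
        : Math.min(this.pos + length, this.bytes.length);
    const result = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return result;
  }

  // The helper replaces bytesToString(stream.getBytes(length)) at call sites.
  getString(length) {
    return bytesToString(this.getBytes(length));
  }
}

const header = new MiniStream(Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2d]));
console.log(header.getString(4), header.getString()); // %PDF -
```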
Jonas Jenwald	30a22a168d	Move the `DecodeStream` and `StreamsSequenceStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	8f6543c218	Ensure that the /Properties, used with optional content, is actually loaded before parsing the operatorList/textContent (PR 12095 follow-up) By not waiting for the /Properties to load, before parsing of the operatorList/textContent starts, there's a very real risk that a `MissingDataException` will be thrown when trying to access the data in the `PartialEvaluator.parseMarkedContentProps` method. If this ever happens it will thus lead to incomplete and/or outright broken rendering, and with e.g. `disableAutoFetch=true` set the likelihood of this occurring would increase quite a bit. Please note: While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.	2021-04-20 20:22:44 +02:00
Jonas Jenwald	f560fe6875	A couple of small scripting/XFA-related tweaks in the worker-code - Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible. - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.) - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be. Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.) - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.	2021-04-17 10:34:22 +02:00
Brendan Dahl	ac3fa1e3d7	Merge pull request #13146 from calixteman/xfa_fonts XFA -- Load fonts permanently from the pdf	2021-04-16 12:55:12 -07:00
Calixte Denizet	7e9579045f	XFA -- Load fonts permanently from the pdf - Different fonts can be used in xfa and some of them are embedded in the pdf. - Load all the fonts in window.document. Update src/core/document.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Update src/core/worker.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-15 17:57:42 +02:00
Jonas Jenwald	1d6d476cab	Rename the `src/core/obj.js` file to `src/core/catalog.js` Now that only the `Catalog` remains in this file, after the previous patches, it makes sense to rename the file to reduce confusion.	2021-04-13 21:00:30 +02:00
Jonas Jenwald	e8750cfe95	Move the `XRef` from `src/core/obj.js` and into its own file The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of distinct functionality. In order to improve readability and make it easier to navigate through the code, this patch moves the `XRef` into its own file.	2021-04-13 21:00:30 +02:00
Jonas Jenwald	604cd6d600	Move the `ObjectLoader` from `src/core/obj.js` and into its own file The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of distinct functionality. In order to improve readability and make it easier to navigate through the code, this patch moves the `ObjectLoader` into its own file.	2021-04-13 21:00:30 +02:00
Tim van der Meij	ebeb3f7999	Merge pull request #13234 from Snuffleupagus/hasJSActions-MissingDataException [api-minor] Ensure that `PDFDocumentProxy.hasJSActions` won't fail if `MissingDataException`s are thrown during the associated worker-thread parsing	2021-04-13 20:44:58 +02:00
Jonas Jenwald	2b2234fd5a	[api-minor] Ensure that `PDFDocumentProxy.hasJSActions` won't fail if `MissingDataException`s are thrown during the associated worker-thread parsing With the current implementation of `PDFDocument.hasJSActions`, in the worker-thread, we're not actually handling not-yet-loaded data correctly. This can thus fail in two different ways: - The `PDFDocument.fieldObjects` getter (and its helper method), while it may return a Promise, still fetches all of its data synchronously and it can thus throw a `MissingDataException` during parsing. - The `Catalog.jsActions` getter, which is completely synchronous, can obviously throw a `MissingDataException` during parsing. If either of these cases occur currently, the `PDFDocumentProxy.hasJSActions` method in the API can either return a rejected Promise (which it never should) or possibly "hang" and never resolve. Please note: While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server. This patch is thus based on code-inspection and on manually throwing a `MissingDataException` on the first access of `Catalog.jsActions` to simulate this situation. Finally, this patch adds a couple of API unit-tests for this (since none existed).	2021-04-13 14:33:56 +02:00
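The usual remedy for a `MissingDataException` is to request the missing byte range and then retry the access. The code below is a simplified, hypothetical model of that loop; the class and the `ensureLoaded` function are stand-ins, not the worker's actual code. Note that it retries indefinitely, so `requestRange` must eventually make the data available.

```javascript
// Hypothetical stand-in, using the same message format as the console log in this history.
class MissingDataException extends Error {
  constructor(begin, end) {
    super(`Missing data [${begin}, ${end})`);
    this.name = "MissingDataException";
    this.begin = begin;
    this.end = end;
  }
}

// Run `getter`; whenever it throws MissingDataException, load the range and retry.
async function ensureLoaded(getter, requestRange) {
  while (true) {
    try {
      return getter();
    } catch (ex) {
      if (!(ex instanceof MissingDataException)) {
        throw ex; // unrelated errors must still reject
      }
      await requestRange(ex.begin, ex.end);
    }
  }
}

// Usage: the first access fails, simulating a not-yet-loaded Catalog.jsActions.
const loaded = new Set();
const getJsActions = () => {
  if (!loaded.has(100)) {
    throw new MissingDataException(100, 101);
  }
  return { OpenAction: ["print()"] };
};
ensureLoaded(getJsActions, async begin => {
  loaded.add(begin);
}).then(actions => console.log(Object.keys(actions))); // [ 'OpenAction' ]
```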
Jonas Jenwald	4aa27cc645	Re-factor `Catalog._collectJavaScript` to use a `Map` rather than an Object Given that this is only an internal helper method, used by the `Catalog.{javaScript, jsActions}` getters, this change simplifies iteration of the returned data. We can also (slightly) re-factor the code of the `jsActions` getter, and remove an obsolete[1] JSDoc-comment from the `openAction` getter. --- [1] Not really relevant now that we've got proper scripting support.	2021-04-13 14:16:17 +02:00
Jonas Jenwald	9360c7cbdc	Avoid unnecessary parsing, in `Page.getStructTree`, when no structTree is available (PR 13221 follow-up) It's obviously (a bit) more efficient to return early in `Page.getStructTree`, rather than trying to first "parse" an empty structTree-root. Somehow I didn't think of this yesterday, but this feels like a much better solution overall; sorry about the churn here!	2021-04-12 08:54:21 +02:00
Jonas Jenwald	ff4dae05b0	Ensure that `getStructTree` won't break with `disableAutoFetch = true` set (PR 13171 follow-up) Open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableStream=true&disableAutoFetch=true and observe the following message in the console (repeated for each page of the document): ``` Uncaught (in promise) Object { message: "Missing data [19787293, 19787294)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19787293, 19787294)", stack: "BaseExceptionClosure@http://localhost:8888/src/shared/util.js:458:29\n@http://localhost:8888/src/shared/util.js:462:3\n" } ```	2021-04-11 12:15:33 +02:00
Tim van der Meij	d9d626a5e1	Merge pull request #13214 from calixteman/signatures Display widget signature	2021-04-10 19:35:16 +02:00
Calixte Denizet	5875ebb1ca	Display widget signature - but don't validate them for now; - Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315) - almost all (all ?) pdf readers display signatures; - validation is done in edge but for now it's behind a pref.	2021-04-10 19:13:28 +02:00
Brendan Dahl	fc9501a637	Add support for basic structure tree for accessibility. When a PDF is "marked" we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer aria-owns is used. This required modifying the text layer creation so that marked items are now tracked.	2021-04-09 09:56:28 -07:00
calixteman	84d7cccb1d	JS - Handle correctly hierarchy of fields (#13133 ) * JS - Handle correctly hierarchy of fields - it aims to fix #13132; - annotations can inherit their actions from the parent field; - there are some fields which act as a container for other fields: - they can be accessed through js, so we need to add them with an empty type (nothing in the spec about that, but checked in Acrobat); - the calculation order list (CO) can reference them, so we need to make them accessible through this.getField; - the getArray method must return kids. - field values are numbers, strings, ... depending on their type, but there is nothing in the spec on how to know what the type is: - according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461 - it seems that this "type" can be guessed from the js Format action (when setting a type in Acrobat DC, the only affected thing is this action). - util.scand with an empty string returns the current date.	2021-03-30 08:50:35 -07:00
calixteman	24e598a895	XFA - Add a layer to display XFA forms (#13069 ) - add an option to enable XFA rendering if any; - for now, keep the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground); - ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one: - it makes it easier to translate template properties to html ones; - it makes the creation of the html element in the main thread faster.	2021-03-19 10:11:40 +01:00
Calixte Denizet	ffd4bc790c	JS -- Add tests for print/save actions * change PDFDocument::hasJSActions to return true when there are JS actions in catalog.	2020-12-24 18:51:00 +01:00
Calixte Denizet	1e2173f038	JS - Collect and execute actions at doc and pages level * the goal is to execute actions like Open or OpenAction * can be tested with issue6106.pdf (auto-print) * once #12701 is merged, we can add page actions	2020-12-18 20:03:59 +01:00
Calixte Denizet	03814bd6a2	Don't use 'in' operator to check if key is in a Map	2020-12-16 16:00:12 +01:00
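Here is a short illustration of the bug class fixed here. The `in` operator looks at object properties, so it never sees Map keys and can be fooled by inherited methods. The `calculationOrder` Map below is a made-up example.

```javascript
const calculationOrder = new Map([["CO", ["1R", "2R"]]]);

// `in` checks object properties (including inherited ones), not Map keys.
console.log("CO" in calculationOrder); // false
console.log("get" in calculationOrder); // true, inherited from Map.prototype
// Map.prototype.has is the correct membership test.
console.log(calculationOrder.has("CO")); // true
```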
Tim van der Meij	00b4f86db3	Merge pull request #12717 from Snuffleupagus/issue-12714 Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)	2020-12-10 23:06:59 +01:00
Calixte Denizet	25bf504ff5	Be sure that CalculationOrder is either null or a non-empty array	2020-12-10 16:02:11 +01:00
Jonas Jenwald	796a0d3155	Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714) In the referenced PDF document, the second and third page has corrupt /Annots-entries which contain /Dict-data rather than the intended Arrays.	2020-12-10 11:42:00 +01:00
