pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	03cf28bf17	[api-minor] Add `intent` support to the `PDFPageProxy.getOperatorList` method (issue 13704) With this patch, the `PDFPageProxy.getOperatorList` method will now return `PDFOperatorList`-instances that also include Annotation-operatorLists (when those exist). Hence this closes a small, but potentially confusing, gap between the `render` and `getOperatorList` methods. Previously we've been somewhat reluctant to do this, as explained below, but given that there's actual use-cases where it's required probably means that we'll have to implement it now. Since we still need the ability to separate "normal" rendering operations from direct `getOperatorList` calls in the worker-thread, this API-change unfortunately causes the internal renderingIntent to become a bit "messy" which is indeed unfortunate (note the `"oplist-"` strings in various spots). As-is I suppose that it's not all that bad, but we may want to consider changing the internal renderingIntent to e.g. a bitfield in the future. Besides fixing issue 13704, this patch would also be necessary if someone ever tries to implement e.g. issue 10165 (since currently `PDFPageProxy.getOperatorList` doesn't include Annotation-operatorLists). Please note: This patch is also tagged "api-minor" for a second reason, which is that we're now including the Annotation-id in the `beginAnnotation` argument. The reason for this is to allow correlating the Annotation-data returned by `PDFPageProxy.getAnnotations`, with its corresponding operatorList-data (for those Annotations that have it).	2021-07-16 17:16:30 +02:00
Jonas Jenwald	da808aeab3	Ensure that the field value, for checkboxes, refers to an existing appearance state (bug 1720411) Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1720411	2021-07-16 13:11:48 +02:00
Jonas Jenwald	3838c4e27c	Re-factor the handling of empty `Name`-instances (PR 13612 follow-up) When working on PR 13612, I mostly prioritized a simple solution that didn't require touching a lot of code. However, while working on PR 13735 I started to realize that the static `Name.empty` construction really wasn't a good idea. In particular, having a special `Name`-instance where the `name`-property isn't actually a String is confusing (to put it mildly) and can easily lead to issues elsewhere. The only reason for not simply allowing the `name`-property to be an empty string, in PR 13612, was to avoid having to touch a lot of existing code. However, it turns out that this is only limited to a few methods in the `PartialEvaluator` and a few of the `BaseLocalCache`-implementations, all of which can be easily re-factored to handle empty `Name`-instances. All-in-all, I think that this patch is even an overall improvement since we're now validating (what should always be) `Name`-data better in the `PartialEvaluator`. This is what I ought to have done from the start, sorry about the code churn here!	2021-07-15 12:00:42 +02:00
Calixte Denizet	9bbc194846	XFA - Support assist element	2021-07-11 21:01:18 +02:00
Calixte Denizet	58e1f51688	XFA - Fix text positions (bug 1718741) - font line height is taken into account by Acrobat but not by masterpdfeditor: I extracted a font from a PDF, modified some ascent/descent properties with ttx, and then reinjected the font into the PDF: only Acrobat takes it into account. So in this patch, line heights for some substituted fonts are added. - it seems that Acrobat is using a line height of 1.2 when the line height in the font is not enough (it's the only way I found to correctly fix bug 1718741). - don't use flex in wrapper container (which was causing a horizontal overflow in the above bug). - consequently, the above fixes introduced a lot of small regressions, so in order to see real improvements on reftests, I fixed the regressions in this patch: - replace margin by padding in some cases where padding is part of a container's dimensions; - remove some flex display: some containers are wrongly sized when rendered; - set letter-spacing to 0.01px: it helps ensure that text is not broken because of insufficient width in Firefox.	2021-07-09 18:11:12 +02:00
Jonas Jenwald	661c60ecc9	[api-minor] Support accessing both the original and modified PDF fingerprint The PDF.js API has only ever supported accessing the original file ID, however the second one that (should) exist in modified documents has thus far been completely inaccessible through the API. That seems like a simple oversight, caused e.g. by the viewer not needing it, since it really shouldn't hurt to provide API-users with the ability to check if a PDF document has been modified since its creation.[1] Please refer to https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G13.2261661 for additional information. For an example of how to update existing code to use the new API, please see the changes in the `web/app.js` file included in this patch. Please note: While I'm not sure if we'll ever be able to remove the old `PDFDocumentProxy.fingerprint` getter, given that it's existed since "forever", that probably isn't a big deal given that it's now limited to only `GENERIC`-builds. --- [1] Although this obviously depends on the PDF software following the specification, by updating the second file ID as intended.	2021-07-03 13:56:33 +02:00
Calixte Denizet	ff440d13e7	XFA - Remove empty pages - it aims to fix #13583; - fix the switch to breakBefore target; - force the layout of an unsplittable element on an empty page; - don't fail when there is horizontal overflow (except in lr-tb); - handle correctly overflow in the same content area (bug 1717805, bug 1717668); - fix a typo in radial gradient first argument.	2021-06-30 16:32:27 +02:00
Calixte Denizet	429ffdcd2f	XFA - Save filled data in the pdf when downloading the file (Bug 1716288) - when binding (after parsing) we get a map between some template nodes and some data nodes; - so set user data in input handlers, using data node uids in the annotation storage; - to save the form, just put the value we have in the storage in the correct data nodes, serialize the xml as a string and then write the string at the end of the pdf using src/core/writer.js; - fix a few bugs around data bindings: - the "Off" issue in Bug 1716980.	2021-06-25 18:57:01 +02:00
Brendan Dahl	f4f00a9bc6	Merge pull request #13618 from calixteman/bind_root XFA - Always bind root subform on root data	2021-06-23 13:14:12 -07:00
Calixte Denizet	b836616667	XFA - Always bind root subform on root data - it partially fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1717805 (on the data side at least but there is still a layout issue).	2021-06-23 20:46:41 +02:00
Jonas Jenwald	6467907318	Support corrupt documents with empty `Name`-entries (issue 13610) Apparently some really bad PDF software can create documents with empty `Name`-entries, which we thus need to somehow deal with. While I don't know if this patch is necessarily the best solution, it should at least ensure that the empty `Name`-instance cannot accidentally match a proper `Name`-instance (and it doesn't require changes to a lot of existing code).[1] --- [1] I briefly considered using a `Symbol` rather than an Object, but quickly decided against that since the former one [is not clonable](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types) and `Name`-instances may be sent to the API.	2021-06-22 16:55:44 +02:00
calixteman	56a75f8b26	Revert "Revert "XFA - Fix the way to select page on breaking"" - and fix the error which caused the backout: add an $extra property when creating html. - switch to next content area when breaking on page area.	2021-06-21 17:07:31 +02:00
calixteman	a9385bbb52	Revert "XFA - Fix the way to select page on breaking"	2021-06-21 15:45:04 +02:00
Calixte Denizet	7aea8faa34	XFA - Fix the way to select page on breaking - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716838. - some fonts in the pdf from the bug were bold when they shouldn't be, so write the font properties in the html to avoid using wrongly inherited ones.	2021-06-21 12:45:23 +02:00
Calixte Denizet	7cdbc98716	XFA - Match font family correctly - partial fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1716980; - some pdf can contain an invalid font family (e.g. 'Windings 3') so in this case remove the space; - the font family in typeface attribute doesn't always match the one defined in the FontDescriptor dictionary.	2021-06-20 15:16:28 +02:00
Calixte Denizet	df08b1548b	XFA - Fix layout issues - PR #13554 is buggy, so this patch aims to fix bugs. - check if a component fits into its parent, taking the parent layout into account. - introduce the method isSplittable for template nodes to know if a component can be split in case of overflow.	2021-06-17 16:09:22 +02:00
Calixte Denizet	8eeb7ab4a3	XFA - Add the possibility to lay out and measure text - some containers don't always have both of their dimensions, and those dimensions are based on contents; - so in order to measure text, we must get the glyph widths (for the xfa fonts) before starting the layout; - implement a word-wrap algorithm; - handle font change during text layout.	2021-06-17 14:17:02 +02:00
Calixte Denizet	793a0156ce	XFA - By default a text ui has only one line when in a field element - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716809.	2021-06-16 20:18:29 +02:00
Calixte Denizet	d89c429d78	XFA - Handle maxChars property for text fields - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716294.	2021-06-14 13:07:06 +02:00
Brendan Dahl	d333af7848	Merge pull request #13527 from calixteman/bind_inf_loop XFA - Avoid infinite loop when creating some nodes in data	2021-06-09 12:37:29 -07:00
Brendan Dahl	aa2712744d	Merge pull request #13502 from calixteman/contentarea XFA - contentarea must be on top of the other containers in a pageArea	2021-06-09 12:36:21 -07:00
Calixte Denizet	cddc1d869d	XFA - Avoid infinite loop when creating some nodes in data	2021-06-09 19:07:59 +02:00
Jonas Jenwald	a01c599247	Cache the "raw" standard font data in the worker-thread (PR 12726 follow-up) This implementation is basically a copy of the pre-existing `builtInCMapCache` implementation. For some, badly generated, PDF documents it's possible that we'll end up having to fetch the same standard font data over and over (which is obviously inefficient). While not common, it's certainly possible that a PDF document uses custom font names where the actual font then references one of the standard fonts; see e.g. issue 11399 for one such example. Note that I did suggest adding worker-thread caching of standard font data in PR 12726, however it wasn't deemed necessary at the time. Now that we have a real-world example that benefits from caching, I think that we should simply implement this now.	2021-06-09 18:27:51 +02:00
Calixte Denizet	34a2fa72c7	XFA - Add Liberation-Sans font as a substitution for some missing fonts - Some js files contain scale factors for each glyph in order to rescale Liberation to have a final font with the correct width. - A lot of XFA have some containers where their dimensions are based on their text content, so using default font from browser can lead to an almost unreadable pdf.	2021-06-09 16:55:45 +02:00
Calixte Denizet	1486608f32	XFA - contentarea must be on top of the other containers in a pageArea	2021-06-09 15:29:29 +02:00
Calixte Denizet	cfa727474e	XFA - Fix layout issues (again) - some elements weren't displayed because their rotation angle was not taken into account; - fix box model (XFA concept): - remove use of outline; - position correctly border which isn't part of box dimensions; - fix margins issues (see issue #13474). - move border on button instead of having it on wrapping div;	2021-06-08 17:42:53 +02:00
Jonas Jenwald	e7dc822e74	Merge pull request #12726 from brendandahl/standard-fonts [api-minor] Include and use the 14 standard font files.	2021-06-08 10:09:40 +02:00
Brendan Dahl	4c1dd47e65	Include and use the 14 standard fonts files.	2021-06-07 11:10:11 -07:00
Calixte Denizet	5dc7f4ade8	XFA - CDATA can be xml so parse it when required	2021-06-07 10:38:39 +02:00
Calixte Denizet	112645ea3d	XFA - Don't bind a form node with an empty value when the data node doesn't exist	2021-06-06 17:59:01 +02:00
Calixte Denizet	11573ddd16	XFA - Implement usehref support - attribute 'use' was already implemented but not usehref - in general, usehref should make reference to current document - add support for SOM expressions in use and usehref to search a node. - get prototype for all nodes if any.	2021-06-04 14:57:05 +02:00
Jonas Jenwald	af78ba64bd	Don't change options of the globally used `PartialEvaluator` in the "should render checkbox with fallback font for printing" unit-test Given that the same `PartialEvaluator`-instance is used for a lot of these unit-tests, manually changing the options in any one test-case could lead to intermittently failing unit-tests since they're run in a random order. To fix this, we simply have to use the existing method to clone the `PartialEvaluator`-instance but with the custom options.	2021-05-31 12:14:58 +02:00
Calixte Denizet	45c3f00a27	XFA - Move the fake HTML representation of XFA from the worker to the main thread - the only goal of this patch is to be able to get synchronously the fake html when printing from firefox: - in order to print we need to inject some html in beforeprint callback but we cannot block in waiting for all the pages. - from a memory point of view: it doesn't change anything since the fake HTML is deleted in the worker; - this way we don't break any assumptions.	2021-05-25 19:33:07 +02:00
Calixte Denizet	7cebdbd58c	XFA - Fix a lot of layout issues - I thought it was possible to rely on the browser layout engine to handle layout stuff but it isn't possible - mainly because when a contentArea overflows, we must continue to lay out in the next contentArea - when no more contentArea is available then we must go to the next page... - we must handle breakBefore and breakAfter, which allow "breaking" the layout to go to the next container - Sometimes some containers don't provide their dimensions so we must compute them in order to know where to put them in their parents but to compute those dimensions we need to lay out the container itself... - See top of file layout.js for more explanations about layout. - fix a few bugs I encountered in other places during my work on layout.	2021-05-25 17:51:36 +02:00
Tim van der Meij	d1d9b9043d	Merge pull request #13415 from Snuffleupagus/getDestination-out-of-order Improve handling of named destinations in out-of-order NameTrees (PR 10274 follow-up)	2021-05-21 20:15:09 +02:00
Jonas Jenwald	8d5689387b	Improve handling of named destinations in out-of-order NameTrees (PR 10274 follow-up) According to the specification, see https://web.archive.org/web/20210404042322if_/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2384179, the keys of a NameTree/NumberTree should be ordered. For corrupt PDF files, which violate this assumption, it's thus possible that trying to lookup a single entry fails. Previously, in PR 10274, we implemented a fallback that only applies to the "bottom" node of a NameTree/NumberTree, which in general might not actually help for sufficiently corrupt NameTree/NumberTree data. Instead we remove the current limited fallback from `NameOrNumberTree.get`, and defer to the call-site to handle this case explicitly e.g. by using `NameOrNumberTree.getAll` for data where that makes sense. For well-formed documents, these changes should not lead to any additional data fetching/parsing. Finally, as part of these changes, the validation of named destination data is improved in the `Catalog` and a new unit-test is also added.	2021-05-21 15:48:37 +02:00
Jonas Jenwald	1a8d05fdcf	Remove some, with Prettier `2.3.0`, unnecessary `// prettier-ignore` comments To get the maximum benefit from something like Prettier, you obviously don't want to disable the automatic formatting unless absolutely necessary. When we added Prettier there were a number of cases, mostly involving larger Arrays, which required disabling of the automatic formatting for overall readability and/or to not break inline comments. With changes in Prettier version `2.3.0`, see [the release notes](https://prettier.io/blog/2021/05/09/2.3.0.html#concise-formatting-of-number-only-arrays-10106httpsgithubcomprettierprettierpull10106-10160httpsgithubcomprettierprettierpull10160-by-thorn0httpsgithubcomthorn0), there's now better formatting support for Arrays containing only numbers. Hence we can now remove a number of `// prettier-ignore` comments, and thus get the benefit of automatic formatting in (slightly) more of the code-base.	2021-05-19 11:36:03 +02:00
Calixte Denizet	4544ebf38a	Handle PI with no value in xml parser - an XML PI contains a target and optionally some content (see https://en.wikipedia.org/wiki/Processing_Instruction) - the parser expected to always have some content and so it could lead to wrong parsing.	2021-05-18 10:22:18 +02:00
Jonas Jenwald	8943bcd3c3	Account for formatting changes in Prettier version `2.3.0` With the exception of one tweaked `eslint-disable` comment, in `web/generic_scripting.js`, this patch was generated automatically using `gulp lint --fix`. Please find additional information at: - https://github.com/prettier/prettier/releases/tag/2.3.0 - https://prettier.io/blog/2021/05/09/2.3.0.html	2021-05-16 11:44:05 +02:00
Jonas Jenwald	757636d519	Convert the remaining functions in `src/core/primitives.js` to use standard classes This patch was tested using the PDF file from issue 2618, i.e. https://bug570667.bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 50, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ---- \| ------------- firefox \| Overall \| 50 \| 3417 \| 3426 \| 9 \| 0.27 \| firefox \| Page Request \| 50 \| 1 \| 1 \| 0 \| 5.41 \| firefox \| Rendering \| 50 \| 3416 \| 3426 \| 9 \| 0.27 \| ``` Based on these results, there's no significant performance regression from using standard classes and this patch should thus be OK.	2021-05-12 09:36:28 +02:00
Jonas Jenwald	77b258440b	Move some constants and helper functions from `src/core/fonts.js` and into their own file - `FontFlags`, is used in both `src/core/fonts.js` and `src/core/evaluator.js`. - `getFontType`, same as the above. - `MacStandardGlyphOrdering`, is a fairly large data-structure and `src/core/fonts.js` is already a very large file. - `recoverGlyphName`, a dependency of `type1FontGlyphMapping`; please see below. - `SEAC_ANALYSIS_ENABLED`, is used by both `Type1Font`, `CFFFont`, and unit-tests; please see below. - `type1FontGlyphMapping`, is used by both `Type1Font` and `CFFFont` which a later patch will move to their own files.	2021-05-02 21:00:29 +02:00
Tim van der Meij	f6f335173d	Merge pull request #13303 from Snuffleupagus/BaseStream Add an abstract base-class, which all the various Stream implementations inherit from	2021-05-01 19:13:36 +02:00
calixteman	af4dc55019	[api-minor] Fix the way to chunk the strings (#13257) - Improve chunking in order to fix some bugs where spaces are missing: * track the last position where a glyph has been drawn; * when a new glyph (first glyph in a chunk) is added then compare its position with the last saved one and add a space or break: - there are multiple ways to move the glyphs and, to avoid having to deal with all the different possibilities, it's way easier to just compare positions; - and so there is now one function (i.e. "compareWithLastPosition") where all the work is done. - Add some breaks in order to get lines; - Remove the multiple white spaces: * some spaces were filled with several white spaces, which makes it harder to find some sequences of words using the search tool; * other pdf readers replace spaces by one white space. Update src/core/evaluator.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-30 14:41:13 +02:00
Jonas Jenwald	66d9d83dcb	Move the `PredictorStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	57a1ea840f	Ensure that `saveDocument` works if there's no /ID-entry in the PDF document (issue 13279) (#13280) First of all, while it should be very unlikely that the /ID-entry is an indirect object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an indirect object the existing code in `src/core/writer.js` would already fail. Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`. Drive-by change: In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).	2021-04-22 12:08:56 +02:00
Tim van der Meij	d42f3d0bfe	Convert done callbacks to async/await in `test/unit/evaluator_spec.js`	2021-04-18 14:20:54 +02:00
Tim van der Meij	f4237d3a09	Convert done callbacks to async/await in `test/unit/annotation_spec.js`	2021-04-17 19:59:18 +02:00
Tim van der Meij	c2f3a71eca	Convert done callbacks to async/await in `test/unit/api_spec.js`	2021-04-17 17:52:23 +02:00
Jonas Jenwald	f560fe6875	A couple of small scripting/XFA-related tweaks in the worker-code - Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible. - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.) - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be. Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.) - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.	2021-04-17 10:34:22 +02:00
Brendan Dahl	ac3fa1e3d7	Merge pull request #13146 from calixteman/xfa_fonts XFA -- Load fonts permanently from the pdf	2021-04-16 12:55:12 -07:00


920 Commits
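
Note on commit 8d5689387b: its message describes removing the limited fallback from `NameOrNumberTree.get` and leaving it to the call-site to handle out-of-order keys, e.g. by using `NameOrNumberTree.getAll`. The JavaScript below is only a minimal sketch of that idea, not the pdf.js implementation. It assumes a flat array of `[key, value]` pairs in place of a real NameTree, and the names `orderedGet`, `getAll`, and `lookupDestination` are hypothetical.

```javascript
// Hypothetical stand-in for a NameTree: a flat array of [key, value] pairs.
// A well-formed tree keeps its keys sorted; a corrupt one may not.

// Binary search that relies on the keys being ordered, as the spec requires.
// Returns undefined when the key is not found (or the ordering is violated).
function orderedGet(entries, key) {
  let lo = 0;
  let hi = entries.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [midKey, midValue] = entries[mid];
    if (midKey === key) {
      return midValue;
    }
    if (midKey < key) {
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return undefined;
}

// Reads every entry without assuming any ordering.
function getAll(entries) {
  return new Map(entries);
}

// Call-site logic: try the fast ordered lookup first, and only scan
// everything when that lookup fails (assumed here to mean a corrupt tree).
function lookupDestination(entries, key) {
  const fast = orderedGet(entries, key);
  if (fast !== undefined) {
    return fast;
  }
  return getAll(entries).get(key);
}

const wellFormed = [["a", 1], ["b", 2], ["c", 3]];
const outOfOrder = [["c", 3], ["a", 1], ["b", 2]];

console.log(lookupDestination(wellFormed, "c")); // found by the ordered lookup
console.log(orderedGet(outOfOrder, "c"));        // ordered lookup misses the key
console.log(lookupDestination(outOfOrder, "c")); // recovered by the full scan
```

In this sketch a well-formed input never reaches the full scan, which mirrors the commit's statement that well-formed documents should not incur additional fetching or parsing.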