pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	54ef4370a2	Ensure that the data is loaded, in the "GetPageJSActions" message handler. Similar to all other data accesses (note e.g. the "GetDocJSActions" handler just above), we need to ensure that a `MissingDataException` isn't propagated to the main-thread if this data is accessed while the PDF document is still loading.	2021-04-12 13:54:37 +02:00
Jonas Jenwald	9360c7cbdc	Avoid unnecessary parsing, in `Page.getStructTree`, when no structTree is available (PR 13221 follow-up). It's obviously (a bit) more efficient to return early in `Page.getStructTree`, rather than trying to first "parse" an empty structTree-root. Somehow I didn't think of this yesterday, but this feels like a much better solution overall; sorry about the churn here!	2021-04-12 08:54:21 +02:00
Jonas Jenwald	0d2dd6c2fe	Remove the unused "GetIsPureXfa" message handler in the worker (PR 13069 follow-up). Looking at the API, there's no code which actually sends this message. Most likely it's a left-over from a previous version of PR 13069, since the `isPureXfa` parameter is being included in the "GetDoc" message.	2021-04-12 08:52:27 +02:00
Jonas Jenwald	5adee0cdd1	[api-minor] Let `PDFPageProxy.getStructTree` return `null`, rather than an empty structTree, for documents without any accessibility data (PR 13171 follow-up). This is first of all consistent with existing API-methods, where we return `null` when the data in question doesn't exist. Secondly, it should also be (slightly) more efficient since there's less dummy-data that we need to transfer between threads. Finally, this prevents us from adding an empty/unnecessary span to every single page, even in documents without any structure tree data.	2021-04-11 12:35:33 +02:00
Jonas Jenwald	ff4dae05b0	Ensure that `getStructTree` won't break with `disableAutoFetch = true` set (PR 13171 follow-up). Open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableStream=true&disableAutoFetch=true and observe the following message in the console (repeated for each page of the document): ``` Uncaught (in promise) Object { message: "Missing data [19787293, 19787294)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19787293, 19787294)", stack: "BaseExceptionClosure@http://localhost:8888/src/shared/util.js:458:29\n@http://localhost:8888/src/shared/util.js:462:3\n" } ```	2021-04-11 12:15:33 +02:00
Tim van der Meij	d9d626a5e1	Merge pull request #13214 from calixteman/signatures: Display widget signature	2021-04-10 19:35:16 +02:00
Calixte Denizet	5875ebb1ca	Display widget signature: - but don't validate them for now; - Firefox will display a bar warning that signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315); - almost all (all?) PDF readers display signatures; - validation is done in Edge, but for now it's behind a pref.	2021-04-10 19:13:28 +02:00
Tim van der Meij	03c8c89002	Merge pull request #13171 from brendandahl/struct-tree: [api-minor] Add support for basic structure tree for accessibility.	2021-04-09 21:32:44 +02:00
Tim van der Meij	b0473eb353	Merge pull request #13207 from Snuffleupagus/api-AnnotationStorage-params: [api-minor] Remove the manual passing of an `AnnotationStorage`-instance when calling various API-methods	2021-04-09 21:09:16 +02:00
Brendan Dahl	fc9501a637	Add support for basic structure tree for accessibility. When a PDF is "marked", we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer, `aria-owns` is used. This required modifying the text layer creation so that marked items are now tracked.	2021-04-09 09:56:28 -07:00
Jonas Jenwald	737a8e846d	Add `deprecated` handling of the now removed `AnnotationStorage` API-parameters. These changes are done separately, to make it easier to remove them in the future.	2021-04-09 13:25:03 +02:00
Jonas Jenwald	72ef183085	[api-minor] Remove the manual passing of an `AnnotationStorage`-instance when calling various API-methods. Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use multiple ones simultaneously doesn't really make sense (e.g. in the viewer). Instead we lazily initialize, and cache, just one instance via `PDFDocumentProxy.annotationStorage`, which should thus be available internally in the API itself without having to be manually passed to various methods. To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it. This patch implements the following simplifications: - Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally. Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does not make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with a lot more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method. - Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point). - Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing). - By removing the need to manually provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.	2021-04-09 13:24:25 +02:00
Ikko Ashimine	c4c4333d54	Fix typo in canvas.js: Reseting -> Resetting	2021-04-08 23:45:24 +09:00
Tim van der Meij	6429ccc002	Merge pull request #13194 from Snuffleupagus/ttcf-fuzzy-match: Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193)	2021-04-07 20:50:19 +02:00
Tim van der Meij	5945f7c4a1	Merge pull request #13186 from Snuffleupagus/rm-deprecated-code: Remove some `deprecated` code	2021-04-07 20:38:59 +02:00
Jonas Jenwald	f986ccdf0e	Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193). The fontName, as defined in the PDF document, cannot be found in any of the "name"-tables in the TrueType Collection font. To work around that, this patch adds a fallback code-path to allow using an approximately matching fontName rather than outright failing.	2021-04-07 15:25:32 +02:00
Jonas Jenwald	4e81e0e14f	Remove the deprecated `AnnotationStorage.getOrCreateValue`-method (PR 12759 follow-up). While this method has only been deprecated in one release now, the `AnnotationStorage`-functionality is new enough that third-party implementations hopefully don't rely heavily on it just yet. (And removing this quickly should help reduce the likelihood that someone starts using it.)	2021-04-06 13:22:06 +02:00
Tim van der Meij	fc0cd4a443	Convert the `startXRefParsedCache` variable, in `src/core/obj.js`, from an object to a set. We only want to track XRef starting points instead of actual data, so using a set conveys that intention more clearly and is slightly more efficient.	2021-04-05 19:32:58 +02:00
Tim van der Meij	228adbf673	Merge pull request #13172 from Snuffleupagus/cleanup-keepFonts: [api-minor] Add an option, in `PDFDocumentProxy.cleanup`, to allow fonts to remain attached to the DOM	2021-04-05 14:21:34 +02:00
Jonas Jenwald	16fd838f52	Convert the `renderTasks`, used in `PDFPageProxy.render`/`PDFPageProxy.getOperatorList`, to a Set. When removing tasks we're currently forced to indirectly iterate through the array, which can be avoided by using a Set instead. Furthermore, we can also (slightly) modernize the code responsible for initializing the `renderTasks`.	2021-04-05 10:51:28 +02:00
Jonas Jenwald	68d3a333ac	Change the `seenStyles` object, in `PartialEvaluator.getTextContent`, to a Set. Given that what we actually want is only to keep track of the loadedFont-names, rather than storing any actual data, using an object isn't really necessary here. Furthermore, in the current code, we're also using `in` when checking if the data exists, which is generally less efficient than just checking for the value directly.	2021-04-05 10:34:02 +02:00
Jonas Jenwald	a2bc6481a0	[api-minor] Add an option, in `PDFDocumentProxy.cleanup`, to allow fonts to remain attached to the DOM. As mentioned in the JSDoc comment, this should not be used unless you know what you're doing, since it will lead to increased memory usage. However, in some situations (e.g. SVG-rendering), we still want to be able to run general clean-up on both the main/worker-thread while keeping loaded fonts attached to the DOM.[1] As part of these changes, `WorkerTransport.startCleanup` is converted to an async method and we'll also skip clean-up when destruction has started (since it's redundant). --- [1] The SVG-rendering mode is obviously not officially supported, since it's both rather incomplete and inherently slower. However, with recent changes, whereby we cache repeated images on the document rather than the page level, memory usage can be a lot worse than before if we never attempt to release e.g. cached image-data when the viewer is in SVG-rendering mode.	2021-04-02 12:32:31 +02:00
Jonas Jenwald	48ff20493f	Mark some internal `PDFDocumentProxy`-properties as "private". These two properties were never intended to be anything but "private", hence it really cannot hurt to actually indicate that they're not part of any official API.	2021-04-02 12:26:32 +02:00
Jonas Jenwald	0eb1433c78	[api-minor] Change the format of the `fontName`-property, in `defaultAppearanceData`, on Annotation-instances (PR 12831 follow-up). Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see `ca7f546828/src/core/default_appearance.js (L35)`. The reason that this is a problem can be seen in `ca7f546828/src/core/primitives.js (L30-L34)`, since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct. Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall. Please note: I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).	2021-04-01 16:47:30 +02:00
Tim van der Meij	ca7f546828	Merge pull request #12908 from calixteman/11918: Slightly rescale lineWidth to work around Chrome rendering issue	2021-03-31 21:56:31 +02:00
Calixte Denizet	a0cfb0841f	Slightly rescale lineWidth to work around Chrome rendering issue	2021-03-31 21:49:00 +02:00
Tim van der Meij	5a64157a2f	Merge pull request #13168 from janpe2/ttf-uni-glyphs: Use post table when Encoding has only Differences	2021-03-31 21:35:13 +02:00
Tim van der Meij	1a4af17d07	Merge pull request #13165 from Snuffleupagus/Annotation-rm-defaultAppearance-export: [api-minor] Stop exposing the raw `defaultAppearance`-string on Annotation-instances	2021-03-31 21:30:50 +02:00
Tim van der Meij	5be0fbe8f1	Merge pull request #13166 from Snuffleupagus/getDocument-URL: [api-minor] Support proper `URL`-objects, in addition to URL-strings, in `getDocument`	2021-03-31 21:20:08 +02:00
Tim van der Meij	2fb4d02ea5	Merge pull request #13158 from Snuffleupagus/rm-URL-polyfill: Remove the `URL` polyfill	2021-03-31 20:22:02 +02:00
Jani Pehkonen	0117ee5071	Use post table when Encoding has only Differences. Fixes #13107. In the issue, some TrueType glyph names have the format `uniXXXX`. The font's `Encoding` dictionary has the entry `Differences` but no `BaseEncoding`. `uniXXXX` names are converted to glyph indices using the font's `post` table, but currently that is done only when `BaseEncoding` exists. We must enable the conversion also when only `Differences` exists.	2021-03-31 17:58:44 +03:00
Jonas Jenwald	db1e1612df	[api-minor] Support proper `URL`-objects, in addition to URL-strings, in `getDocument`. Currently only URL-strings are officially supported by `getDocument`, however at this point in time I cannot really see any compelling reason to not support `URL`-objects as well. Most likely the reason that we don't already support `URL`-objects, in `getDocument`, is that historically `URL` wasn't fully implemented across browsers and our old polyfill wasn't perfect; see https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility Please note: Because of how the `url` parameter is currently handled, there are actually some cases where passing a `URL`-object to `getDocument` already works. That, in my opinion, provides additional motivation for supporting `URL`-objects officially, since it makes the API more consistent. The following is an attempt to summarize the current situation, based on the actual code rather than the JSDocs: - `getDocument("url string")` works and is documented.[1] - `getDocument({ url: "url string", })` works and is documented.[1] - `getDocument(new URL(...))` throws immediately, since no supported parameters are found. - `getDocument({ url: new URL(...), })` actually works even though it's not documented.[1] Originally, when data was fetched on the worker-thread, this would likely have thrown since `URL` isn't clonable.[2] - `getDocument({ url: { abc: 123, }, })`, or some similarly meaningless input, will be "accepted" by `getDocument` and then throw a `MissingPDFException` when attempting to fetch the bogus data. With the changes in this patch, not only are `URL`-objects now officially supported and documented when calling `getDocument`, but we'll also do a much better job at actually validating any URL-data passed to `getDocument` (and instead fail early). --- [1] In browsers, we create a valid URL, thus indirectly validating the input. In Node.js environments, on the other hand, no validation is done since obtaining a baseUrl is more difficult (and PDF.js is primarily written for browsers anyway). [2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types	2021-03-31 16:21:41 +02:00
Jonas Jenwald	27add0f1f3	Re-factor the `source` parsing, in `getDocument`, to use `switch` rather than `if...else`. Given the number of parameters that we now need to parse here, this code is no longer as readable as one would like. Hence this re-factoring, which will improve overall readability and also help with the next patch.	2021-03-31 16:21:37 +02:00
Jonas Jenwald	9c6770748c	Move the `PDFDocumentStats` typedef closer to its usage. Currently this typedef appears slightly out-of-place, in the middle of the arguably much more important `getDocument` JSDocs.	2021-03-31 16:21:22 +02:00
calixteman	b3528868c1	XFA - Add support for a few UI elements (#13115): - input; - layout; - border; - margin; - color.	2021-03-31 15:42:21 +02:00
Jonas Jenwald	3df24254e3	[api-minor] Stop exposing the raw `defaultAppearance`-string on Annotation-instances. The reasons for making this change are: - This property is not, nor has it ever been, used anywhere in the PDF.js display-layer. - Related to the previous point, the format of the `defaultAppearance`-string is such that it'd be difficult to use it as-is in the display-layer anyway. - It (usually) contains the "raw" appearance-string, from the PDF document, which is neither parsed nor validated and could thus be bogus. - We now expose a `defaultAppearanceData`-property, which is first of all used in the display-layer and secondly contains actually parsed/validated data. - In the event that a third-party implementation needs the `defaultAppearance`-string, it could be easily constructed from the recently added `defaultAppearanceData`-property. All-in-all, I'm thus suggesting that we stop exposing an unused and unnecessary property on all Annotation-instances.	2021-03-31 15:09:18 +02:00
Jonas Jenwald	38acde8375	Use template strings, to reduce unnecessary verbosity in a few `warn(...)` calls in `src/core/annotation.js`	2021-03-31 14:40:21 +02:00
calixteman	84d7cccb1d	JS - Correctly handle the hierarchy of fields (#13133): * JS - Correctly handle the hierarchy of fields - it aims to fix #13132; - annotations can inherit their actions from the parent field; - there are some fields which act as a container for other fields: - they can be accessed through JS, so we need to add them with an empty type (nothing in the spec about that, but checked in Acrobat); - the calculation order list (CO) can reference them, so we need to make them accessible through this.getField; - the getArray method must return the kids. - field values are number, string, ... depending on their type, but there is nothing in the spec on how to know what the type is: - according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461 - it seems that this "type" can be guessed from the JS action Format (when setting a type in Acrobat DC, the only affected thing is this action). - util.scand with an empty string returns the current date.	2021-03-30 08:50:35 -07:00
Jonas Jenwald	fa86a192f9	Remove the `URL` polyfill. Based on this compatibility information, given that IE 11 is now explicitly unsupported, we should no longer need to bundle a `URL` polyfill in any builds: https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility Note that the caveat listed for older Safari-versions doesn't apply to any code in the PDF.js library, since we never call `new URL(url, undefined)` in the code-base. Note also that Node.js has a web-compatible `URL` implementation, which according to the "History" section at https://nodejs.org/api/url.html#url_the_whatwg_url_api has been available since Node.js `10.0.0` (according to https://nodejs.org/en/about/releases/ that branch is one month away from being EOL-ed).	2021-03-29 18:00:36 +02:00
Tim van der Meij	1a2cdaffc5	Merge pull request #13152 from calixteman/13130: Skip extra objects in object stream by using offsets	2021-03-28 15:11:55 +02:00
Jonas Jenwald	19c2dfbb96	Move rotation normalization from `PDFViewerApplication` and into `BaseViewer`. The rotation handling that's currently living in `PDFViewerApplication` is very old, and pre-dates the introduction of the viewer components by years. As can be seen in the `BaseViewer.pagesRotation` setter, we're not actually normalizing the rotation as intended and instead rely on the caller to handle that correctly. This is first of all inconsistent, given how other setters are implemented, and secondly it could also lead to the rotation being set to a value outside of the `[0, 360)`-range. Finally, for improved consistency the rotation handling in `PageViewport` is updated similarly. Please note that in this case, it's not changing the pre-existing logic.	2021-03-28 14:19:58 +02:00
Calixte Denizet	9296ee6986	Skip extra objects in object stream by using offsets	2021-03-28 13:03:05 +02:00
calixteman	81c602c61c	Set CFF header to 4 when writing it because it contains 4 elements (#13149)	2021-03-26 18:23:18 +01:00
calixteman	63471bcbbe	XFA - Convert some template properties into CSS ones (#13082): - implement a few positioning properties: position, width, height, anchor; - implement font element; - implement fill element (used by font) and its children (linear, radial, ...); - the font property is inherited from the ancestor container (see https://www.pdfa.org/wp-content/uploads/2020/07/XFA-3_3.pdf#page=43), so let CSS handle that stuff; - in order to reduce the number of properties to set, only set non-default properties and put the defaults in CSS; - set a background on some containers to be able to see them (will be removed in a future commit).	2021-03-25 13:02:39 +01:00
Tim van der Meij	8269ddbd16	Merge pull request #13105 from Snuffleupagus/BasePdfManager-parseDocBaseUrl: Improve memory usage around the `BasePdfManager.docBaseUrl` parameter (PR 7689 follow-up)	2021-03-19 23:03:20 +01:00
Jonas Jenwald	57e7557235	Actually reset the `PDFPageProxy._xfaPromise` property as intended (PR 13069 follow-up) (#13119). Similar to the existing `annotationsPromise` and `_jsActionsPromise` properties, the new `_xfaPromise` should obviously also be reset, since otherwise you might end up holding onto a lot of data for pages that are no longer active. (That caching wasn't present in the original version of PR 13069, which is why I didn't spot it until now.)	2021-03-19 11:31:54 +01:00
calixteman	24e598a895	XFA - Add a layer to display XFA forms (#13069): - add an option to enable XFA rendering, if any; - for now, keep the canvas layer: it could be useful to implement XFAF forms (an embedded PDF in an XML stream for the background and an XFA form for the foreground); - UI elements in the template DOM are pretty close to their HTML counterparts, so we generate a fake HTML DOM from the template one: - it makes it easier to translate template properties to HTML ones; - it speeds up the creation of the HTML elements in the main thread.	2021-03-19 10:11:40 +01:00
Jonas Jenwald	c4c7216171	Improve memory usage around the `BasePdfManager.docBaseUrl` parameter (PR 7689 follow-up). While there is nothing outright wrong with the existing implementation, it can however lead to increased memory usage in one particular case (that I completely overlooked when implementing this): For "data:"-URLs, which by definition contain the entire PDF document and can thus be arbitrarily large, we obviously want to avoid sending, storing, and/or logging the "raw" docBaseUrl in that case. To address this, this patch makes the following changes: - Ignore any non-string value in the `docBaseUrl` option passed to `getDocument`, since those are unsupported anyway, already on the main-thread. - Ignore "data:"-URLs in the `docBaseUrl` option passed to `getDocument`, to avoid having to send what could potentially be a very long string to the worker-thread. - Parse the `docBaseUrl` option directly in the `BasePdfManager`-constructors, on the worker-thread, to avoid having to store the "raw" docBaseUrl in the first place.	2021-03-17 15:48:24 +01:00
Jonas Jenwald	bd9dee1544	Move the `getPdfFilenameFromUrl` helper function from `web/ui_utils.js` and into `src/display/display_utils.js`. It seems reasonable to place this alongside the similar `getFilenameFromUrl` helper function. This way, with the changes in the next patch, we also avoid having to expose the `isDataScheme` function in the API itself and we instead expose `getPdfFilenameFromUrl` in the API (which feels overall more appropriate).	2021-03-17 15:48:24 +01:00
Jonas Jenwald	5099f1977f	Support `LineAnnotation`s with empty /Rect-entries (issue 6564). This extends PR 13033 slightly, with a heuristic to support corrupt PDF documents where the `LineAnnotation`s have an empty /Rect-entry. Please note that while I have no idea if this is "correct", this patch at least makes us output the same /BBox as re-saving in Adobe Reader does.	2021-03-15 16:33:43 +01:00
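
Commits 68d3a333ac, fc0cd4a443 and 16fd838f52 above replace plain objects and arrays with a `Set` where only membership matters. The following is a minimal JavaScript sketch of the before/after patterns, assuming a list of font names as input; `uniqueWithObject`, `uniqueWithSet`, `removeFromArray` and the sample names are hypothetical and not taken from the PDF.js sources:

```javascript
// Hypothetical illustration, not PDF.js code.

// Before: an object used purely as a membership table; the stored `true`
// values are never read, only the keys matter.
function uniqueWithObject(names) {
  const seen = Object.create(null); // no prototype, so `in` sees only own keys
  const result = [];
  for (const name of names) {
    if (!(name in seen)) {
      seen[name] = true;
      result.push(name);
    }
  }
  return result;
}

// After: a Set expresses "have we seen this name?" directly.
function uniqueWithSet(names) {
  const seen = new Set();
  const result = [];
  for (const name of names) {
    if (!seen.has(name)) {
      seen.add(name);
      result.push(name);
    }
  }
  return result;
}

// Removal: an array must first be searched for the element's index,
// whereas Set.prototype.delete removes the value in a single call.
function removeFromArray(array, value) {
  const index = array.indexOf(value);
  if (index !== -1) {
    array.splice(index, 1);
  }
}

const names = ["g_d0_f1", "g_d0_f2", "g_d0_f1"];
console.log(uniqueWithObject(names)); // [ 'g_d0_f1', 'g_d0_f2' ]
console.log(uniqueWithSet(names)); // [ 'g_d0_f1', 'g_d0_f2' ]
```

Using `Object.create(null)` in the "before" version avoids false positives from inherited keys such as `toString`; a `Set` has no such pitfall in the first place.
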

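Commits db1e1612df and c4c7216171 describe validating URL input early, accepting `URL`-objects as well as URL-strings, and ignoring non-string and "data:" values for `docBaseUrl`. A rough sketch of that kind of validation, using only the standard `URL` class (available globally in browsers and in Node.js); `toUrlString` and `filterDocBaseUrl` are hypothetical helpers and do not reproduce the actual `getDocument` code:

```javascript
// Hypothetical helper: accept a URL-object or a URL-string and return an
// absolute URL-string, failing early on anything else.
function toUrlString(value, baseUrl) {
  if (value instanceof URL) {
    return value.href;
  }
  if (typeof value === "string") {
    // The URL constructor throws a TypeError for strings it cannot parse,
    // e.g. a relative path when no base URL is given.
    const url = baseUrl === undefined ? new URL(value) : new URL(value, baseUrl);
    return url.href;
  }
  throw new TypeError("Expected a URL-string or a URL-object.");
}

// Hypothetical helper: drop docBaseUrl values that are not strings, or that
// are "data:"-URLs (which may embed an entire, arbitrarily large, document).
function filterDocBaseUrl(docBaseUrl) {
  if (typeof docBaseUrl !== "string") {
    return null;
  }
  if (docBaseUrl.trimStart().toLowerCase().startsWith("data:")) {
    return null;
  }
  return docBaseUrl;
}

console.log(toUrlString(new URL("https://example.com/a.pdf"))); // https://example.com/a.pdf
console.log(toUrlString("b.pdf", "https://example.com/docs/")); // https://example.com/docs/b.pdf
console.log(filterDocBaseUrl("data:application/pdf;base64,JVBERi0=")); // null
```

Rejecting bad input at this point means that something like `{ abc: 123 }` fails immediately with a clear error, rather than later as a failed fetch.
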

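Commit 19c2dfbb96 notes that the `BaseViewer.pagesRotation` setter relied on callers to keep the rotation within the `[0, 360)`-range. A minimal sketch of such a normalization, assuming rotations must be integer multiples of 90 degrees; `normalizeRotation` is a hypothetical helper, not the code from that commit:

```javascript
// Hypothetical helper: validate a page rotation and map it into [0, 360).
function normalizeRotation(rotation) {
  // Assumption: only integer multiples of 90 degrees are valid rotations.
  if (!Number.isInteger(rotation) || rotation % 90 !== 0) {
    throw new TypeError(`Invalid rotation: ${rotation}`);
  }
  // The % operator keeps the sign of the dividend (-90 % 360 === -90),
  // so add 360 and take the remainder again to handle negative input.
  return ((rotation % 360) + 360) % 360;
}

console.log(normalizeRotation(-90)); // 270
console.log(normalizeRotation(450)); // 90
console.log(normalizeRotation(360)); // 0
```

Normalizing inside the setter itself, as the commit describes, lets callers pass e.g. `rotation - 90` without clamping the value first.
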
4426 Commits