pdf.js

Author	SHA1	Message	Date
Brendan Dahl	fc9501a637	Add support for basic structure tree for accessibility. When a PDF is "marked" we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer aria-owns is used. This required modifying the text layer creation so that marked items are now tracked.	2021-04-09 09:56:28 -07:00
Jonas Jenwald	72ef183085	[api-minor] Remove the manual passing of an `AnnotationStorage`-instance when calling various API-method Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use multiple ones simultaneously doesn't really make sense (e.g. in the viewer). Instead we lazily initialize, and cache, just one instance via `PDFDocumentProxy.annotationStorage` which should thus be available internally in the API itself without having to be manually passed to various methods. To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it. This patch implements the following simplifications: - Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally. Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does not make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with a lot more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method. - Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point). - Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing). - By removing the need to manually provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.	2021-04-09 13:24:25 +02:00
Jonas Jenwald	f986ccdf0e	Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193) The fontName, as defined in the PDF document, cannot be found in any of the "name"-tables in the TrueType Collection font. To work-around that, this patch adds a fallback code-path to allow using an approximately matching fontName rather than outright failing.	2021-04-07 15:25:32 +02:00
Tim van der Meij	228adbf673	Merge pull request #13172 from Snuffleupagus/cleanup-keepFonts [api-minor] Add an option, in `PDFDocumentProxy.cleanup`, to allow fonts to remain attached to the DOM	2021-04-05 14:21:34 +02:00
Jonas Jenwald	232fbd28e1	Re-factor the `PDFDocumentProxy.cleanup` unit-tests to use async/await	2021-04-02 12:32:35 +02:00
Jonas Jenwald	0eb1433c78	[api-minor] Change the format of the `fontName`-property, in `defaultAppearanceData`, on Annotation-instances (PR 12831 follow-up) Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see `ca7f546828/src/core/default_appearance.js (L35)` The reason that this is a problem can be seen in `ca7f546828/src/core/primitives.js (L30-L34)`, since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct. Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall. Please note: I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).	2021-04-01 16:47:30 +02:00
Tim van der Meij	5a64157a2f	Merge pull request #13168 from janpe2/ttf-uni-glyphs Use post table when Encoding has only Differences	2021-03-31 21:35:13 +02:00
Jani Pehkonen	0117ee5071	Use post table when Encoding has only Differences Fixes #13107 In the issue, some TrueType glyph names have the format `uniXXXX`. Font's `Encoding` dictionary has the entry `Differences` but no `BaseEncoding`. `uniXXXX` names are converted to glyph indices using font's `post` table but currently that is done only when `BaseEncoding` exists. We must enable the conversion also when only `Differences` exists.	2021-03-31 17:58:44 +03:00
Jonas Jenwald	db1e1612df	[api-minor] Support proper `URL`-objects, in addition to URL-strings, in `getDocument` Currently only URL-strings are officially supported by `getDocument`, however at this point in time I cannot really see any compelling reason to not support `URL`-objects as well. Most likely the reason that we don't already support `URL`-objects, in `getDocument`, is that historically `URL` wasn't fully implemented across browsers and our old polyfill wasn't perfect; see https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility Please note: Because of how the `url` parameter is currently handled, there's actually some cases where passing a `URL`-object to `getDocument` already works. That, in my opinion, provides additional motivation for supporting `URL`-objects officially, since it makes the API more consistent. The following is an attempt to summarize the current situation, based on the actual code rather than the JSDocs: - `getDocument("url string")` works and is documented.[1] - `getDocument({ url: "url string", })` works and is documented.[1] - `getDocument(new URL(...))` throws immediately, since no supported parameters are found. - `getDocument({ url: new URL(...), })` actually works even though it's not documented.[1] Originally, when data was fetched on the worker-thread, this would likely have thrown since `URL` isn't clonable.[2] - `getDocument({ url: { abc: 123, }, })`, or some similarly meaningless input, will be "accepted" by `getDocument` and then throw a `MissingPDFException` when attempting to fetch the bogus data. With the changes in this patch, not only are `URL`-objects now officially supported and documented when calling `getDocument`, but we'll also do a much better job at actually validating any URL-data passed to `getDocument` (and instead fail early). --- [1] In browsers, we create a valid URL thus indirectly validating the input. In Node.js environments, on the other hand, no validation is done since obtaining a baseUrl is more difficult (and PDF.js is primarily written for browsers anyway). [2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types	2021-03-31 16:21:41 +02:00
calixteman	b3528868c1	XFA - Add support for a few ui elements (#13115) - input; - layout; - border; - margin; - color.	2021-03-31 15:42:21 +02:00
calixteman	84d7cccb1d	JS - Handle correctly hierarchy of fields (#13133) * JS - Handle correctly hierarchy of fields - it aims to fix #13132; - annotations can inherit their actions from the parent field; - there are some fields which act as a container for other fields: - they can be accessed through js so need to add them with an empty type (nothing in the spec about that but checked in Acrobat); - calculation order list (CO) can reference them so need to make them available through this.getField; - getArray method must return kids. - field values are number, string, ... depending on their type but nothing in the spec on how to know what's the type: - according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461 - it seems that this "type" can be guessed from js action Format (when setting a type in Acrobat DC, the only affected thing is this action). - util.scand with an empty string returns the current date.	2021-03-30 08:50:35 -07:00
Jonas Jenwald	75a6b2fa13	Improve handling of linked test-cases for the unit/integration suites (#13160) - Actually support linked test-cases in the integration-tests (in the same way as the unit-tests). - Add a new `"type": "other"`-kind to the test-manifest, to support linked test-cases in the unit/integration-tests without requiring the PDF document in question to also be a reference-test.	2021-03-30 13:24:04 +02:00
Calixte Denizet	9296ee6986	Skip extra objects in object stream in using offsets	2021-03-28 13:03:05 +02:00
calixteman	81c602c61c	Set CFF header to 4 when writing it because it contains 4 elements (#13149)	2021-03-26 18:23:18 +01:00
calixteman	63471bcbbe	XFA - Convert some template properties into CSS ones (#13082) - implement a few positioning properties: position, width, height, anchor; - implement font element; - implement fill element (used by font) and its children (linear, radial, ...); - font property is inherited from ancestor container (see https://www.pdfa.org/wp-content/uploads/2020/07/XFA-3_3.pdf#page=43) so let CSS handle that stuff; - in order to reduce the number of properties to set, only set non default properties and put the default in CSS; - set a background to some containers to be able to see them (will be removed in a future commit).	2021-03-25 13:02:39 +01:00
Jonas Jenwald	eeda2215d7	Remove redundant `done`-callback functions from unit-tests which are `async` For unit-tests which are asynchronous, using a `done`-callback is redundant and future Jasmine versions will stop supporting that pattern.	2021-03-21 11:33:39 +01:00
Tim van der Meij	8269ddbd16	Merge pull request #13105 from Snuffleupagus/BasePdfManager-parseDocBaseUrl Improve memory usage around the `BasePdfManager.docBaseUrl` parameter (PR 7689 follow-up)	2021-03-19 23:03:20 +01:00
calixteman	24e598a895	XFA - Add a layer to display XFA forms (#13069) - add an option to enable XFA rendering if any; - for now, keep the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground); - ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one: - it makes it easier to translate template properties to html ones; - it makes the creation of the html element in the main thread faster.	2021-03-19 10:11:40 +01:00
Jonas Jenwald	bd9dee1544	Move the `getPdfFilenameFromUrl` helper function from `web/ui_utils.js` and into `src/display/display_utils.js` It seems reasonable to place this alongside the similar `getFilenameFromUrl` helper function. This way, with the changes in the next patch, we also avoid having to expose the `isDataScheme` function in the API itself and we instead expose `getPdfFilenameFromUrl` in the API (which feels overall more appropriate).	2021-03-17 15:48:24 +01:00
Jonas Jenwald	5099f1977f	Support `LineAnnotation`s with empty /Rect-entries (issue 6564) This extends PR 13033 slightly, with a heuristic to support corrupt PDF documents where the `LineAnnotation`s have an empty /Rect-entry. Please note that while I have no idea if this is "correct", this patch at least makes us output the same /BBox as re-saving in Adobe Reader does.	2021-03-15 16:33:43 +01:00
Tim van der Meij	cc59c81fe6	Merge pull request #13096 from Snuffleupagus/eslint-no-var-stats Enable the ESLint `no-var` rule in the `test/stats/` folder	2021-03-14 11:34:06 +01:00
Jonas Jenwald	7ec2bd0f01	Enable the ESLint `no-var` rule in `test/add_test.js` These changes were done automatically, by using the `gulp lint --fix` command.	2021-03-14 10:25:51 +01:00
Jonas Jenwald	473f0aeeb2	Enable the ESLint `no-var` rule in the `test/stats/` folder Note that the majority of these changes were done automatically, by using the `gulp lint --fix` command, and the manual changes were limited to the following diff: ```diff diff --git a/test/stats/statcmp.js b/test/stats/statcmp.js index 7c4dbf1d3..22d535a5a 100644 --- a/test/stats/statcmp.js +++ b/test/stats/statcmp.js @@ -1,13 +1,7 @@ "use strict"; const fs = require("fs"); - -try { - var ttest = require("ttest"); -} catch (e) { - console.log('\nttest is not installed -- to intall, run "npm install ttest"'); - console.log("Continuing without significance test...\n"); -} +const ttest = require("ttest"); const VALID_GROUP_BYS = ["browser", "pdf", "page", "round", "stat"]; @@ -134,9 +128,7 @@ function stat(baseline, current) { if (ttest) { labels.push("Result(P<.05)"); } - let i, - row, - rows = []; + const rows = []; // collect rows and measure column widths const width = labels.map(function (s) { return s.length; @@ -146,7 +138,7 @@ function stat(baseline, current) { const key = keys[k]; const baselineMean = mean(baselineGroup[key]); const currentMean = mean(currentGroup[key]); - row = key.split(","); + const row = key.split(","); row.push( "" + baselineGroup[key].length, "" + Math.round(baselineMean), @@ -165,7 +157,7 @@ function stat(baseline, current) { row.push(""); } } - for (i = 0; i < row.length; i++) { + for (let i = 0; i < row.length; i++) { width[i] = Math.max(width[i], row[i].length); } rows.push(row); @@ -181,8 +173,8 @@ function stat(baseline, current) { console.log("-- Grouped By " + options.groupBy.join(", ") + " --"); const groupCount = options.groupBy.length; for (let r = 0; r < rows.length; r++) { - row = rows[r]; - for (i = 0; i < row.length; i++) { + const row = rows[r]; + for (let i = 0; i < row.length; i++) { row[i] = pad(row[i], width[i], i < groupCount ? "right" : "left"); } console.log(row.join(" | ")); @@ -208,5 +200,5 @@ function main() { stat(baseline, current); } -var options = parseOptions(); +const options = parseOptions(); main(); ```	2021-03-14 10:15:45 +01:00
Jonas Jenwald	5b5061afa8	Enable the ESLint `no-var` rule globally A significant portion of the code-base has now been converted to use `let`/`const`, rather than `var`, hence it should be possible to simply enable the ESLint `no-var` rule globally. This way we can ensure that new code won't accidentally use `var`, and it also removes the need to manually enable the rule in various folders. Obviously it makes sense to continue the efforts to replace `var`, but that should probably happen on a file and/or folder basis. Please note that this patch excludes the following code: - The `extensions/` folder, since that seemed easiest for now (and I don't know exactly what the support situation is for the Chromium-extension). - The entire `external/` folder is ignored, since most of it's currently excluded from linting. For the code that isn't imported from elsewhere (and should be ignored), we should probably (at some point) bring the code up to the same linting/formatting standard as the rest of the code-base. - Various files in the `test/` folder are ignored, as necessary, since the way that a lot of this code is loaded will require some care (or perhaps larger re-factoring) when removing `var` usage.	2021-03-13 16:12:53 +01:00
Jonas Jenwald	22e0ed51c6	Remove unnecessary `/* eslint no-var: error */` lines in the `test/unit/` folder (PR 12528 follow-up) These lines are no longer needed, since the ESLint `no-var` rule has been enabled in the entire folder.	2021-03-13 11:50:11 +01:00
Jonas Jenwald	39cd844243	Ensure that all errors are handled in `rasterizeTextLayer`/`rasterizeAnnotationLayer` Currently errors occurring within the `src/display/{text_layer, annotation_layer}.js` files are not being handled properly by the test-suite, and the tests simply time out rather than failing as intended. This makes it very easy to accidentally overlook a certain type of errors, see e.g. https://github.com/mozilla/pdf.js/pull/13055#discussion_r589005041, which this patch will thus prevent.	2021-03-12 14:05:53 +01:00
Brendan Dahl	47a5550f10	Disable intermittent unit test "creates pdf doc from non-existent URL" Disable this test so we don't have to manually review unit test failure log all the time.	2021-03-11 11:40:40 -08:00
Calixte Denizet	3243672727	XFA - Create Form DOM in merging template and data trees - Spec: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=171; - support for the 2 ways of merging: consumeData and matchTemplate; - create additional nodes in template DOM when occur node allows it; - support for global values in data DOM.	2021-03-08 14:10:30 +01:00
Jonas Jenwald	36bb4fa823	Remove the `test/features` folder, since it's very out of date (issue 11954) These tests, and their [accompanying Wiki page](https://github.com/mozilla/pdf.js/wiki/Required-Browser-Features), haven't received any real updates for many years and are sufficiently out of date to be effectively useless now. Providing irrelevant compatibility information seems overall worse than not providing any information, and as suggested in the issue it'd probably be better to use https://github.com/mozilla/pdf.js#online-demo for checking if a particular platform/browser is supported. Thanks to version control, it's easy to restore these files should the need ever arise. However, re-introducing these tests would essentially require updating every single test-case and a commitment to keeping them up to date with future code changes.	2021-03-07 13:41:30 +01:00
Calixte Denizet	c01ef24541	JS - reset correctly radio buttons	2021-03-07 11:04:40 +01:00
Tim van der Meij	5828ff6cb0	Implement rendering line annotations without appearance stream	2021-02-28 18:57:58 +01:00
Tim van der Meij	fa6cebf045	Implement rendering square/circle annotations without appearance stream	2021-02-27 19:05:12 +01:00
Calixte Denizet	17363bbc6f	Fix integration test with js-colors - need to wait for color change to check its value.	2021-02-26 11:40:02 +01:00
calixteman	45329af926	XFA -- Add support for SOM expressions (#12983) - specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=87; - add a parser for SOM expressions; - add search functions to resolve those expressions; - search functions will be used to bind data into template.	2021-02-24 10:13:02 +01:00
Tim van der Meij	f3aa4408a5	Merge pull request #13005 from calixteman/colors JS - Fix setting a color on an annotation	2021-02-21 14:50:03 +01:00
Calixte Denizet	4a5f1d1b7a	JS - Fix setting a color on an annotation - strokeColor corresponds to borderColor; - support fillColor and textColor; - support colors on the different annotations; - fix typo in aforms (+test).	2021-02-20 15:24:37 +01:00
Jonas Jenwald	e9038cc3d1	Send the `AnnotationStorage`-data to the worker-thread as a `Map` Rather than converting the `AnnotationStorage`-data to an Object, before sending it to the worker-thread, we should be able to simply send the internal `Map` directly. The "structured clone algorithm" doesn't have a problem with `Map`s, however the `LoopbackPort` used when workers are disabled (e.g. in Node.js environments) didn't use to support them. With PR 12997 having lifted that restriction, we should now be able to simply send the `AnnotationStorage`-data as-is rather than having to iterate through it to first create an Object. Please note: The changes in `src/core/annotation.js` could have been a lot more compact if we were able to use optional chaining in the `src/core` folder. Unfortunately that's still not possible, since SystemJS is being used in the development viewer (i.e. `gulp server`) and fixing that is still blocked by [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687).	2021-02-18 17:13:43 +01:00
calixteman	0fa9976268	XFA - Add support for prototypes (#12979) - specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=225&zoom=auto,-207,784 - add a clone method on nodes in order to be able to clone a proto; - support ids in template namespace; - prevent cycles when applying protos.	2021-02-18 10:32:25 +01:00
Tim van der Meij	4619b1b568	Merge pull request #12997 from Snuffleupagus/metadata-worker Move the Metadata parsing to the worker-thread	2021-02-17 20:57:46 +01:00
calixteman	b5be515375	XFA - Add a lexer/parser for FormCalc language (#12936) - the language specifications are: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1049 - it can be used: * as a scripting language for calculation, validations, ... * in SOM expressions to select nodes: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=101	2021-02-17 20:28:06 +01:00
Jonas Jenwald	d366bbdf51	Move the `encodeToXmlString` helper function to `src/core/core_utils.js` With the previous patch this function is now only accessed on the worker-thread, hence it's no longer necessary to include it in the built `pdf.js` file.	2021-02-17 13:12:01 +01:00
Jonas Jenwald	b66f294f64	Move the XML-parser to the `src/core/`-folder With the previous patch this functionality is now only accessed on the worker-thread, hence it's no longer necessary to include it in the built `pdf.js` file.	2021-02-17 13:12:01 +01:00
Jonas Jenwald	cc3a6563ee	Move the Metadata parsing to the worker-thread The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers. Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons). Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the built `pdf.js`/`pdf.worker.js` files. Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation. --- [1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.	2021-02-17 13:12:01 +01:00
Calixte Denizet	ccef734ebb	Remove Promise.all and async+done from unit/scripting_spec	2021-02-17 11:19:39 +01:00
Calixte Denizet	82f75a8ac2	JS -- Fix doc.getField and add missing field methods - getField("foo") was wrongly returning a field named "foobar"; - field object had few missing unimplemented methods	2021-02-17 10:42:52 +01:00
Tim van der Meij	bab059d8fd	Merge pull request #12964 from calixteman/12963 Avoid infinite loop when getting annotation field name	2021-02-16 22:36:24 +01:00
Calixte Denizet	0fc8267576	Avoid infinite loop when getting annotation field name - aims to fix issue #12963; - use a Set to track already visited objects; - remove the loop limit in getInheritableProperty and use a RefSet too.	2021-02-14 19:58:19 +01:00
Jonas Jenwald	b26c7974fe	[api-minor] Change the `dc:subject` Metadata field to an Array This patch simply extends the existing handling of the `dc:creator` field, which should hopefully suffice here; please refer to https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=34	2021-02-14 17:16:40 +01:00
Calixte Denizet	ea06bb0e36	[api-minor] Annotation -- Don't compute appearance when nothing has changed * don't set a value in annotationStorage by default: - having an undefined when the annotation is rendered for saving/printing means nothing has changed so use normal appearance - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1681687 * change the way to compute font size when this one is null in DA: - make fontSize proportional to line height - in multiline case, take into account the number of lines for text entered to adapt the font size	2021-02-12 19:27:21 +01:00
calixteman	a8021208ea	Restore window.alert after use in scripting test (#12987)	2021-02-12 14:19:58 +01:00
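
Note on commit 0fc8267576 ("Avoid infinite loop when getting annotation field name"): the message mentions using a Set to track already visited objects. The snippet below is only a minimal sketch of that general technique, not the pdf.js code. The function name `collectFieldName` and the `name`/`parent` object shape are hypothetical, chosen purely for illustration.

```javascript
// Hypothetical sketch, not the pdf.js implementation.
// Builds a dotted field name by following `parent` links, and uses a Set
// of visited nodes so that a corrupt, cyclic parent chain cannot loop forever.
function collectFieldName(node) {
  const visited = new Set();
  const parts = [];
  let current = node;
  while (current && !visited.has(current)) {
    visited.add(current);
    if (typeof current.name === "string") {
      // Ancestors are found later, so prepend to keep root-first order.
      parts.unshift(current.name);
    }
    current = current.parent;
  }
  return parts.join(".");
}

// Usage: two nodes whose parent links form a cycle (as in a corrupt document).
const root = { name: "form" };
const child = { name: "field", parent: root };
root.parent = child;
console.log(collectFieldName(child)); // form.field
```

Tracking visited nodes removes the need for an arbitrary iteration limit: the walk ends either at a node with no parent or at the first node seen twice.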

2485 Commits