pdf.js

Author	SHA1	Message	Date
calixteman	af4dc55019	[api-minor] Fix the way to chunk the strings (#13257) - Improve chunking in order to fix some bugs where the spaces aren't here: * track the last position where a glyph has been drawn; * when a new glyph (first glyph in a chunk) is added then compare its position with the last saved one and add a space or break: - there are multiple ways to move the glyphs and to avoid having to deal with all the different possibilities it's way easier to just compare positions; - and so there is now one function (i.e. "compareWithLastPosition") where all the job is done. - Add some breaks in order to get lines; - Remove the multiple white spaces: * some spaces were filled with several white spaces and so it makes it harder to find some sequences of words using the search tool; * other pdf readers replace spaces by one white space. Update src/core/evaluator.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-30 14:41:13 +02:00
Brendan Dahl	d10da907da	Fix position of highlighted all text. (#13306) Adds a new integration test to ensure we don't regress this again.	2021-04-28 10:15:31 +02:00
Tim van der Meij	60ab15427f	Implement rendering polyline/polygon annotations without appearance stream	2021-04-27 19:02:20 +02:00
Jonas Jenwald	6f4394fcd8	Support `InkAnnotation`s without appearance streams (issue 13298) (#13301) For now, we keep things purposely simple by using straight lines (rather than curves); please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2096579	2021-04-27 11:49:03 +02:00
Tim van der Meij	da0e7ea969	Merge pull request #13272 from calixteman/issue13271 Update all the text widgets having the same name with the same value	2021-04-23 21:08:54 +02:00
Jonas Jenwald	57a1ea840f	Ensure that `saveDocument` works if there's no /ID-entry in the PDF document (issue 13279) (#13280) First of all, while it should be very unlikely that the /ID-entry is an indirect object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an indirect object the existing code in `src/core/writer.js` would already fail. Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`. Drive-by change: In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).	2021-04-22 12:08:56 +02:00
Jonas Jenwald	7b8d2495ca	Convert the font-test `ttx` helper function to use the Fetch API By replacing `XMLHttpRequest` with a `fetch` call, the helper function can be modernized to use async/await instead. Note that the headers don't seem necessary to set now, since: - The Fetch API provides a method for accessing the response as text, which renders the "Content-type" header unnecessary. - According to https://developer.mozilla.org/en-US/docs/Glossary/Forbidden_header_name, the "Content-length" header isn't necessary.	2021-04-20 23:44:15 +02:00
Calixte Denizet	e868ab0051	Update all the text widgets having the same name with the same value	2021-04-20 20:03:19 +02:00
Jonas Jenwald	3d55b2b10e	Replace `done` callbacks in the font-tests with async/await instead	2021-04-19 13:26:39 +02:00
Tim van der Meij	d42f3d0bfe	Convert done callbacks to async/await in `test/unit/evaluator_spec.js`	2021-04-18 14:20:54 +02:00
Tim van der Meij	f4237d3a09	Convert done callbacks to async/await in `test/unit/annotation_spec.js`	2021-04-17 19:59:18 +02:00
Tim van der Meij	c2f3a71eca	Convert done callbacks to async/await in `test/unit/api_spec.js`	2021-04-17 17:52:23 +02:00
Jonas Jenwald	f560fe6875	A couple of small scripting/XFA-related tweaks in the worker-code - Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible. - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.) - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be. Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.) - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.	2021-04-17 10:34:22 +02:00
Brendan Dahl	ac3fa1e3d7	Merge pull request #13146 from calixteman/xfa_fonts XFA -- Load fonts permanently from the pdf	2021-04-16 12:55:12 -07:00
Tim van der Meij	6e8ff2fed9	Merge pull request #13247 from Snuffleupagus/update-yargs Update the `yargs` package to the latest version	2021-04-16 20:33:45 +02:00
Tim van der Meij	cba6a3f375	Merge pull request #13246 from timvandermeij/unit-test-async-await-pt2 Convert done callbacks to async/await in more unit test files	2021-04-16 20:24:53 +02:00
Jonas Jenwald	c988712bc5	Update the `yargs` package to the latest version While I wasn't able to figure out exactly why the old format didn't work, re-factoring the `parseOptions` function to use `yargs` differently "just worked" so that's hopefully good enough here. With these changes everything related to a particular option now appears in one place, rather than being spread out, which aids readability in my opinion. Also, the options are now sorted alphabetically, to make it easier to find a particular one. https://www.npmjs.com/package/yargs	2021-04-16 12:04:35 +02:00
Calixte Denizet	7e9579045f	XFA -- Load fonts permanently from the pdf - Different fonts can be used in xfa and some of them are embedded in the pdf. - Load all the fonts in window.document. Update src/core/document.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Update src/core/worker.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-15 17:57:42 +02:00
Tim van der Meij	38ed655562	Convert done callbacks to async/await in `test/unit/cmap_spec.js`	2021-04-14 22:24:28 +02:00
Tim van der Meij	046467ff47	Drop obsolete done callbacks in `test/unit/annotation_storage_spec.js` There is no asynchronous code involved here, so we can get rid of all done callbacks here and simply use the fact that if the function call ends without failed assertion that the test passed.	2021-04-14 22:11:45 +02:00
Tim van der Meij	82bdba78fb	Drop obsolete done callbacks in `test/unit/crypto_spec.js` There is no asynchronous code involved here, so we can get rid of all done callbacks here and simply use the fact that if the function call ends without failed assertion that the test passed.	2021-04-14 22:09:17 +02:00
Tim van der Meij	43eb4302ff	Convert done callbacks to async/await in `test/unit/message_handler_spec.js`	2021-04-14 21:59:13 +02:00
Tim van der Meij	bc8c0bbbfd	Convert done callbacks to async/await in `test/unit/display_svg_spec.js`	2021-04-14 21:59:13 +02:00
Tim van der Meij	ae48d07582	Merge pull request #13243 from janpe2/ocg-ve Implement visibility expressions for optional content	2021-04-14 20:42:49 +02:00
Tim van der Meij	cd2c4e277c	Merge pull request #13222 from timvandermeij/unit-test-async Convert done callbacks to async/await in the smaller unit test files	2021-04-14 20:37:17 +02:00
Jani Pehkonen	3a96977ea8	Implement visibility expressions for optional content	2021-04-14 17:39:41 +03:00
Tim van der Meij	c1e9f6025f	Convert done callbacks to async/await in `test/unit/custom_spec.js`	2021-04-13 21:51:27 +02:00
Tim van der Meij	a1c1e1b9f8	Convert done callbacks to async/await in `test/unit/fetch_stream_spec.js`	2021-04-13 21:51:27 +02:00
Tim van der Meij	5607484402	Convert done callbacks to async/await in `test/unit/network_spec.js`	2021-04-13 21:51:26 +02:00
Tim van der Meij	fcf4d02fca	Convert done callbacks to async/await in `test/unit/node_stream_spec.js`	2021-04-13 21:51:26 +02:00
Tim van der Meij	99dc0d6b65	Convert done callbacks to async/await in `test/unit/primitives_spec.js`	2021-04-13 21:50:13 +02:00
Tim van der Meij	a56ffb92be	Convert done callbacks to async/await in `test/unit/ui_utils_spec.js`	2021-04-13 21:50:13 +02:00
Tim van der Meij	a2811e925d	Convert done callbacks to async/await in `test/unit/util_spec.js`	2021-04-13 21:47:53 +02:00
Jonas Jenwald	2b2234fd5a	[api-minor] Ensure that `PDFDocumentProxy.hasJSActions` won't fail if `MissingDataException`s are thrown during the associated worker-thread parsing With the current implementation of `PDFDocument.hasJSActions`, in the worker-thread, we're not actually handling not-yet-loaded data correctly. This can thus fail in two different ways: - The `PDFDocument.fieldObjects` getter (and its helper method), while it may return a Promise, still fetches all of its data synchronously and it can thus throw a `MissingDataException` during parsing. - The `Catalog.jsActions` getter, which is completely synchronous, can obviously throw a `MissingDataException` during parsing. If either of these cases occur currently, the `PDFDocumentProxy.hasJSActions` method in the API can either return a rejected Promise (which it never should) or possibly "hang" and never resolve. Please note: While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server. This patch is thus based on code-inspection and on manually throwing a `MissingDataException` on the first access of `Catalog.jsActions` to simulate this situation. Finally, this patch adds a couple of API unit-tests for this (since none existed).	2021-04-13 14:33:56 +02:00
Calixte Denizet	a4c986515f	XFA -- Display text content - display xhtml; - allow spaces in xhtml (xfa-spacerun:yes); - support column layout; - fix some border issues.	2021-04-12 14:13:49 +02:00
Jonas Jenwald	5adee0cdd1	[api-minor] Let `PDFPageProxy.getStructTree` return `null`, rather than an empty structTree, for documents without any accessibility data (PR 13171 follow-up) This is first of all consistent with existing API-methods, where we return `null` when the data in question doesn't exist. Secondly, it should also be (slightly) more efficient since there's less dummy-data that we need to transfer between threads. Finally, this prevents us from adding an empty/unnecessary span to every single page even in documents without any structure tree data.	2021-04-11 12:35:33 +02:00
Tim van der Meij	10574a0f8a	Remove obsolete done callbacks from the unit tests The done callbacks are an outdated mechanism to signal Jasmine that a unit test is done, mostly in cases where a unit test needed to wait for an asynchronous operation to complete before doing its assertions. Nowadays a much better mechanism is in place for that, namely simply passing an asynchronous function to Jasmine, so we don't need callbacks anymore (which require more code and may be more difficult to reason about). In these particular cases though the done callbacks never had any real use since nothing asynchronous happens in these places. Synchronous functions don't need to use done callbacks since Jasmine simply knows it's done when the function reaches its normal end, so we can safely get rid of these callbacks. The telltale sign is if the done callback is used unconditionally at the end of the function. This is all done in an effort to over time get rid of all callbacks in the unit test code.	2021-04-10 20:29:39 +02:00
Tim van der Meij	d9d626a5e1	Merge pull request #13214 from calixteman/signatures Display widget signature	2021-04-10 19:35:16 +02:00
Calixte Denizet	5875ebb1ca	Display widget signature - but don't validate them for now; - Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315) - almost all (all ?) pdf readers display signatures; - validation is done in edge but for now it's behind a pref.	2021-04-10 19:13:28 +02:00
Tim van der Meij	03c8c89002	Merge pull request #13171 from brendandahl/struct-tree [api-minor] Add support for basic structure tree for accessibility.	2021-04-09 21:32:44 +02:00
Brendan Dahl	fc9501a637	Add support for basic structure tree for accessibility. When a PDF is "marked" we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer aria-owns is used. This required modifying the text layer creation so that marked items are now tracked.	2021-04-09 09:56:28 -07:00
Jonas Jenwald	72ef183085	[api-minor] Remove the manual passing of an `AnnotationStorage`-instance when calling various API-methods Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use multiple ones simultaneously doesn't really make sense (e.g. in the viewer). Instead we lazily initialize, and cache, just one instance via `PDFDocumentProxy.annotationStorage` which should thus be available internally in the API itself without having to be manually passed to various methods. To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it. This patch implements the following simplifications: - Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally. Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does not make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with a lot more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method. - Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point). - Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing). - By removing the need to manually provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.	2021-04-09 13:24:25 +02:00
Jonas Jenwald	f986ccdf0e	Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193) The fontName, as defined in the PDF document, cannot be found in any of the "name"-tables in the TrueType Collection font. To work-around that, this patch adds a fallback code-path to allow using an approximately matching fontName rather than outright failing.	2021-04-07 15:25:32 +02:00
Tim van der Meij	228adbf673	Merge pull request #13172 from Snuffleupagus/cleanup-keepFonts [api-minor] Add an option, in `PDFDocumentProxy.cleanup`, to allow fonts to remain attached to the DOM	2021-04-05 14:21:34 +02:00
Jonas Jenwald	232fbd28e1	Re-factor the `PDFDocumentProxy.cleanup` unit-tests to use async/await	2021-04-02 12:32:35 +02:00
Jonas Jenwald	0eb1433c78	[api-minor] Change the format of the `fontName`-property, in `defaultAppearanceData`, on Annotation-instances (PR 12831 follow-up) Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see `ca7f546828/src/core/default_appearance.js (L35)` The reason that this is a problem can be seen in `ca7f546828/src/core/primitives.js (L30-L34)`, since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct. Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall. Please note: I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).	2021-04-01 16:47:30 +02:00
Tim van der Meij	5a64157a2f	Merge pull request #13168 from janpe2/ttf-uni-glyphs Use post table when Encoding has only Differences	2021-03-31 21:35:13 +02:00
Jani Pehkonen	0117ee5071	Use post table when Encoding has only Differences Fixes #13107 In the issue, some TrueType glyph names have the format `uniXXXX`. Font's `Encoding` dictionary has the entry `Differences` but no `BaseEncoding`. `uniXXXX` names are converted to glyph indices using font's `post` table but currently that is done only when `BaseEncoding` exists. We must enable the conversion also when only `Differences` exists.	2021-03-31 17:58:44 +03:00
Jonas Jenwald	db1e1612df	[api-minor] Support proper `URL`-objects, in addition to URL-strings, in `getDocument` Currently only URL-strings are officially supported by `getDocument`, however at this point in time I cannot really see any compelling reason to not support `URL`-objects as well. Most likely the reason that we don't already support `URL`-objects, in `getDocument`, is that historically `URL` wasn't fully implemented across browsers and our old polyfill wasn't perfect; see https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility Please note: Because of how the `url` parameter is currently handled, there's actually some cases where passing a `URL`-object to `getDocument` already works. That, in my opinion, provides additional motivation for supporting `URL`-objects officially, since it makes the API more consistent. The following is an attempt to summarize the current situation, based on the actual code rather than the JSDocs: - `getDocument("url string")` works and is documented.[1] - `getDocument({ url: "url string", })` works and is documented.[1] - `getDocument(new URL(...))` throws immediately, since no supported parameters are found. - `getDocument({ url: new URL(...), })` actually works even though it's not documented.[1] Originally, when data was fetched on the worker-thread, this would likely have thrown since `URL` isn't clonable.[2] - `getDocument({ url: { abc: 123, }, })`, or some similarly meaningless input, will be "accepted" by `getDocument` and then throw a `MissingPDFException` when attempting to fetch the bogus data. With the changes in this patch, not only are `URL`-objects now officially supported and documented when calling `getDocument`, but we'll also do a much better job at actually validating any URL-data passed to `getDocument` (and instead fail early). --- [1] In browsers, we create a valid URL thus indirectly validating the input. In Node.js environments, on the other hand, no validation is done since obtaining a baseUrl is more difficult (and PDF.js is primarily written for browsers anyway). [2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types	2021-03-31 16:21:41 +02:00
calixteman	b3528868c1	XFA - Add support for a few ui elements (#13115) - input; - layout; - border; - margin; - color.	2021-03-31 15:42:21 +02:00

2325 Commits