pdf.js

Author	SHA1	Message	Date
Tim van der Meij	0acd801b1e	Merge pull request #13305 from timvandermeij/annotation-polygon-polyline-no-appearance-stream Implement rendering polyline/polygon annotations without appearance stream	2021-04-27 20:03:35 +02:00
Tim van der Meij	60ab15427f	Implement rendering polyline/polygon annotations without appearance stream	2021-04-27 19:02:20 +02:00
Jonas Jenwald	0ecb42f4d7	Convert `src/core/jpx_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	c51ef1f21f	Convert `src/core/jbig2_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	d9c1bf96b6	Convert `src/core/jpeg_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	0ca63f94b4	Convert `src/core/ccitt_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	8ff213871b	Convert `src/core/ccitt.js` to use standard classes Given that we're using modules, meaning that only explicitly `export`ed things are visible to the outside, it's no longer necessary to wrap all of the code in a closure.	2021-04-27 13:29:09 +02:00
Jonas Jenwald	6f4394fcd8	Support `InkAnnotation`s without appearance streams (issue 13298) (#13301) For now, we keep things purposely simple by using straight lines (rather than curves); please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2096579	2021-04-27 11:49:03 +02:00
Tim van der Meij	270e56dae8	Enable the `no-var` linting rule in `src/core/image.js` This is done automatically with `gulp lint --fix` and the following manual changes: ```diff diff --git a/src/core/image.js b/src/core/image.js index 35c06b8ab..e718b9937 100644 --- a/src/core/image.js +++ b/src/core/image.js @@ -97,7 +97,7 @@ class PDFImage { if (isName(filter)) { switch (filter.name) { case "JPXDecode": - var jpxImage = new JpxImage(); + const jpxImage = new JpxImage(); jpxImage.parseImageProperties(image.stream); image.stream.reset(); ```	2021-04-25 17:40:00 +02:00
Tim van der Meij	16efd09c9f	Enable the `no-var` linting rule in `src/core/worker.js` This is done automatically with `gulp lint --fix` and the following manual changes: ```diff diff --git a/src/core/worker.js b/src/core/worker.js index aec9c1d39..f88691622 100644 --- a/src/core/worker.js +++ b/src/core/worker.js @@ -300,7 +300,7 @@ class WorkerMessageHandler { cachedChunks = []; }; const readPromise = new Promise(function (resolve, reject) { - var readChunk = function ({ value, done }) { + const readChunk = function ({ value, done }) { try { ensureNotTerminated(); if (done) { ```	2021-04-25 17:40:00 +02:00
Tim van der Meij	85659b4cf0	Enable the `no-var` linting rule in `src/core/cmap.js` This is done automatically with `gulp lint --fix` and the following manual changes: ```diff diff --git a/src/core/cmap.js b/src/core/cmap.js index 850275a19..8794726dd 100644 --- a/src/core/cmap.js +++ b/src/core/cmap.js @@ -519,8 +519,8 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { readHexNumber(num, size) { let last; - let stack = this.tmpBuf, - sp = 0; + const stack = this.tmpBuf; + let sp = 0; do { const b = this.readByte(); if (b < 0) { @@ -603,7 +603,6 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { const ucs2DataSize = 1; const subitemsCount = stream.readNumber(); - var i; switch (type) { case 0: // codespacerange stream.readHex(start, dataSize); @@ -614,7 +613,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(start, dataSize), hexToInt(end, dataSize) ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(end, dataSize); stream.readHexNumber(start, dataSize); addHex(start, end, dataSize); @@ -633,7 +632,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { addHex(end, start, dataSize); stream.readNumber(); // code // undefined range, skipping - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(end, dataSize); stream.readHexNumber(start, dataSize); addHex(start, end, dataSize); @@ -647,7 +646,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { stream.readHex(char, dataSize); code = stream.readNumber(); cMap.mapOne(hexToInt(char, dataSize), code); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(char, dataSize); if (!sequence) { stream.readHexNumber(tmp, dataSize); @@ -667,7 +666,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(end, dataSize), code ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(end, dataSize); if (!sequence) { stream.readHexNumber(start, dataSize); @@ -692,7 +691,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(char, ucs2DataSize), hexToStr(charCode, dataSize) ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(char, ucs2DataSize); if (!sequence) { stream.readHexNumber(tmp, ucs2DataSize); @@ -717,7 +716,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(end, ucs2DataSize), hexToStr(charCode, dataSize) ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(end, ucs2DataSize); if (!sequence) { stream.readHexNumber(start, ucs2DataSize); ```	2021-04-25 17:40:00 +02:00
Jonas Jenwald	da22146b95	Replace a bunch of `Array.prototype.forEach()` cases with `for...of` loops instead Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)	2021-04-24 13:00:19 +02:00
Jonas Jenwald	4ec0a4fb43	Re-factor the `Catalog._collectJavaScript` method slightly This patch first of all moves all checking/validation into the `appendIfJavaScriptDict` function, to avoid duplicating it in multiple places. Secondly, also removes what's now an outdated/incorrect comment since we have implemented scripting support.	2021-04-23 09:42:32 +02:00
Jonas Jenwald	83f7009e4b	Change `NameOrNumberTree.getAll` to return a `Map` rather than an Object Given that we're (almost) always iterating through the result of the `getAll`-calls, using a `Map` seems nicer overall since it's more suited to iteration compared to a regular Object. Also, add a couple of `Dict`-checks in existing code touched by this patch, since it really cannot hurt to prevent potential errors in a corrupt PDF document.	2021-04-22 13:15:50 +02:00
Jonas Jenwald	57a1ea840f	Ensure that `saveDocument` works if there's no /ID-entry in the PDF document (issue 13279) (#13280) First of all, while it should be very unlikely that the /ID-entry is an indirect object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an indirect object the existing code in `src/core/writer.js` would already fail. Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`. Drive-by change: In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).	2021-04-22 12:08:56 +02:00
Brendan Dahl	066cbcfb27	Merge pull request #13277 from Snuffleupagus/adjustToUnicode-cff For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260)	2021-04-21 10:41:36 -07:00
Jonas Jenwald	7fab73ed23	For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260) This patch extends the approach, implemented in PR 7550, to also apply to CFF fonts.	2021-04-20 20:48:44 +02:00
Jonas Jenwald	8f6543c218	Ensure that the /Properties, used with optional content, is actually loaded before parsing the operatorList/textContent (PR 12095 follow-up) By not waiting for the /Properties to load, before parsing of the operatorList/textContent starts, there's a very real risk that a `MissingDataException` will be thrown when trying to access the data in the `PartialEvaluator.parseMarkedContentProps` method. If this ever happens it will thus lead to incomplete and/or outright broken rendering, and with e.g. `disableAutoFetch=true` set the likelihood of this occurring would increase quite a bit. Please note: While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.	2021-04-20 20:22:44 +02:00
Jonas Jenwald	f560fe6875	A couple of small scripting/XFA-related tweaks in the worker-code - Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible. - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.) - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be. Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.) - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.	2021-04-17 10:34:22 +02:00
Brendan Dahl	ac3fa1e3d7	Merge pull request #13146 from calixteman/xfa_fonts XFA -- Load fonts permanently from the pdf	2021-04-16 12:55:12 -07:00
Calixte Denizet	7e9579045f	XFA -- Load fonts permanently from the pdf - Different fonts can be used in xfa and some of them are embedded in the pdf. - Load all the fonts in window.document. Update src/core/document.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Update src/core/worker.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-15 17:57:42 +02:00
Jani Pehkonen	3a96977ea8	Implement visibility expressions for optional content	2021-04-14 17:39:41 +03:00
Jonas Jenwald	1d6d476cab	Rename the `src/core/obj.js` file to `src/core/catalog.js` Now that only the `Catalog` remains in this file, after the previous patches, it makes sense to rename the file to reduce confusion.	2021-04-13 21:00:30 +02:00
Jonas Jenwald	088a55f80d	Enable the `no-var` rule in the `src/core/xref.js` file	2021-04-13 21:00:30 +02:00
Jonas Jenwald	bc828cd41f	Convert the `XRef` to a "normal" class	2021-04-13 21:00:30 +02:00
Jonas Jenwald	e8750cfe95	Move the `XRef` from `src/core/obj.js` and into its own file The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of distinct functionality. In order to improve readability and make it easier to navigate through the code, this patch moves the `XRef` into its own file.	2021-04-13 21:00:30 +02:00
Jonas Jenwald	24e5ecdf76	Move `NameTree`/`NumberTree` from `src/core/obj.js` and into its own file The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of distinct functionality. In order to improve readability and make it easier to navigate through the code, this patch moves `NameTree`/`NumberTree` into its own file.	2021-04-13 21:00:30 +02:00
Jonas Jenwald	92141e0468	Enable the `no-var` rule in the `src/core/file_spec.js` file	2021-04-13 21:00:30 +02:00
Jonas Jenwald	22a066e657	Convert the `FileSpec` to a "normal" class	2021-04-13 21:00:30 +02:00
Jonas Jenwald	e02d17da93	Move the `FileSpec` from `src/core/obj.js` and into its own file The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of distinct functionality. In order to improve readability and make it easier to navigate through the code, this patch moves the `FileSpec` into its own file.	2021-04-13 21:00:30 +02:00
Jonas Jenwald	6a935682fd	Convert the `ObjectLoader` to a "normal" class	2021-04-13 21:00:30 +02:00
Jonas Jenwald	604cd6d600	Move the `ObjectLoader` from `src/core/obj.js` and into its own file The size of the `src/core/obj.js` file has increased slowly over the years, and it also contains a fair amount of distinct functionality. In order to improve readability and make it easier to navigate through the code, this patch moves the `ObjectLoader` into its own file.	2021-04-13 21:00:30 +02:00
Tim van der Meij	ebeb3f7999	Merge pull request #13234 from Snuffleupagus/hasJSActions-MissingDataException [api-minor] Ensure that `PDFDocumentProxy.hasJSActions` won't fail if `MissingDataException`s are thrown during the associated worker-thread parsing	2021-04-13 20:44:58 +02:00
Tim van der Meij	3d2d8002b0	Merge pull request #13223 from Snuffleupagus/worker-xfa-structTree-tweaks Remove the unused "GetIsPureXfa" message handler; and avoid unnecessary parsing when no structTree is available (PR 13069 follow-up, PR 13221 follow-up)	2021-04-13 20:39:52 +02:00
Jonas Jenwald	2b2234fd5a	[api-minor] Ensure that `PDFDocumentProxy.hasJSActions` won't fail if `MissingDataException`s are thrown during the associated worker-thread parsing With the current implementation of `PDFDocument.hasJSActions`, in the worker-thread, we're not actually handling not-yet-loaded data correctly. This can thus fail in two different ways: - The `PDFDocument.fieldObjects` getter (and its helper method), while it may return a Promise, still fetches all of its data synchronously and it can thus throw a `MissingDataException` during parsing. - The `Catalog.jsActions` getter, which is completely synchronous, can obviously throw a `MissingDataException` during parsing. If either of these cases occur currently, the `PDFDocumentProxy.hasJSActions` method in the API can either return a rejected Promise (which it never should) or possibly "hang" and never resolve. Please note: While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server. This patch is thus based on code-inspection and on manually throwing a `MissingDataException` on the first access of `Catalog.jsActions` to simulate this situation. Finally, this patch adds a couple of API unit-tests for this (since none existed).	2021-04-13 14:33:56 +02:00
Jonas Jenwald	4aa27cc645	Re-factor `Catalog._collectJavaScript` to use a `Map` rather than an Object Given that this is only an internal helper method, used by the `Catalog.{javaScript, jsActions}` getters, this change simplifies iteration of the returned data. We can also (slightly) re-factor the code of the `jsActions` getter, and remove an obsolete[1] JSDoc-comment from the `openAction` getter. --- [1] Not really relevant now that we've got proper scripting support.	2021-04-13 14:16:17 +02:00
Calixte Denizet	a4c986515f	XFA -- Display text content - display xhtml; - allow spaces in xhtml (xfa-spacerun:yes); - support column layout; - fix some border issues.	2021-04-12 14:13:49 +02:00
Jonas Jenwald	54ef4370a2	Ensure that the data is loaded, in the "GetPageJSActions" message handler Similar to all other data accesses, note e.g. the "GetDocJSActions" handler just above, we need to ensure that a `MissingDataException` isn't propagated to the main-thread if this data is accessed while the PDF document is still loading.	2021-04-12 13:54:37 +02:00
Jonas Jenwald	9360c7cbdc	Avoid unnecessary parsing, in `Page.getStructTree`, when no structTree is available (PR 13221 follow-up) It's obviously (a bit) more efficient to return early in `Page.getStructTree`, rather than trying to first "parse" an empty structTree-root. Somehow I didn't think of this yesterday, but this feels like a much better solution overall; sorry about the churn here!	2021-04-12 08:54:21 +02:00
Jonas Jenwald	0d2dd6c2fe	Remove the unused "GetIsPureXfa" message handler in the worker (PR 13069 follow-up) Looking at the API, there's no code which actually sends this message. Most likely it's a left-over from a previous version of PR 13069, since the `isPureXfa` parameter is being included in the "GetDoc" message.	2021-04-12 08:52:27 +02:00
Jonas Jenwald	5adee0cdd1	[api-minor] Let `PDFPageProxy.getStructTree` return `null`, rather than an empty structTree, for documents without any accessibility data (PR 13171 follow-up) This is first of all consistent with existing API-methods, where we return `null` when the data in question doesn't exist. Secondly, it should also be (slightly) more efficient since there's less dummy-data that we need to transfer between threads. Finally, this prevents us from adding an empty/unnecessary span to every single page even in documents without any structure tree data.	2021-04-11 12:35:33 +02:00
Jonas Jenwald	ff4dae05b0	Ensure that `getStructTree` won't break with `disableAutoFetch = true` set (PR 13171 follow-up) Open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableStream=true&disableAutoFetch=true and observe the following message in the console (repeated for each page of the document): ``` Uncaught (in promise) Object { message: "Missing data [19787293, 19787294)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19787293, 19787294)", stack: "BaseExceptionClosure@http://localhost:8888/src/shared/util.js:458:29\n@http://localhost:8888/src/shared/util.js:462:3\n" } ```	2021-04-11 12:15:33 +02:00
Tim van der Meij	d9d626a5e1	Merge pull request #13214 from calixteman/signatures Display widget signature	2021-04-10 19:35:16 +02:00
Calixte Denizet	5875ebb1ca	Display widget signature - but don't validate them for now; - Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315) - almost all (all?) PDF readers display signatures; - validation is done in Edge but for now it's behind a pref.	2021-04-10 19:13:28 +02:00
Brendan Dahl	fc9501a637	Add support for basic structure tree for accessibility. When a PDF is "marked" we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer, `aria-owns` is used. This required modifying the text layer creation so that marked items are now tracked.	2021-04-09 09:56:28 -07:00
Tim van der Meij	6429ccc002	Merge pull request #13194 from Snuffleupagus/ttcf-fuzzy-match Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193)	2021-04-07 20:50:19 +02:00
Jonas Jenwald	f986ccdf0e	Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193) The fontName, as defined in the PDF document, cannot be found in any of the "name"-tables in the TrueType Collection font. To work-around that, this patch adds a fallback code-path to allow using an approximately matching fontName rather than outright failing.	2021-04-07 15:25:32 +02:00
Tim van der Meij	fc0cd4a443	Convert the `startXRefParsedCache` variable, in `src/core/obj.js`, from an object to a set We only want to track XRef starting points instead of actual data, so using a set conveys that intention more clearly and is slightly more efficient.	2021-04-05 19:32:58 +02:00
Jonas Jenwald	68d3a333ac	Change the `seenStyles` object, in `PartialEvaluator.getTextContent`, to a Set Given that what we actually want is only to keep track of the loadedFont-names, rather than storing any actual data, using an object isn't really necessary here. Furthermore, in the current code, we're also using `in` when checking if the data exists, which is generally less efficient than just checking for the value directly.	2021-04-05 10:34:02 +02:00
Jonas Jenwald	0eb1433c78	[api-minor] Change the format of the `fontName`-property, in `defaultAppearanceData`, on Annotation-instances (PR 12831 follow-up) Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see `ca7f546828/src/core/default_appearance.js (L35)` The reason that this is a problem can be seen in `ca7f546828/src/core/primitives.js (L30-L34)`, since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct. Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall. Please note: I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).	2021-04-01 16:47:30 +02:00
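
Several of the commits listed above (83f7009e4b, fc0cd4a443, 68d3a333ac) replace plain objects with a `Map` or a `Set`. The following is a minimal sketch of that general pattern, not the pdf.js code itself: the function names (`uniqueWithObject`, `uniqueWithSet`, `describeEntries`) and the sample values are hypothetical and chosen only for illustration.

```javascript
// Illustrative only: none of these names come from the pdf.js source.

// Before: a plain object used purely as a "have I seen this?" tracker,
// queried with the `in` operator.
function uniqueWithObject(names) {
  const seen = Object.create(null);
  const result = [];
  for (const name of names) {
    if (name in seen) {
      continue;
    }
    seen[name] = true; // The stored value is never read, only the key matters.
    result.push(name);
  }
  return result;
}

// After: a Set states the intent (membership only) directly.
function uniqueWithSet(names) {
  const seen = new Set();
  const result = [];
  for (const name of names) {
    if (seen.has(name)) {
      continue;
    }
    seen.add(name);
    result.push(name);
  }
  return result;
}

// A Map can be iterated directly with `for...of`, yielding [key, value]
// pairs in insertion order, without `Object.keys()` or a callback.
function describeEntries(map) {
  const lines = [];
  for (const [key, value] of map) {
    lines.push(`${key}=${value}`);
  }
  return lines;
}
```

Both `unique*` helpers return the same result for string input; the `Set` version simply avoids storing dummy values, which is the motivation the commit messages give.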

1996 Commits