pdf.js

Author	SHA1	Message	Date
calixteman	af4dc55019	[api-minor] Fix the way to chunk the strings (#13257) - Improve chunking in order to fix some bugs where the spaces are missing: * track the last position where a glyph has been drawn; * when a new glyph (first glyph in a chunk) is added, compare its position with the last saved one and add a space or break: - there are multiple ways to move the glyphs, and to avoid having to deal with all the different possibilities it's way easier to just compare positions; - and so there is now one function (i.e. "compareWithLastPosition") where all the work is done. - Add some breaks in order to get lines; - Remove the multiple white spaces: * some spaces were filled with several white spaces, which makes it harder to find some sequences of words using the search tool; * other PDF readers replace spaces by one white space. Update src/core/evaluator.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-30 14:41:13 +02:00
Jonas Jenwald	2ac4ad3111	Let `ChunkedStream` extend `Stream`, rather than `BaseStream` directly Looking at the `ChunkedStream` implementation, it's basically a "regular" `Stream` but with added functionality in order to deal with fetching/loading of missing data. Hence, by letting `ChunkedStream` extend `Stream`, we can remove some duplicate methods from the `ChunkedStream` class.	2021-04-28 14:05:25 +02:00
Jonas Jenwald	fb0775525e	Stop special-casing the `dict` parameter in the `Jbig2Stream`/`JpegStream`/`JpxStream` constructors For all of the other `DecodeStream`s we're not passing in a `Dict`-instance manually, but instead get it from the `stream`-parameter. Hence there's no particularly good reason, as far as I can tell, to not do the same thing in `Jbig2Stream`/`JpegStream`/`JpxStream` as well.	2021-04-28 13:44:47 +02:00
Jonas Jenwald	67a1cfc1b1	Improve the handling of `getBaseStreams`, on the various Stream implementations The way that `getBaseStreams` is currently handled has bothered me from time to time, especially how we're checking if the method exists before calling it. By adding a dummy `BaseStream.getBaseStreams` method, and having the call-sites simply check the return value, we can improve some of the relevant code. Note in particular how the `ObjectLoader._walk` method didn't actually check that the data in question is a Stream instance, and instead only checked the `currentNode` (which could be anything) for the existence of a `getBaseStreams` property.	2021-04-28 13:44:47 +02:00
Jonas Jenwald	67415bfabe	Add an abstract base-class, which all the various Stream implementations inherit from By having an abstract base-class, it becomes a lot clearer exactly which methods/getters are expected to exist on all Stream instances. Furthermore, since a number of the methods are identical for all Stream implementations, this reduces unnecessary code duplication in the `Stream`, `DecodeStream`, and `ChunkedStream` classes. E.g. for `gulp mozcentral`, the built `pdf.worker.js` file decreases from `1 619 329` to `1 616 115` bytes with this patch-series.	2021-04-28 13:44:45 +02:00
Jonas Jenwald	6151b4ecac	Convert `src/core/stream.js` to use standard classes	2021-04-28 13:44:10 +02:00
Jonas Jenwald	29cf415a69	Enable the `no-var` rule in the `src/core/stream.js` file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	b11f012e52	Convert `src/core/decode_stream.js` to use standard classes	2021-04-28 10:16:51 +02:00
Jonas Jenwald	8ce2cae4a7	Enable the `no-var` rule in the `src/core/decode_stream.js` file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	30a22a168d	Move the `DecodeStream` and `StreamsSequenceStream` from `src/core/stream.js` and into their own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	213e1c389c	Convert `src/core/flate_stream.js` to use standard classes	2021-04-28 10:16:51 +02:00
Jonas Jenwald	aa1deaf93c	Enable the `no-var` rule in the `src/core/flate_stream.js` file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	1e5bf352a5	Move the `FlateStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	40c342ec6c	Convert `src/core/predictor_stream.js` to use standard classes	2021-04-28 10:16:51 +02:00
Jonas Jenwald	b08f9a8182	Enable the `no-var` rule in the `src/core/predictor_stream.js` file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	66d9d83dcb	Move the `PredictorStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	e938c05edb	Convert `src/core/decrypt_stream.js` to use standard classes	2021-04-28 10:16:51 +02:00
Jonas Jenwald	a9476e7dd0	Enable the `no-var` rule in the `src/core/decrypt_stream.js` file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	28b0809e60	Move the `DecryptStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	cdb583b764	Convert `src/core/ascii_85_stream.js` to use standard classes	2021-04-28 10:16:51 +02:00
Jonas Jenwald	f6c7a65202	Enable the `no-var` rule in the `src/core/ascii_85_stream.js` file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	3294d4d5a3	Move the `Ascii85Stream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	d2227a7d10	Convert `src/core/ascii_hex_stream.js` to use standard classes	2021-04-28 10:16:50 +02:00
Jonas Jenwald	59591f8788	Enable the `no-var` rule in the `src/core/ascii_hex_stream.js` file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	d63df04854	Move the `AsciiHexStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	704514c7cd	Convert `src/core/run_length_stream.js` to use standard classes	2021-04-28 10:16:50 +02:00
Jonas Jenwald	66b898eb58	Enable the `no-var` rule in the `src/core/run_length_stream.js` file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	342b0c1bbc	Move the `RunLengthStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	1f0685cee6	Convert `src/core/lzw_stream.js` to use standard classes	2021-04-28 10:16:50 +02:00
Jonas Jenwald	1f9b134c6a	Enable the `no-var` rule in the `src/core/lzw_stream.js` file	2021-04-28 10:16:50 +02:00
Jonas Jenwald	6c1a321500	Move the `LZWStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:50 +02:00
Tim van der Meij	0acd801b1e	Merge pull request #13305 from timvandermeij/annotation-polygon-polyline-no-appearance-stream Implement rendering polyline/polygon annotations without appearance stream	2021-04-27 20:03:35 +02:00
Tim van der Meij	60ab15427f	Implement rendering polyline/polygon annotations without appearance stream	2021-04-27 19:02:20 +02:00
Jonas Jenwald	0ecb42f4d7	Convert `src/core/jpx_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	c51ef1f21f	Convert `src/core/jbig2_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	d9c1bf96b6	Convert `src/core/jpeg_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	0ca63f94b4	Convert `src/core/ccitt_stream.js` to use standard classes	2021-04-27 13:29:09 +02:00
Jonas Jenwald	8ff213871b	Convert `src/core/ccitt.js` to use standard classes Given that we're using modules, meaning that only explicitly `export`ed things are visible to the outside, it's no longer necessary to wrap all of the code in a closure.	2021-04-27 13:29:09 +02:00
Jonas Jenwald	6f4394fcd8	Support `InkAnnotation`s without appearance streams (issue 13298) (#13301) For now, we keep things purposely simple by using straight lines (rather than curves); please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2096579	2021-04-27 11:49:03 +02:00
Tim van der Meij	270e56dae8	Enable the `no-var` linting rule in `src/core/image.js` This is done automatically with `gulp lint --fix` and the following manual changes: ```diff diff --git a/src/core/image.js b/src/core/image.js index 35c06b8ab..e718b9937 100644 --- a/src/core/image.js +++ b/src/core/image.js @@ -97,7 +97,7 @@ class PDFImage { if (isName(filter)) { switch (filter.name) { case "JPXDecode": - var jpxImage = new JpxImage(); + const jpxImage = new JpxImage(); jpxImage.parseImageProperties(image.stream); image.stream.reset(); ```	2021-04-25 17:40:00 +02:00
Tim van der Meij	16efd09c9f	Enable the `no-var` linting rule in `src/core/worker.js` This is done automatically with `gulp lint --fix` and the following manual changes: ```diff diff --git a/src/core/worker.js b/src/core/worker.js index aec9c1d39..f88691622 100644 --- a/src/core/worker.js +++ b/src/core/worker.js @@ -300,7 +300,7 @@ class WorkerMessageHandler { cachedChunks = []; }; const readPromise = new Promise(function (resolve, reject) { - var readChunk = function ({ value, done }) { + const readChunk = function ({ value, done }) { try { ensureNotTerminated(); if (done) { ```	2021-04-25 17:40:00 +02:00
Tim van der Meij	85659b4cf0	Enable the `no-var` linting rule in `src/core/cmap.js` This is done automatically with `gulp lint --fix` and the following manual changes: ```diff diff --git a/src/core/cmap.js b/src/core/cmap.js index 850275a19..8794726dd 100644 --- a/src/core/cmap.js +++ b/src/core/cmap.js @@ -519,8 +519,8 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { readHexNumber(num, size) { let last; - let stack = this.tmpBuf, - sp = 0; + const stack = this.tmpBuf; + let sp = 0; do { const b = this.readByte(); if (b < 0) { @@ -603,7 +603,6 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { const ucs2DataSize = 1; const subitemsCount = stream.readNumber(); - var i; switch (type) { case 0: // codespacerange stream.readHex(start, dataSize); @@ -614,7 +613,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(start, dataSize), hexToInt(end, dataSize) ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(end, dataSize); stream.readHexNumber(start, dataSize); addHex(start, end, dataSize); @@ -633,7 +632,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { addHex(end, start, dataSize); stream.readNumber(); // code // undefined range, skipping - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(end, dataSize); stream.readHexNumber(start, dataSize); addHex(start, end, dataSize); @@ -647,7 +646,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { stream.readHex(char, dataSize); code = stream.readNumber(); cMap.mapOne(hexToInt(char, dataSize), code); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(char, dataSize); if (!sequence) { stream.readHexNumber(tmp, dataSize); @@ -667,7 +666,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(end, dataSize), code ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { 
incHex(end, dataSize); if (!sequence) { stream.readHexNumber(start, dataSize); @@ -692,7 +691,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(char, ucs2DataSize), hexToStr(charCode, dataSize) ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(char, ucs2DataSize); if (!sequence) { stream.readHexNumber(tmp, ucs2DataSize); @@ -717,7 +716,7 @@ const BinaryCMapReader = (function BinaryCMapReaderClosure() { hexToInt(end, ucs2DataSize), hexToStr(charCode, dataSize) ); - for (i = 1; i < subitemsCount; i++) { + for (let i = 1; i < subitemsCount; i++) { incHex(end, ucs2DataSize); if (!sequence) { stream.readHexNumber(start, ucs2DataSize); ```	2021-04-25 17:40:00 +02:00
Jonas Jenwald	da22146b95	Replace a bunch of `Array.prototype.forEach()` cases with `for...of` loops instead Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)	2021-04-24 13:00:19 +02:00
Jonas Jenwald	4ec0a4fb43	Re-factor the `Catalog._collectJavaScript` method slightly This patch first of all moves all checking/validation into the `appendIfJavaScriptDict` function, to avoid duplicating it in multiple places. Secondly, also removes what's now an outdated/incorrect comment since we have implemented scripting support.	2021-04-23 09:42:32 +02:00
Jonas Jenwald	83f7009e4b	Change `NameOrNumberTree.getAll` to return a `Map` rather than an Object Given that we're (almost) always iterating through the result of the `getAll`-calls, using a `Map` seems nicer overall since it's more suited to iteration compared to a regular Object. Also, add a couple of `Dict`-checks in existing code touched by this patch, since it really cannot hurt to prevent potential errors in a corrupt PDF document.	2021-04-22 13:15:50 +02:00
Jonas Jenwald	57a1ea840f	Ensure that `saveDocument` works if there's no /ID-entry in the PDF document (issue 13279) (#13280) First of all, while it should be very unlikely that the /ID-entry is an indirect object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an indirect object the existing code in `src/core/writer.js` would already fail. Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`. Drive-by change: In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).	2021-04-22 12:08:56 +02:00
Brendan Dahl	066cbcfb27	Merge pull request #13277 from Snuffleupagus/adjustToUnicode-cff For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260)	2021-04-21 10:41:36 -07:00
Jonas Jenwald	7fab73ed23	For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260) This patch extends the approach, implemented in PR 7550, to also apply to CFF fonts.	2021-04-20 20:48:44 +02:00
Jonas Jenwald	8f6543c218	Ensure that the /Properties, used with optional content, is actually loaded before parsing the operatorList/textContent (PR 12095 follow-up) By not waiting for the /Properties to load, before parsing of the operatorList/textContent starts, there's a very real risk that a `MissingDataException` will be thrown when trying to access the data in the `PartialEvaluator.parseMarkedContentProps` method. If this ever happens it will thus lead to incomplete and/or outright broken rendering, and with e.g. `disableAutoFetch=true` set the likelihood of this occurring would increase quite a bit. Please note: While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.	2021-04-20 20:22:44 +02:00
Jonas Jenwald	f560fe6875	A couple of small scripting/XFA-related tweaks in the worker-code - Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible. - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.) - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I neglected to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be. Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.) - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.	2021-04-17 10:34:22 +02:00
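
The stream refactoring in commits 67415bfabe and 67a1cfc1b1 can be pictured roughly as follows. This is a simplified sketch, not the actual pdf.js source: only the names `BaseStream`, `Stream` and `getBaseStreams` come from the commit messages, while `WrappingStream`, `collectBaseStreams` and all method bodies are hypothetical and exist only for illustration.

```javascript
// Simplified sketch of the pattern described in the commit messages;
// the method bodies here are assumptions, not the pdf.js implementation.
class BaseStream {
  constructor() {
    // Emulate an abstract class: direct instantiation is not allowed.
    if (this.constructor === BaseStream) {
      throw new Error("Cannot initialize BaseStream.");
    }
  }

  // Subclasses are expected to override this.
  getByte() {
    throw new Error("Abstract method `getByte` called");
  }

  // Dummy default, so call-sites can check the return value
  // rather than checking whether the method exists.
  getBaseStreams() {
    return null;
  }
}

class Stream extends BaseStream {
  constructor(bytes) {
    super();
    this.bytes = bytes;
    this.pos = 0;
  }

  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }
}

// Hypothetical stream that wraps another stream.
class WrappingStream extends BaseStream {
  constructor(inner) {
    super();
    this.inner = inner;
  }

  getByte() {
    return this.inner.getByte();
  }

  getBaseStreams() {
    return [this.inner];
  }
}

// Hypothetical call-site: check the type first, then the return value.
function collectBaseStreams(node) {
  if (!(node instanceof BaseStream)) {
    return [];
  }
  return node.getBaseStreams() || [];
}
```

With this shape, an arbitrary object that merely happens to have a `getBaseStreams` property is ignored, which is the kind of loose check the commit message points out in `ObjectLoader._walk`.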

2277 Commits
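
The position-comparison idea from commit af4dc55019 (track where the last glyph ended, then decide between a space and a line break when the next glyph arrives) could look something like the sketch below. It is an illustration under stated assumptions, not the code in `src/core/evaluator.js`: the glyph shape, the `buildText` function and both thresholds are hypothetical.

```javascript
// Illustrative sketch only; thresholds and data shapes are assumptions.
const SPACE_GAP = 0.3; // hypothetical horizontal gap (in font-size units) implying a space
const LINE_GAP = 0.5; // hypothetical vertical shift (in font-size units) implying a new line

function buildText(glyphs) {
  const lines = [];
  let current = "";
  let last = null; // position where the previous glyph ended

  for (const { char, x, y, width, fontSize } of glyphs) {
    if (last !== null) {
      const dy = Math.abs(y - last.y) / fontSize;
      const dx = (x - last.x) / fontSize;
      if (dy > LINE_GAP) {
        // Large vertical move: start a new line.
        lines.push(current);
        current = "";
      } else if (dx > SPACE_GAP && !current.endsWith(" ")) {
        // Large horizontal gap: insert a single space.
        current += " ";
      }
    }
    current += char;
    last = { x: x + width, y };
  }
  if (current) {
    lines.push(current);
  }
  // Collapse runs of white space into one space, as the commit describes.
  return lines.map(line => line.replace(/\s+/g, " ").trim());
}

const sample = [
  { char: "a", x: 0, y: 0, width: 5, fontSize: 10 },
  { char: "b", x: 5, y: 0, width: 5, fontSize: 10 },
  { char: "c", x: 20, y: 0, width: 5, fontSize: 10 },
  { char: "d", x: 0, y: -12, width: 5, fontSize: 10 },
];
console.log(buildText(sample)); // [ 'ab c', 'd' ]
```

Comparing positions directly means the sketch does not need to know which operator moved the text cursor, which is the simplification the commit message argues for.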