When the parser finds a stream, it retrieves the Length from the stream
dictionary and advances the lexer to the offset specified by Length.
If this Length is incorrect, the lexer can end up anywhere.
When the lexer gets into an invalid state, it can throw errors. For
example, in issue 6108 the lexer ends up inside the stream data. That
stream uses the ASCIIHexDecode filter, so its data consists entirely of
ASCII characters, and the lexer interprets it as a command token.
Command tokens cannot be longer than 127 bytes, so eventually 128 bytes
are consumed and the lexer throws a "Command token too long" error.
Another possible error is "Illegal character: 41", which occurs when the
lexer happens to land on a ')' because of the length mismatch.
These problems are solved by catching lexer errors and recovering the
parser via the existing stream length detection branch.
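A minimal sketch of the recovery idea, assuming hypothetical helper names
(`tryReadToken`, `findStreamLengthByScanning`, `warn`) rather than the actual
pdf.js internals:

```js
// Sketch only: if the lexer throws while looking for "endstream" at the
// position implied by /Length, fall back to scanning for the keyword.
function readStreamContents(parser, lexer, startPos, dictLength) {
  lexer.skip(dictLength);
  let endstreamFound = false;
  try {
    endstreamFound = parser.tryReadToken('endstream'); // hypothetical helper
  } catch (e) {
    // A bad /Length can drop the lexer into the stream data, producing
    // errors such as "Command token too long" or "Illegal character: 41".
    warn('Invalid stream length, recovering by scanning: ' + e);
  }
  if (!endstreamFound) {
    // Reuse the existing stream length detection: scan forward from the
    // start of the data until the "endstream" keyword is found.
    return parser.findStreamLengthByScanning(startPos); // hypothetical helper
  }
  return dictLength;
}
```

The important part is the try/catch: lexer errors no longer abort parsing but
instead route the parser into the same recovery path that already handles
missing or wrong Length values.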
The PDF specification (cited below) specifies the maximum length of a name,
in bytes, as a minimum architectural limit. This means that PDF *writers*
should not create names that exceed 127 bytes.
It does not forbid PDF *readers* from accepting such names, though. These
names are only used internally to link PDF objects to other objects, and
for that purpose their length does not really matter. Hence I have changed
the implementation to treat long names as warnings rather than errors.
> (7.3.5) The length of a name shall be subject to an implementation
> limit; see Annex C.
>
> (Annex C.2) Table C.1 describes the minimum architectural limits that
> should be accommodated by conforming readers running on 32-bit
> machines. Because conforming readers may be subject to these limits,
> conforming writers producing PDF files should remain within them.
>
> (Table C.1) name 127 "Maximum length of a name, in bytes."
http://adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
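A minimal sketch of the changed check, with illustrative names (`warn` stands
in for whatever logging helper is available):

```js
// Conforming writers should stay within Table C.1, but readers may accept
// longer names, so only warn instead of throwing.
const MAX_NAME_LENGTH = 127;

function checkNameLength(name) {
  if (name.length > MAX_NAME_LENGTH) {
    warn('Name token exceeds the architectural limit of ' +
         MAX_NAME_LENGTH + ' bytes: ' + name.length + ' bytes.');
  }
  return name; // The full name is kept either way.
}
```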
As described in #5444, the evaluator performs identity checking of
paintImageMaskXObjects to decide if it can use
paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup.
This can only ever work if the entry is a cache hit. However, the
previous implementation cached lazily: an image was only considered
cache-worthy once it was repeated, and only then was the repeated
instance cached.
As a result, the sequence of identical images A1 A2 A3 A4 was seen as
A1 A2 A2 A2 by the evaluator, which prevents using the "repeat"
optimization. Also, only the last encountered image was cached,
so A1 B1 A2 B2 would stay A1 B1 A2 B2.
The new implementation drops the "lazy" initialization of the cache. The
threshold for an image to become cacheable is rather small, so the potential
waste in storage and adler32 computation is low, and any eligible image is
now cached by its adler32 checksum.
The two examples from above now become A1 A1 A1 A1 and A1 B1 A1 B1,
which not only saves temporary storage but also avoids computing
identical masks over and over again (the main performance impact
of #2618).
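A rough sketch of eager, checksum-keyed caching, assuming illustrative names
(`imageMaskCache`, `getCachedImageMask`, `buildMask`) rather than the actual
evaluator code:

```js
// Plain adler32 over the mask bytes; cheap relative to decoding the mask.
function adler32(bytes) {
  let a = 1, b = 0;
  for (let i = 0; i < bytes.length; i++) {
    a = (a + (bytes[i] & 0xff)) % 65521;
    b = (b + a) % 65521;
  }
  return (b << 16) | a;
}

const imageMaskCache = new Map();

function getCachedImageMask(maskBytes, buildMask) {
  const key = adler32(maskBytes);
  let cached = imageMaskCache.get(key);
  if (!cached) {
    // Cache every eligible mask up front, not only once it repeats.
    cached = buildMask(maskBytes);
    imageMaskCache.set(key, cached);
  }
  return cached; // Identical masks share one decoded instance.
}
```

Because every eligible mask goes into the cache on first sight, the second
occurrence of A already hits the cache, so the evaluator sees A1 A1 A1 A1 and
can emit the "repeat" operator.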
makeInlineImage() has an "are the next five chars ASCII?" check which is
run after an "EI" sequence has been found. This check involves the
creation of a new object, because peekBytes() calls subarray().
Unfortunately, the check is currently also run on whitespace chars even
when an "EI" sequence has not yet been found, i.e. when it's not needed.
For the PDF in #2618 there are over 820,000 such checks.
This change reworks the relevant loop so that the check is only done
once an "EI" sequence has been seen. This reduces the number of checks
to 157,000, and speeds up rendering by somewhere between 2% and 7% (the
measurements are noisy).
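A simplified sketch of the reworked scan, assuming stream helpers named
getByte()/peekBytes() and glossing over details of the real makeInlineImage()
loop:

```js
function isWhitespace(ch) {
  return ch === 0x20 || ch === 0x0a || ch === 0x0d || ch === 0x09;
}

function looksLikeAscii(bytes) {
  // Printable ASCII plus common whitespace/control characters.
  return bytes.every(b => b === 0 || (b >= 0x08 && b <= 0x0d) ||
                          (b >= 0x20 && b < 0x7f));
}

function findInlineImageEnd(stream) {
  const E = 0x45, I = 0x49;
  let state = 0; // 0 = scanning, 1 = just saw 'E', 2 = just saw "EI"
  let ch;
  while ((ch = stream.getByte()) !== -1) {
    if (state === 0) {
      state = ch === E ? 1 : 0;
    } else if (state === 1) {
      state = ch === I ? 2 : (ch === E ? 1 : 0);
    } else {
      // Only after "EI" has been matched do we pay for peekBytes(),
      // which allocates via subarray().
      if (isWhitespace(ch) && looksLikeAscii(stream.peekBytes(5))) {
        break; // Genuine end of the inline image data.
      }
      state = ch === E ? 1 : 0;
    }
  }
  return stream.pos;
}
```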
When decoding a stream, the decode buffer is often grown multiple times, its
byte size increasing like so: 512, 1024, 2048, etc. This patch estimates the
minimum size in advance (using the length of the encoded stream), often
allowing the smaller sizes to be skipped. It also renames numerous |length|
variables to |maybeLength| to make it clear that they can be |null|.
I measured this change on eight documents. It reduces the cumulative size of
decode buffer allocations by 0--32%, with 10--20% being typical, and reduces
peak RSS by 10 or 20 MiB for several of them.
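A sketch of the estimation idea, with an illustrative function name; the real
patch threads |maybeLength| through the stream constructors:

```js
function computeInitialBufferSize(maybeLength) {
  const DEFAULT_SIZE = 512;
  if (!maybeLength) {
    return DEFAULT_SIZE; // Encoded length unknown (|maybeLength| is null).
  }
  // Round up to the next power of two so later growth still doubles cleanly.
  let size = DEFAULT_SIZE;
  while (size < maybeLength) {
    size *= 2;
  }
  return size;
}

// Example: an encoded stream of 100000 bytes starts with a 131072-byte
// buffer instead of growing through 512, 1024, ..., 65536 first.
```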
This avoids a lot of unnecessary work when such streams are merely referenced
via fetch() and their bytes are never subsequently read. This is a large
performance win on some files.
Now, it computes the numbers with only basic arithmetic operations, without first creating a string and then calling parseFloat.
The new function doesn't behave exactly the same as the old one.
In particular, with the old behaviour, when a number was immediately followed by an 'E', the 'E' was consumed; now it is not, which allows for "glued" numbers and operators.
Also, the new function is faster and consumes less memory.
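An illustrative sketch of the approach (not the actual Lexer.getNumber(), which handles more cases such as exponents): digits are accumulated arithmetically from character codes, and parsing stops at the first non-numeric byte, so a trailing 'E' or a "glued" operator is left in place:

```js
function readNumber(bytes, pos) {
  let sign = 1;
  if (bytes[pos] === 0x2d /* '-' */) { sign = -1; pos++; }
  else if (bytes[pos] === 0x2b /* '+' */) { pos++; }

  let value = 0, divisor = 0; // divisor > 0 once past the '.'
  while (pos < bytes.length) {
    const ch = bytes[pos];
    if (ch >= 0x30 && ch <= 0x39) {        // '0'..'9'
      value = value * 10 + (ch - 0x30);
      if (divisor) { divisor *= 10; }
    } else if (ch === 0x2e && !divisor) {  // first '.'
      divisor = 1;
    } else {
      break; // Stop at the first non-numeric byte; it is not consumed.
    }
    pos++;
  }
  if (divisor) { value /= divisor; }
  return { value: sign * value, nextPos: pos };
}

// readNumber(new Uint8Array([0x31, 0x2e, 0x35, 0x45]), 0)  // "1.5E"
//   -> { value: 1.5, nextPos: 3 }   // the trailing 'E' is not consumed
```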