Sakurai/pdf.js - pdf.js - Gitea on kemo

Author	SHA1	Message	Date
Jonas Jenwald	40feca12c1	Ignore line-breaks between operator and digit in `Lexer.getNumber` This is consistent with the behaviour in Adobe Reader (and PDFium), and it fixes the display of page 30 in https://bug1354114.bmoattachments.org/attachment.cgi?id=8855457 (taken from https://bugzilla.mozilla.org/show_bug.cgi?id=1354114). The patch also makes the `error` message for invalid numbers slightly more useful, by including the charCode as well. (Having that information available would have reduced the time spent on debugging the PDF file above.)	2017-05-02 20:59:42 +02:00
Jonas Jenwald	afc74b0178	Enable the `object-shorthand` ESLint rule in `src/shared` Please see http://eslint.org/docs/rules/object-shorthand. For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into the patch.	2017-04-27 17:29:40 +02:00
Jonas Jenwald	23c62cc321	Consume the current character when encountering illegal characters in `Lexer.getObject`, in order to prevent infinite loops during reading of streams (issue 8061) Please note: The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop. In this particular case, when errors were encountered inside of the `Lexer.getObject` method itself, we didn't advance the stream position. This thus caused an infinite loop in `parseCMap`, since the exact same character was then parsed over and over again. Fixes 8061.	2017-02-11 19:32:48 +01:00
Jonas Jenwald	50c2856097	Move `EOF`/`isEOF` from core/parser.js to core/primitives.js Given the nature of `EOF` and `isEOF`, it seems to me that they really ought to be placed in `core/primitives.js` instead. In general, it doesn't seem great to have to depend on the entire `core/parser.js` file for such simple primitives/helper functions. In particular, while `core/ps_parser.js` is completely separate from `core/parser.js` with regards to its function, it still depends on the latter for just one primitive. Note that compared to e.g. PR 7389, this will not reduce the number of dependencies for `core/ps_parser`, however the new dependency IMHO makes more sense.	2017-01-27 13:37:48 +01:00
Jonas Jenwald	2f3805efbc	Switch to using ESLint, instead of JSHint, for linting Please note that most of the necessary code adjustments were made in PR 7890. ESLint has a number of advantageous properties, compared to JSHint. Among those are: - The ability to find subtle bugs, thanks to more rules (e.g. PR 7881). - Much more customizable in general, and many rules allow fine-tuned behaviour rather than just the on/off rules in JSHint. - Many more rules that can help developers avoid bugs, and a lot of rules that can be used to enforce a consistent coding style. The latter should be particularly useful for new contributors (and reduce the amount of stylistic review comments necessary). - The ability to easily specify exactly what rules to use/not to use, as opposed to JSHint which has a default set. Note: in future JSHint versions some of the rules we depend on will be removed, according to warnings in http://jshint.com/docs/options/, so we wouldn't be able to update without losing lint coverage. - More easily disable one, or more, rules temporarily. In JSHint this requires using a numeric code, which isn't very user friendly, whereas in ESLint the rule name is simply used instead. By default there are no rules enabled in ESLint, but there are some default rule sets available. However, to prevent linting failures if we update ESLint in the future, it seemed easier to just explicitly specify what rules we want. Obviously this makes the ESLint config file somewhat bigger than the old JSHint config file, but given how rarely that one has been updated over the years I don't think that matters too much. I've tried, to the best of my ability, to ensure that we enable the same rules for ESLint that we had for JSHint. Furthermore, I've also enabled a number of rules that seemed to make sense, both to catch possible errors and various style guide violations. Despite the ESLint README claiming that it's slower than JSHint, https://github.com/eslint/eslint#how-does-eslint-performance-compare-to-jshint, locally this patch actually reduces the runtime for `gulp` lint (by approximately 20-25%). A couple of stylistic rules that would have been nice to enable, but where our code currently differs too much to make it feasible: - `comma-dangle`, controls trailing commas in Objects and Arrays (among others). - `object-curly-spacing`, controls spacing inside of Objects. - `spaced-comment`, used to enforce spaces after `//` and `/*`. (This is made difficult by the fact that there's still some usage of the old preprocessor left.) Rules that I intend to look into possibly enabling in follow-ups, if it seems to make sense: `no-else-return`, `no-lonely-if`, `brace-style` with the `allowSingleLine` parameter removed. Useful links: - http://eslint.org/docs/user-guide/configuring - http://eslint.org/docs/rules/	2016-12-16 21:06:36 +01:00
Jonas Jenwald	28e50cfa21	Fix errors reported by the `space-infix-ops` ESLint rule http://eslint.org/docs/rules/space-infix-ops	2016-12-12 20:36:00 +01:00
Jonas Jenwald	b4ac6bd2f6	Ensure that we resolve indirect objects in `Filter` and `DecodeParms` arrays in `parser.js` I've not actually, thus far, come across a PDF file that this patch fixes. However, given the string of recent patches that has fixed issues with indirect objects in arrays, I think that it makes sense to proactively avoid any issues in this code.	2016-12-08 11:55:08 +01:00
Jonas Jenwald	c8f83d6487	Let `Parser_makeFilter` pass in the `DecodeParms` data to various image `Stream`s, instead of re-fetching it in various `[...]Stream.prototype.ensureBuffer` methods In `Parser_filter` the `DecodeParms` data is fetched and passed to `Parser_makeFilter`, where we also make sure that a `Ref` is resolved to a direct object. We can thus pass this along to the various image `Stream` constructors, to avoid the current situation where we lookup/resolve data that is already available. Note also that we currently do not handle the case where `DecodeParms` is an Array entirely correctly in the various image `Stream`s, and this patch fixes that for free.	2016-10-15 12:09:51 +02:00
Yury Delendik	7b2a9ee4e0	Merge pull request #7670 from Snuffleupagus/Parser_makeFilter-maybeLength Only skip parsing a stream in `Parser_makeFilter` when we know for sure that it is empty (PR 6372 follow-up)	2016-10-05 10:38:12 -05:00
Jonas Jenwald	116ba19dd9	Respect the 'ColorTransform' entry in the image dictionary when decoding JPEG images (bug 956965, issue 6574) Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=956965. Fixes 6574.	2016-09-26 21:55:43 +02:00
Jonas Jenwald	a22f0ae820	Only skip parsing a stream in `Parser_makeFilter` when we know for sure that it is empty (PR 6372 follow-up) For PDF files with multiple `/Filter`s, where the `/Length` entry is zero, we fail to render the file correctly. The reason is that `maybeLength` is `null` for every filter except the first, and `!maybeLength` is thus truthy. Hence it seems that we should completely ignore the `/Length` entry and also explicitly check `maybeLength === 0`. Note that I've not (yet) come across a PDF file with this issue in the wild, but given all the stupid things PDF generators do I wouldn't be surprised if such a file actually exists. In order to prevent a possible future bug, I'm submitting this patch which includes a hand-edited PDF file that we currently cannot render correctly (but e.g. Adobe Reader can).	2016-09-25 12:40:15 +02:00
Jonas Jenwald	544d29f5cb	Add a `recoveryMode` that suppresses errors from the `Parser`, and utilize it when searching for the main trailer in `XRef_indexObjects` (bug 1250079) Instead of having `Parser_getObj` fail unconditionally for the referenced PDF file, this patch attempts to let searching for the main trailer continue even if there are errors. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1250079.	2016-08-17 12:37:35 +02:00
klemens	6f03f62327	trivial spelling fixes	2016-07-17 14:33:41 +02:00
Jonas Jenwald	a36a946976	Move the `isSpace` utility function from core/parser.js to shared/util.js Currently the `isSpace` utility function is a member of `Lexer`, which seems suboptimal, given that it's placed in `core/parser.js`. In practice, this means that in a number of `core/*.js` files we thus have an *otherwise* completely unnecessary dependency on `core/parser.js` for a one-line function. Instead, this patch moves `isSpace` into `shared/util.js` which seems more appropriate for this kind of utility function. Not to mention that since all the affected `core/*.js` files already depend on `shared/util.js`, this doesn't incur any more file dependencies.	2016-06-06 09:11:33 +02:00
Yury Delendik	6038c236b2	Removes core/stream circular dependency on core/parser.	2016-03-22 14:06:01 -05:00
Yury Delendik	825a2225ab	Merge pull request #6915 from yurydelendik/lookuptables Refactor lookup hash tables/objects	2016-01-28 15:01:06 -06:00
Yury Delendik	2edf2792dc	Replaces literal {} created lookup tables with Object.create	2016-01-28 12:18:38 -06:00
Jonas Jenwald	15ce96a6eb	Prevent failures in the "scanning for endstream" code, in `Parser_makeStream`, by handling the case where 'endstream' is split between contiguous chunks (issue 1536)	2016-01-26 09:03:51 +01:00
Yury Delendik	6b60c8f4db	Adds UMD headers to core, display and shared files.	2015-12-15 13:24:39 -06:00
Jonas Jenwald	995e1a45b8	Ensure that `Lexer_getName` does not fail if a `Name` contains an invalid usage of the NUMBER SIGN (#) (issue 6692) This is a regression from PR 3424. The PDF file in the referenced issue is using `Type3` fonts. In one of those, the `/CharProcs` dictionary contains an entry with the name `/#`. Before the changes to `Lexer_getName` in PR 3424, we were allowing certain invalid `Name` patterns containing the NUMBER SIGN (#). It's unfortunate that this has been broken for close to two and a half years before the bug surfaced, but it should at least indicate that this is not a widespread issue. Fixes 6692.	2015-11-28 11:59:09 +01:00
Manas	a2ba1b8189	Uses editorconfig to maintain consistent coding styles Removes the following as they are unnecessary /* -*- Mode: Java; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ /* vim: set shiftwidth=2 tabstop=2 autoindent cindent expandtab: */	2015-11-14 07:32:18 +05:30
Jonas Jenwald	b1d148a4aa	Remove `Parser_fetchIfRef` since it's obsolete This code was added in PR 1214, but was made obsolete by PRs 1488/1493. Prior to the latter ones, `Dict_get` returned the raw objects. However, afterwards (and currently) `Dict_get` now resolves indirect objects, which makes `Parser_fetchIfRef` redundant. Potential risks with this patch: This patch passes all tests locally, but there's a small possibility that it could break some weird PDF files. In the current code, wrapping `Dict_get` inside `Parser_fetchIfRef` will potentially mean two back-to-back calls of `XRef_fetch`, if a reference points directly to another reference. I'm not sure if this can actually happen in practice, and I'd think that if that were the case we'd already have run into it elsewhere in the code-base, given that `Parser` is the only place where we try to "double" resolve references.	2015-09-02 23:11:00 +02:00
Tim van der Meij	b42b894570	Merge pull request #6386 from Snuffleupagus/Parser_makeFilter-warn-on-empty-stream Add a warning when we encounter an empty stream in `Parser_makeFilter`	2015-08-30 23:14:22 +02:00
Rob Wu	582573b96b	Merge pull request #6358 from Snuffleupagus/Parser_tryShift-missingDataException Don't catch `MissingDataException` in `Parser_tryShift`	2015-08-27 14:46:24 +02:00
Jonas Jenwald	f814fdc215	Add a warning when we encounter an empty stream in `Parser_makeFilter` Having a warning here would have meant that issue 6360 could have been solved in approximately five minutes, instead of an hour. To avoid that happening again, this patch adds a warning whenever we treat a stream as empty.	2015-08-26 20:14:30 +02:00
Jonas Jenwald	5128603f64	Also check `maybeLength` when deciding if a stream is empty in `Parser_makeFilter` (issue 6360) The problem with the PDF files in the issue, besides the obviously broken XRef tables which we're able to recover from, is that many/most of the streams have Dictionaries where the `Length` entry is set to `0`. This causes us to return `NullStream`, instead of the appropriate one in `Parser_makeFilter`. Fixes 6360.	2015-08-20 23:04:18 +02:00
Jonas Jenwald	8c3b8238ac	Don't catch `MissingDataException` in `Parser_tryShift` I overlooked this while reviewing PR 6197, but I don't think that we should be catching that particular kind of exception here; hence this patch.	2015-08-16 11:35:54 +02:00
Jonas Jenwald	c718d1ab10	Ignore double negative in `Lexer_getNumber` (issue 6218) Basic mathematics would suggest that a double negative should always become positive, but it appears that Adobe Reader simply ignores that case. Hence I think that it makes sense for us to do the same. Fixes 6218.	2015-07-16 12:11:49 +02:00
Rob Wu	e211c25f06	Improve robustness of stream parser (invalid length) When the parser finds a stream, it retrieves the Length from the stream dictionary and advances the lexer to the offset as specified in Length. If this Length is incorrect, the lexer could end up anywhere. When the lexer gets in an invalid state, it could throw errors. For example, in issue 6108, the lexer ends up inside the stream data. This stream has the ASCIIHexDecode filter, so all characters are made up from ASCII characters, and the lexer interprets it as a command token. Tokens cannot be longer than 127 bytes, so eventually 128 bytes are consumed and the lexer throws "Command token too long" error. Another possible error is "Illegal character: 41" when the lexer happens to end up at a ')' due to the length mismatch. These problems are solved by catching lexer errors and recovering the parser via the existing stream length detection branch.	2015-07-11 20:12:49 +02:00
Rob Wu	456ad438d8	Issue a warning instead of an error for long Names The PDF specification (cited below) specifies a maximum length of a name in bytes as a minimal architectural limit. This means that PDF writers should not create names that exceed 127 bytes. It does not forbid PDF readers to accept such names though. These names are only used internally to link PDF objects to other objects. For these use cases, the lengths of the names do not really matter. Hence I have changed the implementation to not treat long names as errors, but warnings. > (7.3.5) The length of a name shall be subject to an implementation > limit; see Annex C. > > (Annex C.2) Table C.1 describes the minimum architectural limits that > should be accommodated by conforming readers running on 32-bit > machines. Because conforming readers may be subject to these limits, > conforming writers producing PDF files should remain within them. > > (Table C.1) name 127 "Maximum length of a name, in bytes." http://adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf	2015-07-10 16:10:24 +02:00
Jonas Jenwald	eac168f3cc	Refactor searching for end of inline (EI) JPEG image streams This patch changes searching for EI image streams to rely on the EOI (end-of-image) marker for DCTDecode filters (i.e. JPEG images).	2015-01-10 23:55:55 +01:00
Jonas Jenwald	184880a751	Fix searching for end of inline (EI) images with ASCII85Decode filters (bug 1077808) This patch changes searching for the end of inline image streams to rely on the EOD marker for the filters: ASCII85Decode and ASCIIHexDecode.	2014-12-15 18:48:29 +01:00
Yury Delendik	f5df30f967	Merge pull request #5445 from CodingFabian/fixImageCachingInParser Fixes caching of inline images during parsing.	2014-12-15 10:51:23 -06:00
Jonas Jenwald	3e1b5216ac	Refactor searching for the SOI marker of inline JPEG image streams	2014-12-05 17:24:27 +01:00
Fabian Lange	970c048d50	fixes caching of inline images during parsing. As described in #5444, the evaluator will perform identity checking of paintImageMaskXObjects to decide if it can use paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup. This can only ever work if the entry is a cache hit. However the previous caching implementation was doing a lazy caching, which would only consider an image cache worthy if it is repeated. Only then the repeated instance would be cached. As a result of this the sequence of identical images A1 A2 A3 A4 would be seen as A1 A2 A2 A2 by the evaluator, which prevents using the "repeat" optimization. Also only the last encountered image is cached, so A1 B1 A2 B2, would stay A1 B1 A2 B2. The new implementation drops the "lazy" init of the cache. The threshold for enabling an image to be cached is rather small, so the potential waste in storage and adler32 calculation is rather low. It also caches any eligible image by its adler32. The two examples from above would now be A1 A1 A1 A1 and A1 B1 A1 B1 which not only saves temporary storage, but also prevents computing identical masks over and over again (which is the main performance impact of #2618)	2014-10-28 15:39:41 +01:00
Jonas Jenwald	2003d83ea6	Fix loading of inline JPEG images	2014-09-11 16:42:51 +02:00
Jonas Jenwald	d1974eae34	Add peekByte method to Stream, DecodeStream and ChunkedStream	2014-09-11 16:42:41 +02:00
Yury Delendik	aa8d3d98f8	Fetches params in makeFilter	2014-09-09 08:29:31 -05:00
Rob Wu	07a4837763	CCITTFaxStream parser: resolve xref if needed Fixes #5243	2014-08-31 11:03:24 +02:00
Nicholas Nethercote	ffae848f4e	Reduce ASCII checks in makeInlineImage(). makeInlineImage() has a "are the next five chars ASCII?" check which is run after an "EI" sequence has been found. This check involves the creation of a new object because peekBytes() calls subarray(). Unfortunately, the check is currently run on whitespace chars even when an "EI" sequence has not yet been found, i.e. when it's not needed. For the PDF in #2618, there are over 820,000 such checks. This change reworks the relevant loop so that the check is only done once an "EI" sequence has been seen. This reduces the number of checks to 157,000, and speeds up rendering by somewhere between 2% and 7% (the measurements are noisy).	2014-08-14 16:20:58 -07:00
Jonas Jenwald	7fa204c805	Add strict equalities in src/core/parser.js	2014-08-02 17:37:24 +02:00
Jonas Jenwald	a5c98aab36	Re-factor parsing of the Linearization dictionary	2014-07-27 12:56:09 +02:00
Yury Delendik	fbdab2c7c5	Not ignoring MissingDataException exception.	2014-06-18 18:24:54 -05:00
Jonas Jenwald	ab67e1c272	Let Parser_makeFilter return NullStream when an invalid stream is encountered (issue 3417)	2014-06-17 12:03:34 +02:00
Yury Delendik	0cd28ebfa3	Telemetry for used stream and font types	2014-06-16 16:41:04 -05:00
Yury Delendik	b2d8e73d54	Merge pull request #4895 from p01/Small_optimizations_1 Small optimizations 1	2014-06-10 10:09:12 -05:00
p01	37c9765ab4	Optimized Lexer_getObj 2x faster	2014-06-10 12:37:36 +02:00
Jonas Jenwald	26bbcedcae	Prevent infinite loop when scanning for endstream (bug 1020226)	2014-06-09 22:42:35 +02:00
fkaelberer	f88118dbf9	small optimizations in parser.getObj(), lexer.getObj()	2014-05-23 09:25:36 +02:00
p01	7b68737baa	Strict isEOF / ~22% faster on issue2813, from 16.5s to 13.5s	2014-05-20 12:39:58 +02:00
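
As a rough illustration of the `maybeLength` check described in commit a22f0ae820 above, the sketch below contrasts a falsy test with an explicit `=== 0` comparison. The function names are hypothetical and are not taken from `parser.js`; only the distinction between `null` (length unknown) and `0` (known to be empty) comes from the commit message.

```javascript
// Hypothetical helpers for illustration only; not the pdf.js source.

// Falsy check: treats `null` (length unknown) the same as 0 (known empty).
function looksEmptyLoose(maybeLength) {
  return !maybeLength;
}

// Explicit check: only a length known to be exactly 0 counts as empty.
function looksEmptyStrict(maybeLength) {
  return maybeLength === 0;
}

// Assumed input: per the commit message, `maybeLength` is `null` for
// every filter except the first one.
const lengthsPerFilter = [42, null, null];

console.log(lengthsPerFilter.map(looksEmptyLoose));  // false, true, true
console.log(lengthsPerFilter.map(looksEmptyStrict)); // false, false, false
```

With the falsy test, the second and third filters would be skipped as "empty" even though their length is simply unknown; the strict comparison avoids that.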

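The eager, checksum-keyed image cache described in commit 970c048d50 can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class name `InlineImageCache`, its `getOrAdd` method, and the shape of the cached object are all hypothetical, and only the idea of keying every eligible image by its Adler-32 checksum comes from the commit message.

```javascript
// Illustrative sketch only; names are hypothetical, not the pdf.js implementation.

// Standard Adler-32 checksum over an iterable of byte values.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  // Combine both halves and force an unsigned 32-bit result.
  return ((b << 16) | a) >>> 0;
}

// Eagerly cache every image by checksum, so that identical byte
// sequences always resolve to the same object (a cache hit by identity).
class InlineImageCache {
  constructor() {
    this.entries = new Map();
  }

  getOrAdd(bytes) {
    const key = adler32(bytes);
    let image = this.entries.get(key);
    if (image === undefined) {
      image = { bytes };
      this.entries.set(key, image);
    }
    return image;
  }
}

const cache = new InlineImageCache();
const imageA = Uint8Array.of(1, 2, 3);
const imageB = Uint8Array.of(9, 9, 9);
// Sequence A1 B1 A2 B2, where A2 and B2 are byte-identical copies.
const seen = [imageA, imageB, imageA.slice(), imageB.slice()]
  .map((bytes) => cache.getOrAdd(bytes));
console.log(seen[0] === seen[2], seen[1] === seen[3]); // true true
```

Because the first occurrence is cached immediately, the sequence resolves to A1 B1 A1 B1, which is what allows an identity comparison to succeed. Note that this sketch trusts the checksum alone, so two different images with colliding checksums would be conflated.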