pdf.js

Author	SHA1	Message	Date
Brendan Dahl	f1f9d98519	Merge pull request #8507 from Snuffleupagus/issue-8480 Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)	2017-06-23 13:36:58 -07:00
Yury Delendik	e2ca894fec	Merge pull request #8488 from mukulmishra18/streams-getTextContent Streams get text content	2017-06-23 12:52:13 -05:00
Jonas Jenwald	73234577e1	Rename `map` to `_map` inside of `Dict`, to make it clearer that it should be regarded as a "private" property	2017-06-17 17:32:00 +02:00
Mukul Mishra	0c13d0ff46	Adds Streams API in getTextContent to stream data. This patch adds Streams API support in getTextContent so that we can stream data in chunks instead of fetching whole data from worker thread to main thread. This patch supports Streams API without changing the core functionality of getTextContent. Enqueue textContent directly at getTextContent in partialEvaluator. Adds desiredSize and ready property in streamSink.	2017-06-17 20:03:27 +05:30
Jonas Jenwald	3a20fd165f	Refactor `ObjectLoader` to use `Dict`s correctly, rather than abusing their internal properties The `ObjectLoader` currently takes an Object as input, despite actually working with `Dict`s internally. This means that at the (two) existing call-sites, we're passing in the "private" `Dict.map` property directly. Doing this seems like an anti-pattern, and we could (and even should) simply provide the actual `Dict` when creating an `ObjectLoader` instance. Accessing properties stored in the `Dict` is now done using the intended methods instead, in particular `getRaw` which (as the name suggests) doesn't do any de-referencing, thus maintaining the current functionality of the code. The only functional change in this patch is that `ObjectLoader.load` will now ignore empty nodes, such that `ObjectLoader._walk` only needs to deal with nodes that are known to contain data. (This lets us skip, among other checks, meaningless `addChildren` function calls.)	2017-06-16 22:59:32 +02:00
Jonas Jenwald	f2fc9ee281	Slightly refactor and ES6-ify the code in `ObjectLoader` This patch changes all `var` to `let`, and caches the array lengths in all loops. Also removes two unnecessary temporary variable assignments.	2017-06-16 22:59:32 +02:00
Jonas Jenwald	e589834f13	Ensure that `TilingPattern`s have valid (non-zero) /BBox arrays (issue 8330) Fixes 8330.	2017-06-09 21:41:48 +02:00
Jonas Jenwald	8b4a42e5b8	Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480) As mentioned the last time that I touched this particular part of the font code, I'm sincerely hope that this doesn't cause any regressions! However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues. Fixes 8480.	2017-06-09 21:15:39 +02:00
Jonas Jenwald	999e30723d	Reduce the duplication slightly when detecting an OpenType font (in the `Font` constructor)	2017-06-09 18:26:57 +02:00
Jonas Jenwald	a8c87f8019	Fix inconsistent spacing and trailing commas in objects in `src/core/` files, so we can enable the `comma-dangle` and `object-curly-spacing` ESLint rules later on Unfortunately this patch is fairly big, even though it only covers the `src/core` folder, but splitting it even further seemed difficult. http://eslint.org/docs/rules/comma-dangle http://eslint.org/docs/rules/object-curly-spacing Given that we currently have quite inconsistent object formatting, fixing this in one big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead. Please note: This patch was created automatically, using the ESLint --fix command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch. ```diff diff --git a/src/core/evaluator.js b/src/core/evaluator.js index abab9027..dcd3594b 100644 --- a/src/core/evaluator.js +++ b/src/core/evaluator.js @@ -2785,7 +2785,8 @@ var EvaluatorPreprocessor = (function EvaluatorPreprocessorClosure() { t['Tz'] = { id: OPS.setHScale, numArgs: 1, variableArgs: false, }; t['TL'] = { id: OPS.setLeading, numArgs: 1, variableArgs: false, }; t['Tf'] = { id: OPS.setFont, numArgs: 2, variableArgs: false, }; - t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false, }; + t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, + variableArgs: false, }; t['Ts'] = { id: OPS.setTextRise, numArgs: 1, variableArgs: false, }; t['Td'] = { id: OPS.moveText, numArgs: 2, variableArgs: false, }; t['TD'] = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false, }; diff --git a/src/core/jbig2.js b/src/core/jbig2.js index 5a17d482..71671541 100644 --- a/src/core/jbig2.js +++ b/src/core/jbig2.js @@ -123,19 +123,22 @@ var Jbig2Image = (function Jbig2ImageClosure() { { x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -2, y: 0, }, { x: -1, y: 0, }], [{ x: -3, y: -1, }, { x: -2, y: -1, }, { x: -1, y: -1, }, { x: 0, y: -1, }, - { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, { x: -1, y: 0, }] + { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, + { x: -1, y: 0, }] ]; var RefinementTemplates = [ { coding: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }], - reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, - { x: 1, y: 0, }, { x: -1, y: 1, }, { x: 0, y: 1, }, { x: 1, y: 1, }], + reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, + { x: 0, y: 0, }, { x: 1, y: 0, }, { x: -1, y: 1, }, + { x: 0, y: 1, }, { x: 1, y: 1, }], }, { - coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }], - reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, { x: 1, y: 0, }, - { x: 0, y: 1, }, { x: 1, y: 1, }], + coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, + { x: -1, y: 0, }], + reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, + { x: 1, y: 0, }, { x: 0, y: 1, }, { x: 1, y: 1, }], } ]; ```	2017-06-02 11:20:19 +02:00
Jonas Jenwald	982b6aa65b	Convert the files in the `/src/core` folder to ES6 modules Please note that the `glyphlist.js` and `unicode.js` files are converted to CommonJS modules instead, since Babel cannot handle files that large and they are thus excluded from transpilation.	2017-05-30 22:06:21 +02:00
Jonas Jenwald	4ce5e520fb	Add different code-paths to `{CMap, ToUnicodeMap}.charCodeOf` depending on length, since `Array.prototype.indexOf` can be extremely inefficient for very large arrays (issue 8372) Fixes 8372.	2017-05-24 19:47:04 +02:00
Jonas Jenwald	31c24ed631	Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424) This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations. According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special". Fixes 8424.	2017-05-23 16:12:45 +02:00
Jonas Jenwald	0ddf52aca5	Remove the special handling for `nameddest`s that look like standard pageNumbers PR 7341 added special handling for `nameddest`s that look like pageNumbers, to prevent issues since we previously incorrectly supported specifying a pageNumber directly in the hash; i.e. `#10` versus the correct `#page=10` format. Since this behaviour wasn't correct, PR 7757 fixed and deprecated the old format, which means that we no longer need to maintain the `nameddest` hack in multiple files.	2017-05-20 11:29:29 +02:00
Yury Delendik	5dc8dcdc0f	Merge pull request #8388 from Snuffleupagus/issue-8380 Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380)	2017-05-17 17:25:51 -05:00
巴里切罗	8d5d97264e	fix(svg) adjust strategy for decoding JPEG images	2017-05-08 11:32:44 +08:00
Jonas Jenwald	0c2ebda31c	Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380) For some reason, we're putting all kind of images except JPEG into the `imageCache` in `evaluator.js`.[1] This means that in the PDF file in issue 8380, we'll keep sending the same two small images[2] to the main-thread and decoding them over and over. This is obviously hugely inefficient! As can be seen from the discussion in the issue, the performance becomes extremely bad if the user has the addon "Adblock Plus" installed. However, even in a clean Firefox profile, the performance isn't that great. This patch not only addresses the performance implications of the "Adblock Plus" addon together with that particular PDF file, but it also improves the rendering times considerably for all users. Locally, with a clean profile, the rendering times are reduced from `~2000 ms` to `~500 ms` for my setup! Obviously, the general structure of the PDF file and its operator sequence is still hugely inefficient, however I'd say that the performance with this patch is good enough to consider the issue (as it stands) resolved.[3] Fixes 8380. --- [1] Not technically true, since inline images are cached from `parser.js`, but whatever :-) [2] The two JPEG images have dimensions 1x2, respectively 4x2. [3] To make this even more efficient, a new state would have to be added to the `QueueOptimizer`. Given that PDF files this stupid fortunately aren't too common, I'm not convinced that it's worth doing.	2017-05-07 13:07:41 +02:00
Yury Delendik	3adda80f97	Merge pull request #8358 from Snuffleupagus/PartialEvaluator-method-signatures Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects	2017-05-04 08:10:30 -05:00
Yury Delendik	74ba3033e8	Merge pull request #8359 from Snuffleupagus/Lexer-getNumber-ignore-line-breaks Ignore line-breaks between operator and digit in `Lexer.getNumber`	2017-05-03 09:43:59 -05:00
Jonas Jenwald	3e20d30afc	Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors. Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones. Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience). Please note that I do not think that we need/should convert every single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameter (or even more), passing them in individually becomes quite cumbersome. With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using one `PartialEvalutor` instance, since I couldn't see any compelling reason for creating a new one in every single test. Note: If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat. --- [1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.	2017-05-03 12:10:20 +02:00
Yury Delendik	008aa56ac6	Adds initializeFromPort to the WorkerMessageHandler.	2017-05-02 16:11:54 -05:00
Jonas Jenwald	40feca12c1	Ignore line-breaks between operator and digit in `Lexer.getNumber` This is consistent with the behaviour in Adobe Reader (and PDFium), and it fixes the display of page 30 in https://bug1354114.bmoattachments.org/attachment.cgi?id=8855457 (taken from https://bugzilla.mozilla.org/show_bug.cgi?id=1354114). The patch also makes the `error` message for invalid numbers slightly more useful, by including the charCode as well. (Having that information available would have reduced the time spent on debugging the PDF file above.)	2017-05-02 20:59:42 +02:00
Jonas Jenwald	ebaa22478c	Replace unnecessary `bind(this)` and `var self = this` statements with arrow functions in remaining `src/core/` files	2017-05-02 15:47:43 +02:00
Jonas Jenwald	95bbc8101c	Replace unnecessary `bind(this)` and `var self = this` statements with arrow functions in `src/core/evaluator.js` Note that by using `let` instead of `var` in `PartialEvaluator.setGState` and `TranslatedFont.loadType3Data`, we can get rid of further `bind` usages since `let` is block-scoped. Also, the fact that `bind` wasn't used in the `Font` case inside of `setGState` is actually a bug which has been present ever since PR 5205, where a closure was replaced by a standard loop.[1] --- [1] I'm not aware of any bugs caused by this, but that is probably more a happy accident than anything else, since e.g. just removing the `bind` from the `SMask` case without using block-scoped variables causes test failures.	2017-05-01 20:29:44 +02:00
Tim van der Meij	06c93d8fbd	Merge pull request #8342 from Snuffleupagus/eslint_object-shorthand-src-core Enable the `object-shorthand` ESLint rule in `src/core`	2017-04-29 23:59:20 +02:00
Jonas Jenwald	afc74b0178	Enable the `object-shorthand` ESLint rule in `src/shared` Please see http://eslint.org/docs/rules/object-shorthand. For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into to the patch.	2017-04-27 17:29:40 +02:00
Jani Pehkonen	64deb6c700	Subtract the X/Y offsets when decoding refinement regions of JBIG2 images (issue 7145, 7308, 7401, 7850, 8270) Please refer to the JBIG2 standard, see https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.88-200002-I!!PDF-E&type=items. In particular, section "6.3.5.3 Fixed templates and adaptive templates" mentions that the offsets should be subtracted; where the offsets are defined according to "Table 6" under section "6.3.2 Input parameters". Fixes 7145. Fixes 7308. Fixes 7401. Fixes 7850. Fixes 8270.	2017-04-26 16:06:15 +02:00
Jonas Jenwald	fd51a7cb8c	Merge pull request #8287 from yurydelendik/babel-es2015-preset Allow to convert (some of) ES6 code to ES5.	2017-04-14 21:47:45 +02:00
Yury Delendik	5855c0a8be	Allow to convert (some of) ES6 code to ES5.	2017-04-14 14:39:25 -05:00
Yury Delendik	30bee9fe0c	Moves Uint32ArrayView and hasCanvasTypedArrays into compatibility.js.	2017-04-14 10:04:52 -05:00
Yury Delendik	c4c44c1bbe	Merge pull request #8240 from Snuffleupagus/api-stopAtErrors [api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)	2017-04-13 10:58:49 -05:00
Tim van der Meij	32e01cda96	Merge pull request #8228 from timvandermeij/line-annotations Implement support for line annotations	2017-04-13 00:18:31 +02:00
Tim van der Meij	e15a2ec523	Annotations: implement support for line annotations This patch implements support for line annotations. Other viewers only show the popup annotation when hovering over the line, which may have any orientation. To make this possible, we render an invisible line (SVG element) over the line on the canvas that acts as the trigger for the popup annotation. This invisible line has the same starting coordinates, ending coordinates and width of the line on the canvas.	2017-04-12 23:05:25 +02:00
Jonas Jenwald	fbe7b2eee7	Always ignore Type3 glyphs if their `OperatorList`s contain errors, regardless of the value of the `stopAtErrors` option Compared to the parsing of e.g. an entire page, it doesn't really make sense to only be able to render a Type3 glyph partially.	2017-04-11 08:59:22 +02:00
Jonas Jenwald	a39d636eb8	[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present. Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead. NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully. In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`. Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).	2017-04-11 08:59:22 +02:00
Jonas Jenwald	10e5f766a2	Merge pull request #8266 from brendandahl/issue6652 Normalize blend mode names.	2017-04-11 08:54:42 +02:00
Brendan Dahl	4969b2ad97	Normalize blend mode names.	2017-04-10 16:18:08 -07:00
Tim van der Meij	30d63b0c50	Annotations: move container border removal to the display layer The display layer is responsible for creating the HTML elements for the annotations from the core layer. If we need to ignore border styling for the containers of certain elements, the display layer should do so and not the core layer. I noticed this during the implementation of line annotations, for which we actually need the original border width in the display layer, even though we ignore it for the container. If we set the border style to zero in the core layer, this becomes impossible. To prevent this, this patch moves the container border removal code from the core layer to the display layer. This makes the core layer output the unchanged annotation data and lets the display layer remove any border styling if necessary.	2017-04-09 19:01:38 +02:00
Jonas Jenwald	f41d80bdd3	Enable the `prefer-promise-reject-errors` ESLint rule See http://eslint.org/docs/rules/prefer-promise-reject-errors, note that this is similar to the already used `no-throw-literal` rule.	2017-04-08 11:47:22 +02:00
Brendan Dahl	cdc79a4721	Don’t skip glyph 0 in cmap.	2017-04-05 15:17:38 -07:00
Tim van der Meij	8cee63df5d	Merge pull request #8205 from Snuffleupagus/built-in-CMap-errors Improve the error handling when loading of built-in CMap files fail (PR 8064 follow-up)	2017-03-30 23:01:13 +02:00
Jonas Jenwald	437104969d	Improve the error handling when loading of built-in CMap files fail (PR 8064 follow-up) I happened to notice that the error handling wasn't that great, which I missed previously since there were no unit-tests for failure to load built-in CMap files. Hence this patch, which improves the error handling and adds tests.	2017-03-29 22:38:29 +02:00
Jonas Jenwald	61ee0de29f	Use a simple `RefSetCache` to significantly improve the performance of `Catalog.getPageDict` for certain long documents (PR 8105 follow-up) I found that PR 8105 unfortunately causes a very serious performance regression in long PDF documents where the `Pages` tree only has one level; my apologies for this! Obviously we cannot revert that PR, since that would cause more issues than it solves. Hence it seems to me that the only viable solution here, is to add a simple `RefSetCache` to reduce the amount of redundant lookups. Previously in PR 8105 caching was thought to be unnecessary, but as it turns out I don't think that we really have a choice in the matter any more.	2017-03-28 21:39:55 +02:00
Jonas Jenwald	62eee8c782	Try harder to find the next valid JPEG marker when decoding Scan data (issue 8182, issue 8189) Tentatively fixes 8182 and fixes 8189.	2017-03-27 15:55:21 +02:00
Jonas Jenwald	e229c21ce1	Remove unnecessary `xref` parameters from various method signatures in `PartialEvaluator`, since `this.xref` is already available in the relevant scope For reasons I don't pretend to understand, we're passing around `xref` arguments to a bunch of methods despite `this.xref` being available in `PartialEvaluator`. This patch is a small first small step towards cleaning up the, often unwieldy, signatures of methods in `PartialEvaluator`.	2017-03-26 14:12:53 +02:00
Jonas Jenwald	e40fd63bd3	In `src/core/evaluator.js`, convert a couple of `if (!someVariable) { error(...); }` instances to `assert(someVariable);` instead Rather than, in a number of places, basically duplicating the logic of `assert` we can simply utilize the function directly instead.	2017-03-26 13:53:13 +02:00
Jonas Jenwald	3705e5e459	Use a proper `MessageHandler` for `PartialEvaluator.getTextContent` to avoid errors for fonts relying on built-in CMap files (PR 8064 follow-up) My apologies for inadvertently breaking this in PR 8064; apparently we don't have any tests that cover this use-case :( Without this patch `getTextContent` will fail if called before `getOperatorList`, since loading of fonts during text-extraction may require fetching of built-in CMap files. Please note: The `text` test added here, which uses an already existing PDF file, fails without this patch.	2017-03-24 17:39:33 +01:00
Rob Wu	49af56f730	Rethrow MissingDataException when needed In core/document.js: `PDFDocument.prototype.parse` accesses a dictionary property, which could throw if the underlying data is not yet available. In core/obj.js: `get Catalog.prototype.metadata` calls `stream.getBytes`, which can throw MissingDataException too when the stream is a ChunkedStream.	2017-03-22 14:55:59 +01:00
Jonas Jenwald	8527d27eae	Ensure that `PDFDocument.documentInfo` doesn't fail during document load, when the entire XRef table hasn't been fetched yet (issue 8180) Similar to other `try-catch` statements in `/core` code, we must re-throw `MissingDataException` to prevent issues with missing data during document loading. Note that I'm not sure if/how we can test this, which is why the patch doesn't include any test(s). Fixes 8180.	2017-03-22 14:14:38 +01:00
Jonas Jenwald	e2e13df4a5	Merge pull request #8164 from Snuffleupagus/issue-7828 Don't read past the EOI marker for JPEG images with non-default restart interval (issue 7828)	2017-03-20 22:17:28 +01:00

1138 Commits
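Commit 4ce5e520fb adds a second code-path to `{CMap, ToUnicodeMap}.charCodeOf` because `Array.prototype.indexOf` walks every index up to `length`, which is very slow for huge, sparse lookup arrays. The sketch below illustrates that idea under stated assumptions: it is a hypothetical standalone function, not pdf.js's actual method, and the `DENSE_LIMIT` cut-over value is an assumption chosen for illustration.

```javascript
// Assumed cut-over length between the two strategies (illustrative only).
const DENSE_LIMIT = 0x10000;

// Hypothetical reverse lookup: return the char code (array index) whose
// mapped value is `value`, or -1 if no index maps to it.
function charCodeOf(map, value) {
  if (map.length <= DENSE_LIMIT) {
    // Small arrays: a linear indexOf scan is cheap enough.
    return map.indexOf(value);
  }
  // Large arrays: `for...in` only visits indices that actually hold an
  // element (in ascending order for integer keys), so holes cost nothing.
  for (const key in map) {
    if (map[key] === value) {
      return Number(key);
    }
  }
  return -1;
}
```

Keeping `indexOf` for small arrays preserves the simple path where it performs well; the `for...in` branch only pays off when the array is both long and mostly empty.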
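Commit 3e20d30afc switches the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods from positional arguments to a single parameter object. A minimal sketch of the difference follows; the function and parameter names are illustrative assumptions and are not claimed to match pdf.js's exact signatures.

```javascript
// Positional style: callers must remember the argument order and pass
// placeholders (null/undefined) for optional arguments they skip.
function getTextContentPositional(stream, task, resources, stateManager,
                                  normalizeWhitespace, combineTextItems) {
  return { stream, task, resources, stateManager,
           normalizeWhitespace, combineTextItems };
}

// Parameter-object style: callers name each argument, order is irrelevant,
// and optional parameters get defaults through destructuring.
function getTextContent({
  stream,
  task,
  resources,
  stateManager = null,
  normalizeWhitespace = false,
  combineTextItems = false,
}) {
  return { stream, task, resources, stateManager,
           normalizeWhitespace, combineTextItems };
}

// Usage: the same call, with and without placeholders.
const positional = getTextContentPositional("stream", "task", "res",
                                            null, false, true);
const named = getTextContent({ stream: "stream", task: "task",
                               resources: "res", combineTextItems: true });
```

Adding a new optional parameter to the object form does not disturb existing call-sites, which is the maintenance benefit the commit message describes.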
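Commit 40feca12c1 makes `Lexer.getNumber` ignore line-breaks between a sign and the first digit (matching Adobe Reader and PDFium) and adds the offending charCode to the error message. The sketch below shows that behaviour on a plain string; it is a hypothetical, simplified reader, not pdf.js's `Lexer`, and it only handles an optional sign, digits, and one decimal point.

```javascript
// Hypothetical simplified number reader over a string.
// Returns { value, end }, where `end` is the index just past the number.
function readNumber(str, pos = 0) {
  let ch = str.charCodeAt(pos);
  let sign = 1;
  if (ch === 0x2d /* '-' */) {
    sign = -1;
    ch = str.charCodeAt(++pos);
  } else if (ch === 0x2b /* '+' */) {
    ch = str.charCodeAt(++pos);
  }
  // Tolerate line-breaks (LF/CR) between the sign and the first digit.
  while (ch === 0x0a || ch === 0x0d) {
    ch = str.charCodeAt(++pos);
  }
  const start = pos;
  let seenDot = false;
  while ((ch >= 0x30 && ch <= 0x39) || (ch === 0x2e && !seenDot)) {
    if (ch === 0x2e) {
      seenDot = true;
    }
    ch = str.charCodeAt(++pos);
  }
  const digits = str.slice(start, pos);
  if (digits === "" || digits === ".") {
    // Report the char code of the offending character to ease debugging.
    throw new Error(`Invalid number (charCode ${ch})`);
  }
  return { value: sign * parseFloat(digits), end: pos };
}
```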
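Commit 95bbc8101c notes that using `let` instead of `var` in loops removes the need for some `bind` calls, because `let` is block-scoped. The generic JavaScript sketch below (not pdf.js code) shows why: with `var`, every callback created in the loop shares one function-scoped variable, while `let` gives each iteration its own binding.

```javascript
// `var`: one function-scoped `i`, so every callback sees its final value.
function callbacksWithVar() {
  const callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((cb) => cb());
}

// `let`: a fresh binding per iteration, so each callback keeps its own value.
function callbacksWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((cb) => cb());
}

// Pre-ES6 workaround: bind the current value into each callback.
function callbacksWithBind() {
  const callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function (value) { return value; }.bind(null, i));
  }
  return callbacks.map((cb) => cb());
}
```

The same capture problem applies to callbacks that run later, such as a Promise's `then` handler created inside a loop, which is the situation the commit message describes for `setGState`.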