Commit Graph

1194 Commits

Author SHA1 Message Date
Brendan Dahl
9f5c1550ed Merge pull request #8592 from brendandahl/cmap-3-0
Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0…
2017-07-03 17:58:28 -07:00
Brendan Dahl
efbbd8533f Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0xF0FF. 2017-07-03 13:13:46 -07:00
Brendan Dahl
6d4f748fb1 Fix how we detect and handle missing glyph data. 2017-07-03 13:06:06 -07:00
Mukul Mishra
308a83e5ca Fixes wrong structure of fullReader.read() result. 2017-07-01 15:52:47 +05:30
Jonas Jenwald
de0e7a9a68 Check that the MessageHandler isn't already terminated in the onFailure handler in src/core/worker.js (issue 8584)
All other code-paths already checks that the `MessageHandler` isn't terminated, but apparently `onFailure` was missing that check (compare e.g. with the `onSuccess` function).
From what I can tell, this is only an issue if workers are *disabled*, hence why I didn't bother adding a unit-test.

Fixes 8584.
2017-06-30 10:11:13 +02:00
Brendan Dahl
a8a8909d2d Fix missing notdef in expert encoding. 2017-06-29 12:12:39 -07:00
Brendan Dahl
f1f9d98519 Merge pull request #8507 from Snuffleupagus/issue-8480
Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)
2017-06-23 13:36:58 -07:00
Yury Delendik
e2ca894fec Merge pull request #8488 from mukulmishra18/streams-getTextContent
Streams get text content
2017-06-23 12:52:13 -05:00
Jonas Jenwald
73234577e1 Rename map to _map inside of Dict, to make it clearer that it should be regarded as a "private" property 2017-06-17 17:32:00 +02:00
Mukul Mishra
0c13d0ff46 Adds Streams API in getTextContent to stream data.
This patch adds Streams API support in getTextContent
so that we can stream data in chunks instead of fetching
whole data from worker thread to main thread. This patch
supports Streams API without changing the core functionality
of getTextContent.

Enqueue textContent directly at getTextContent in partialEvaluator.

Adds desiredSize and ready property in streamSink.
2017-06-17 20:03:27 +05:30
Jonas Jenwald
3a20fd165f Refactor ObjectLoader to use Dicts correctly, rather than abusing their internal properties
The `ObjectLoader` currently takes an Object as input, despite actually working with `Dict`s internally. This means that at the (two) existing call-sites, we're passing in the "private" `Dict.map` property directly.

Doing this seems like an anti-pattern, and we could (and even should) simply provide the actual `Dict` when creating an `ObjectLoader` instance.
Accessing properties stored in the `Dict` is now done using the intended methods instead, in particular `getRaw` which (as the name suggests) doesn't do any de-referencing, thus maintaining the current functionality of the code.

The only functional change in this patch is that `ObjectLoader.load` will now ignore empty nodes, such that `ObjectLoader._walk` only needs to deal with nodes that are known to contain data. (This lets us skip, among other checks, meaningless `addChildren` function calls.)
2017-06-16 22:59:32 +02:00
Jonas Jenwald
f2fc9ee281 Slightly refactor and ES6-ify the code in ObjectLoader
This patch changes all `var` to `let`, and caches the array lengths in all loops. Also removes two unnecessary temporary variable assignments.
2017-06-16 22:59:32 +02:00
Jonas Jenwald
e589834f13 Ensure that TilingPatterns have valid (non-zero) /BBox arrays (issue 8330)
Fixes 8330.
2017-06-09 21:41:48 +02:00
Jonas Jenwald
8b4a42e5b8 Only special-case OpenType fonts with CFF data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)
*As mentioned the last time that I touched this particular part of the font code, I'm sincerely hope that this doesn't cause any regressions!*

However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues.

Fixes 8480.
2017-06-09 21:15:39 +02:00
Jonas Jenwald
999e30723d Reduce the duplication slightly when detecting an OpenType font (in the Font constructor) 2017-06-09 18:26:57 +02:00
Jonas Jenwald
a8c87f8019 Fix inconsistent spacing and trailing commas in objects in src/core/ files, so we can enable the comma-dangle and object-curly-spacing ESLint rules later on
*Unfortunately this patch is fairly big, even though it only covers the `src/core` folder, but splitting it even further seemed difficult.*

http://eslint.org/docs/rules/comma-dangle
http://eslint.org/docs/rules/object-curly-spacing

Given that we currently have quite inconsistent object formatting, fixing this in *one* big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead.

Please note: This patch was created automatically, using the ESLint --fix command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.

```diff
diff --git a/src/core/evaluator.js b/src/core/evaluator.js
index abab9027..dcd3594b 100644
--- a/src/core/evaluator.js
+++ b/src/core/evaluator.js
@@ -2785,7 +2785,8 @@ var EvaluatorPreprocessor = (function EvaluatorPreprocessorClosure() {
     t['Tz'] = { id: OPS.setHScale, numArgs: 1, variableArgs: false, };
     t['TL'] = { id: OPS.setLeading, numArgs: 1, variableArgs: false, };
     t['Tf'] = { id: OPS.setFont, numArgs: 2, variableArgs: false, };
-    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false, };
+    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1,
+                variableArgs: false, };
     t['Ts'] = { id: OPS.setTextRise, numArgs: 1, variableArgs: false, };
     t['Td'] = { id: OPS.moveText, numArgs: 2, variableArgs: false, };
     t['TD'] = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false, };
diff --git a/src/core/jbig2.js b/src/core/jbig2.js
index 5a17d482..71671541 100644
--- a/src/core/jbig2.js
+++ b/src/core/jbig2.js
@@ -123,19 +123,22 @@ var Jbig2Image = (function Jbig2ImageClosure() {
      { x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -2, y: 0, },
      { x: -1, y: 0, }],
     [{ x: -3, y: -1, }, { x: -2, y: -1, }, { x: -1, y: -1, }, { x: 0, y: -1, },
-     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, { x: -1, y: 0, }]
+     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, },
+     { x: -1, y: 0, }]
   ];

   var RefinementTemplates = [
     {
       coding: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
-                  { x: 1, y: 0, }, { x: -1, y: 1, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
+      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, },
+                  { x: 0, y: 0, }, { x: 1, y: 0, }, { x: -1, y: 1, },
+                  { x: 0, y: 1, }, { x: 1, y: 1, }],
     },
     {
-      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, { x: 1, y: 0, },
-                  { x: 0, y: 1, }, { x: 1, y: 1, }],
+      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, },
+               { x: -1, y: 0, }],
+      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
+                  { x: 1, y: 0, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
     }
   ];
```
2017-06-02 11:20:19 +02:00
Jonas Jenwald
982b6aa65b Convert the files in the /src/core folder to ES6 modules
Please note that the `glyphlist.js` and `unicode.js` files are converted to CommonJS modules instead, since Babel cannot handle files that large and they are thus excluded from transpilation.
2017-05-30 22:06:21 +02:00
Jonas Jenwald
4ce5e520fb Add different code-paths to {CMap, ToUnicodeMap}.charCodeOf depending on length, since Array.prototype.indexOf can be extremely inefficient for very large arrays (issue 8372)
Fixes 8372.
2017-05-24 19:47:04 +02:00
Jonas Jenwald
31c24ed631 Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424)
*This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations.*

According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special".

Fixes 8424.
2017-05-23 16:12:45 +02:00
Jonas Jenwald
0ddf52aca5 Remove the special handling for nameddests that look like standard pageNumbers
PR 7341 added special handling for `nameddest`s that look like pageNumbers, to prevent issues since we previously *incorrectly* supported specifying a pageNumber directly in the hash; i.e. `#10` versus the correct `#page=10` format.

Since this behaviour wasn't correct, PR 7757 fixed and deprecated the old format, which means that we no longer need to maintain the `nameddest` hack in multiple files.
2017-05-20 11:29:29 +02:00
Yury Delendik
5dc8dcdc0f Merge pull request #8388 from Snuffleupagus/issue-8380
Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380)
2017-05-17 17:25:51 -05:00
巴里切罗
8d5d97264e fix(svg) adjust strategy for decoding JPEG images 2017-05-08 11:32:44 +08:00
Jonas Jenwald
0c2ebda31c Cache JPEG images, just as we do for other image formats, in evaluator.js (issue 8380)
For some reason, we're putting all kind of images *except* JPEG into the `imageCache` in `evaluator.js`.[1]
This means that in the PDF file in issue 8380, we'll keep sending the *same* two small images[2] to the main-thread and decoding them over and over. This is obviously hugely inefficient!

As can be seen from the discussion in the issue, the performance becomes *extremely* bad if the user has the addon "Adblock Plus" installed. However, even in a clean Firefox profile, the performance isn't that great.

This patch not only addresses the performance implications of the "Adblock Plus" addon together with that particular PDF file, but it *also* improves the rendering times considerably for *all* users.
Locally, with a clean profile, the rendering times are reduced from `~2000 ms` to `~500 ms` for my setup!

Obviously, the general structure of the PDF file and its operator sequence is still hugely inefficient, however I'd say that the performance with this patch is good enough to consider the issue (as it stands) resolved.[3]

Fixes 8380.

---
[1] Not technically true, since inline images are cached from `parser.js`, but whatever :-)

[2] The two JPEG images have dimensions 1x2, respectively 4x2.

[3] To make this even more efficient, a new state would have to be added to the `QueueOptimizer`. Given that PDF files this stupid fortunately aren't too common, I'm not convinced that it's worth doing.
2017-05-07 13:07:41 +02:00
Yury Delendik
3adda80f97 Merge pull request #8358 from Snuffleupagus/PartialEvaluator-method-signatures
Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects
2017-05-04 08:10:30 -05:00
Yury Delendik
74ba3033e8 Merge pull request #8359 from Snuffleupagus/Lexer-getNumber-ignore-line-breaks
Ignore line-breaks between operator and digit in `Lexer.getNumber`
2017-05-03 09:43:59 -05:00
Jonas Jenwald
3e20d30afc Change the signatures of the PartialEvaluator "constructor" and its getOperatorList/getTextContent methods to take parameter objects
Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors.
Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones.
Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience).

Please note that I do *not* think that we need/should convert *every* single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameter (or even more), passing them in individually becomes quite cumbersome.

With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using *one* `PartialEvalutor` instance, since I couldn't see any compelling reason for creating a new one in every single test.

*Note:* If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat.

---

[1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.
2017-05-03 12:10:20 +02:00
Yury Delendik
008aa56ac6 Adds initializeFromPort to the WorkerMessageHandler. 2017-05-02 16:11:54 -05:00
Jonas Jenwald
40feca12c1 Ignore line-breaks between operator and digit in Lexer.getNumber
This is consistent with the behaviour in Adobe Reader (and PDFium), and it fixes the display of page 30 in https://bug1354114.bmoattachments.org/attachment.cgi?id=8855457 (taken from https://bugzilla.mozilla.org/show_bug.cgi?id=1354114).

The patch also makes the `error` message for invalid numbers slightly more useful, by including the charCode as well. (Having that information available would have reduced the time spent on debugging the PDF file above.)
2017-05-02 20:59:42 +02:00
Jonas Jenwald
ebaa22478c Replace unnecessary bind(this) and var self = this statements with arrow functions in remaining src/core/ files 2017-05-02 15:47:43 +02:00
Jonas Jenwald
95bbc8101c Replace unnecessary bind(this) and var self = this statements with arrow functions in src/core/evaluator.js
Note that by using `let` instead of `var` in `PartialEvaluator.setGState` and `TranslatedFont.loadType3Data`, we can get rid of further `bind` usages since `let` is block-scoped.
Also, the fact that `bind` wasn't used in the `Font` case inside of `setGState` is actually a bug which has been present ever since PR 5205, where a closure was replaced by a standard loop.[1]

---
[1] I'm not aware of any bugs caused by this, but that is probably more a happy accident than anything else, since e.g. just removing the `bind` from the `SMask` case without using block-scoped variables causes test failures.
2017-05-01 20:29:44 +02:00
Tim van der Meij
06c93d8fbd Merge pull request #8342 from Snuffleupagus/eslint_object-shorthand-src-core
Enable the `object-shorthand` ESLint rule in `src/core`
2017-04-29 23:59:20 +02:00
Jonas Jenwald
afc74b0178 Enable the object-shorthand ESLint rule in src/shared
Please see http://eslint.org/docs/rules/object-shorthand.

For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into to the patch.
2017-04-27 17:29:40 +02:00
Jani Pehkonen
64deb6c700 Subtract the X/Y offsets when decoding refinement regions of JBIG2 images (issue 7145, 7308, 7401, 7850, 8270)
Please refer to the JBIG2 standard, see https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.88-200002-I!!PDF-E&type=items.
In particular, section "6.3.5.3 Fixed templates and adaptive templates" mentions that the offsets should be *subtracted*; where the offsets are defined according to "Table 6" under section "6.3.2 Input parameters".

Fixes 7145.
Fixes 7308.
Fixes 7401.
Fixes 7850.
Fixes 8270.
2017-04-26 16:06:15 +02:00
Jonas Jenwald
fd51a7cb8c Merge pull request #8287 from yurydelendik/babel-es2015-preset
Allow to convert (some of) ES6 code to ES5.
2017-04-14 21:47:45 +02:00
Yury Delendik
5855c0a8be Allow to convert (some of) ES6 code to ES5. 2017-04-14 14:39:25 -05:00
Yury Delendik
30bee9fe0c Moves Uint32ArrayView and hasCanvasTypedArrays into compatibility.js. 2017-04-14 10:04:52 -05:00
Yury Delendik
c4c44c1bbe Merge pull request #8240 from Snuffleupagus/api-stopAtErrors
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
2017-04-13 10:58:49 -05:00
Tim van der Meij
32e01cda96 Merge pull request #8228 from timvandermeij/line-annotations
Implement support for line annotations
2017-04-13 00:18:31 +02:00
Tim van der Meij
e15a2ec523
Annotations: implement support for line annotations
This patch implements support for line annotations. Other viewers only
show the popup annotation when hovering over the line, which may have
any orientation. To make this possible, we render an invisible line (SVG
element) over the line on the canvas that acts as the trigger for the
popup annotation. This invisible line has the same starting coordinates,
ending coordinates and width of the line on the canvas.
2017-04-12 23:05:25 +02:00
Jonas Jenwald
fbe7b2eee7 Always ignore Type3 glyphs if their OperatorLists contain errors, regardless of the value of the stopAtErrors option
Compared to the parsing of e.g. an entire page, it doesn't really make sense to only be able to render a Type3 glyph partially.
2017-04-11 08:59:22 +02:00
Jonas Jenwald
a39d636eb8 [api-minor] Always allow e.g. rendering to continue even if there are errors, and add a stopAtErrors parameter to getDocument to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.

NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`.

Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).
2017-04-11 08:59:22 +02:00
Jonas Jenwald
10e5f766a2 Merge pull request #8266 from brendandahl/issue6652
Normalize blend mode names.
2017-04-11 08:54:42 +02:00
Brendan Dahl
4969b2ad97 Normalize blend mode names. 2017-04-10 16:18:08 -07:00
Tim van der Meij
30d63b0c50
Annotations: move container border removal to the display layer
The display layer is responsible for creating the HTML elements for the
annotations from the core layer. If we need to ignore border styling for
the containers of certain elements, the display layer should do so and
not the core layer. I noticed this during the implementation of line
annotations, for which we actually need the original border width in the
display layer, even though we ignore it for the container. If we set the
border style to zero in the core layer, this becomes impossible.

To prevent this, this patch moves the container border removal code from
the core layer to the display layer. This makes the core layer output
the unchanged annotation data and lets the display layer remove any
border styling if necessary.
2017-04-09 19:01:38 +02:00
Jonas Jenwald
f41d80bdd3 Enable the prefer-promise-reject-errors ESLint rule
See http://eslint.org/docs/rules/prefer-promise-reject-errors, note that this is similar to the already used  `no-throw-literal` rule.
2017-04-08 11:47:22 +02:00
Brendan Dahl
cdc79a4721 Don’t skip glyph 0 in cmap. 2017-04-05 15:17:38 -07:00
Tim van der Meij
8cee63df5d Merge pull request #8205 from Snuffleupagus/built-in-CMap-errors
Improve the error handling when loading of built-in CMap files fail (PR 8064 follow-up)
2017-03-30 23:01:13 +02:00
Jonas Jenwald
437104969d Improve the error handling when loading of built-in CMap files fail (PR 8064 follow-up)
I happened to notice that the error handling wasn't that great, which I missed previously since there were no unit-tests for failure to load built-in CMap files.
Hence this patch, which improves the error handling *and* adds tests.
2017-03-29 22:38:29 +02:00
Jonas Jenwald
61ee0de29f Use a simple RefSetCache to significantly improve the performance of Catalog.getPageDict for certain long documents (PR 8105 follow-up)
I found that PR 8105 unfortunately causes a *very serious* performance regression in long PDF documents where the `Pages` tree only has one level; my apologies for this!

Obviously we cannot revert that PR, since that would cause more issues than it solves. Hence it seems to me that the only viable solution here, is to add a simple `RefSetCache` to reduce the amount of redundant lookups.
Previously in PR 8105 caching was thought to be unnecessary, but as it turns out I don't think that we really have a choice in the matter any more.
2017-03-28 21:39:55 +02:00
Jonas Jenwald
62eee8c782 Try harder to find the next valid JPEG marker when decoding Scan data (issue 8182, issue 8189)
Tentatively fixes 8182 and fixes 8189.
2017-03-27 15:55:21 +02:00
Jonas Jenwald
e229c21ce1 Remove unnecessary xref parameters from various method signatures in PartialEvaluator, since this.xref is already available in the relevant scope
For reasons I don't pretend to understand, we're passing around `xref` arguments to a bunch of methods despite `this.xref` being available in `PartialEvaluator`.

This patch is a small first small step towards cleaning up the, often unwieldy, signatures of methods in `PartialEvaluator`.
2017-03-26 14:12:53 +02:00
Jonas Jenwald
e40fd63bd3 In src/core/evaluator.js, convert a couple of if (!someVariable) { error(...); } instances to assert(someVariable); instead
Rather than, in a number of places, basically duplicating the logic of `assert` we can simply utilize the function directly instead.
2017-03-26 13:53:13 +02:00
Jonas Jenwald
3705e5e459 Use a proper MessageHandler for PartialEvaluator.getTextContent to avoid errors for fonts relying on built-in CMap files (PR 8064 follow-up)
*My apologies for inadvertently breaking this in PR 8064; apparently we don't have any tests that cover this use-case :(*

Without this patch `getTextContent` will fail if called before `getOperatorList`, since loading of fonts during text-extraction may require fetching of built-in CMap files.

*Please note:* The `text` test added here, which uses an already existing PDF file, fails without this patch.
2017-03-24 17:39:33 +01:00
Rob Wu
49af56f730 Rethrow MissingDataException when needed
In core/document.js: `PDFDocument.prototype.parse` accesses a dictionary
property, which could throw if the underlying data is not yet available.

In core/obj.js: `get Catalog.prototype.metadata` calls
`stream.getBytes`, which can throw MissingDataException too when the
stream is a ChunkedStream.
2017-03-22 14:55:59 +01:00
Jonas Jenwald
8527d27eae Ensure that PDFDocument.documentInfo doesn't fail during document load, when the entire XRef table hasn't been fetched yet (issue 8180)
Similar to other `try-catch` statements in `/core` code, we must re-throw `MissingDataException` to prevent issues with missing data during document loading.
Note that I'm not sure if/how we can test this, which is why the patch doesn't include any test(s).

Fixes 8180.
2017-03-22 14:14:38 +01:00
Jonas Jenwald
e2e13df4a5 Merge pull request #8164 from Snuffleupagus/issue-7828
Don't read past the EOI marker for JPEG images with non-default restart interval (issue 7828)
2017-03-20 22:17:28 +01:00
Jonas Jenwald
d6d0f778aa Don't read past the EOI marker for JPEG images with non-default restart interval (issue 7828)
*After browsing through (a version of) the JPEG specification, see https://www.w3.org/Graphics/JPEG/itu-t81.pdf, I hope that this patch makes sense.*

Note that while issue 7828 became a problem after PR 7661, it isn't really a regression from than PR. The explanation is rather that we're now relying on `core/jpg.js` instead of the Native Image decoder in more situations than before, which thus exposed an *existing* issue in our JPEG decoder.
Another factor also seems to be that in many JPEG images, the DRI (Define Restart Interval) marker isn't present, in which case this bug won't manifest either.

According to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=89 (at the bottom of the page):
"NOTE – The final restart interval may be smaller than the size specified by the DRI marker segment, as it includes only the number of MCUs remaining in the scan."
Furthermore, according to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=39 (in the middle of the page):
"[...] If restart is enabled and the restart interval is defined to be Ri, each entropy-coded segment except the last one shall contain Ri MCUs. The last one shall contain whatever number of MCUs completes the scan."

Based on the above, it thus seem to me that we should simply ensure that we're not attempting to continue to parse Scan data once we've found all MCUs (Minimum Coded Unit) of the image.

Fixes 7828.
2017-03-20 17:16:33 +01:00
Jonas Jenwald
be1a6f294f Try to recover when encountering JPEG markers with too short marker lengths (issue 8169)
The issue with the JPEG image in question, is that the COM (Comment) marker has an incorrect length entry.

Fixes 8169.
2017-03-20 17:05:51 +01:00
Jonas Jenwald
a7c19d9cbb Adjust the yoda ESLint rule to apply to inequalities as well
I happened to notice that some inequalities had the wrong order, and was surprised since I thought that the `yoda` rule should have caught that.
However, reading http://eslint.org/docs/rules/yoda#options a bit more closely than previously, it's quite obvious that the `onlyEquality` option does *exactly* what its name suggests. Hence I think that it makes sense to adjust the options such that only ranges are allowed instead.
2017-03-19 13:27:14 +01:00
Jonas Jenwald
224613a511 Merge pull request #8135 from jasonjensen/issue8097
Handle cff fonts with erroneous stackSize (issue 8097)
2017-03-11 09:55:00 +01:00
Tim van der Meij
fc5810c97a Merge pull request #8144 from timvandermeij/issue-8143
Widget annotations: do not crash if `Parent` is not a dictionary during field name construction (issue 8143)
2017-03-10 00:40:13 +01:00
Tim van der Meij
936d3c0698
Widget annotations: do not crash if Parent is not a dictionary
during field name construction (issue 8143)
2017-03-09 23:51:52 +01:00
Jason O. Jensen
d230784ac3 Handle cff fonts with erroneous stackSize 2017-03-06 19:28:46 -05:00
Tim van der Meij
4e3e97be8e Merge pull request #8129 from Snuffleupagus/getInheritedPageProp-undefined
Return `undefined` instead of `Dict.empty` from `Page.getInheritedPageProp` for non-existent properties to prevent possible future bugs
2017-03-04 16:18:24 +01:00
Yury Delendik
c290561488 Merge pull request #8120 from yurydelendik/lib
Publishes processed sources into pdfjs-dist/lib
2017-03-04 08:48:36 -06:00
Jonas Jenwald
9bed87f5dc Return undefined instead of Dict.empty from Page.getInheritedPageProp for non-existent properties to prevent possible future bugs
*This is something that I noticed while working on PR 8126, which is (more) fallout from PR 6065.*

In general, it's actually *not* correct to return `Dict.empty` as the default value for non-existent properties. Please note that a prior PR, see https://github.com/mozilla/pdf.js/pull/5957#issuecomment-103112698, asked for that behaviour but I don't think that's right.

Obviously for properties that are (or should) be `Dict`s it makes sense, however certain properties can be e.g. Strings or Arrays instead. In the latter case, returning `Dict.empty` is just plain wrong, and it's quite fascinating that this hasn't caused any errors in practice. (The existing validation in the various getters has actually saved us here.)

Also, when looking at this code again, it seemed unnecessary to duplicate the `MAX_LOOP_COUNT` check since we could just return immediately instead.
2017-03-04 13:08:39 +01:00
Tim van der Meij
1eb96d7ca9 Merge pull request #8128 from timvandermeij/csp-headers
Network: use the current location to prevent errors when using CSP headers
2017-03-04 00:01:50 +01:00
Yury Delendik
e7cc07cc11 Moves checkProblematicCharRanges to font_spec.js 2017-03-03 16:33:35 -06:00
Job van der Weiden
a05115d2ec
Network: use the current location to prevent errors when using CSP headers
When using content security headers to restrict connections to the same origin,
you may not make connections to `example.com`. This feature detection also
works with a request to the current location.
2017-03-03 23:18:51 +01:00
Jonas Jenwald
4a0ff5dbf7 Ensure that we don't ignore 0 values in Page.getInheritedPageProp (issue 8125)
It appears that I accidentally broke this in PR 6065, sorry about that!

The issue in this particular PDF file is that there's `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`).

Fixes 8125.
2017-03-03 12:27:40 +01:00
Jonas Jenwald
9163a6fba4 Merge pull request #8112 from Snuffleupagus/JS-action-newWindow
Support the `newWindow` flag in white-listed `app.launchURL` JavaScript actions (PR 7794 follow-up)
2017-03-01 21:24:34 +01:00
Tim van der Meij
4e201d3787 Merge pull request #8072 from timvandermeij/annotation-append-operator-list
Annotations: move operator list addition logic to `src/core/document.js`
2017-02-27 22:50:57 +01:00
Tim van der Meij
0739f90707
Annotations: move operator list addition logic to src/core/document.js
Ideally, the `Annotation` class should not have anything to do with the
page's operator list. How annotations are added to the page's operator
list is logic that belongs in `src/core/document.js` instead where the
operator list is constructed.

Moreover, some comments have been added to clarify the intent of the
code.
2017-02-27 22:17:49 +01:00
Tim van der Meij
9db4240b85 Merge pull request #8110 from timvandermeij/interactive-forms-choice-inherit-options
Interactive forms: make choice widget options inheritable (issue 8094)
2017-02-27 22:14:25 +01:00
Jonas Jenwald
2a7e5b8a54 Support the newWindow flag in white-listed app.launchURL JavaScript actions (PR 7794 follow-up)
A simple follow-up to PR 7794, which let's us add support for the `newWindow` parameter; refer to https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_api_reference.pdf#G5.1507380.

The patch also fixes an embarrassing oversight regarding the placement of the case-insensitive flag, and also allows arbitrary white-space at the beginning of JS actions.
2017-02-27 15:58:28 +01:00
Tim van der Meij
8990de8614
Interactive forms: make choice widget options inheritable (issue 8094)
Even though the PDF specification does not state that `Opt` fields are
inheritable, in practice there are PDF generators that let annotations
inherit the options from a parent.
2017-02-25 23:34:26 +01:00
Jonas Jenwald
14cc6acb90 Ensure that Dicts found in Object Streams are assigned an objId in XRef.fetch
This fixes something that I noticed while working with the code in `Catalog.getPageDict` when debugging issue 8088.

Note that while I don't have an example where this patch really matters, given that e.g. `PartialEvaluator.hasBlendModes` depends on the `objId` to avoid cyclic references this patch could potentially help for some PDF files.
2017-02-25 10:20:19 +01:00
Tim van der Meij
59392fd544 Merge pull request #8102 from yurydelendik/mv-compatibilty
Move compatibility code to the shared/compatibility.js.
2017-02-24 22:47:49 +01:00
Jonas Jenwald
1ce295541c Always check all Kids nodes, in Catalog.getPageDict, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088)
As discussed on IRC, we need to check all nodes at the *bottom* of the tree to ensure that we find the correct `Page` dict.
Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps.

Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`.

Fixes 8088.
2017-02-24 12:09:46 +01:00
Yury Delendik
facefb0c79 Move compatibility code to the shared/compatibility.js. 2017-02-23 19:18:44 -06:00
Jonas Jenwald
9082f08e37 Enable running the cmap unit-tests on Travis by utilizing a NodeCMapReaderFactory 2017-02-17 23:15:36 +01:00
Yury Delendik
cfaa621a05 Merge pull request #8064 from Snuffleupagus/fetchBuiltInCMap
[api-minor] Refactor fetching of built-in CMaps to utilize a factory on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js)
2017-02-17 15:30:31 -06:00
Brendan Dahl
425ad30912 Merge pull request #8071 from Snuffleupagus/bug-1337429
Always choose a (3, 1) cmap table for TrueType fonts that have an encoding specified, regardless of the Symbolic font flag (bug 1337429)
2017-02-16 15:13:46 -08:00
Jonas Jenwald
111419a64a Cache built-in binary CMap files in the worker (issue 4794) 2017-02-16 10:55:39 +01:00
Jonas Jenwald
769c1450b7 [api-minor] Refactor fetching of built-in CMaps to utilize a factory on the display side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js)
Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.

This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
2017-02-16 10:55:35 +01:00
Tim van der Meij
8aad33e8a3 Merge pull request #8065 from timvandermeij/annotation-appearances
Annotations: refactor setting the normal appearance stream
2017-02-15 23:27:40 +01:00
Tim van der Meij
26fc79d51d
Annotations: refactor setting the normal appearance stream
Previously, we had a function called `getDefaultAppearance`. This name,
however, is misleading as the method gets the normal appearance (in the
`N` entry) and not the default appearance (in the `DA` entry). Moreover,
it was not entirely clear how it works just from reading the code. It
primarily lacks comments and explicit error case handling.

This patch improves the situation by fixing the issues mentioned above
and making this function a proper method of the `Annotation` class, just
like e.g., `setColor` and `setBorderStyle`.
2017-02-15 22:42:17 +01:00
Jonas Jenwald
ce072022c1 Always choose a (3, 1) cmap table for TrueType fonts that have an encoding specified, regardless of the Symbolic font flag (bug 1337429)
This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425.
The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag.

Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included.
However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions.

Given that the PDF file in bug 1337429 has been broken for almost *three* years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do *not* intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted).

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.
2017-02-15 17:38:08 +01:00
Yury Delendik
fa0e559fe2 New node.js check to protect from webpack. 2017-02-14 15:00:52 -06:00
Jonas Jenwald
23c62cc321 Consume the current character when encountering illegal characters in Lexer.getObject, in order to prevent infinite loops during reading of streams (issue 8061)
*Please note:* The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop.

In this particular case, when errors were encountered inside of the `Lexer.getObject` method *itself*, we didn't advance the stream position. This thus caused an inifinite loop in `parseCMap`, since the exact same character was then parsed over and over again.

Fixes 8061.
2017-02-11 19:32:48 +01:00
pmysore1
af8292058f Font ascent descent calculation fix 2017-02-11 01:25:05 -05:00
vkuryakov
4e181e59ef Interactive forms: values for radio buttons (issue #6995) 2017-02-07 23:42:40 +01:00
Jonas Jenwald
9c34d0aa8c [api-minor] Add a getDocument parameter that allows disabling of the NativeImageDecoder (e.g. for use with Node.js)
Note that I initially tried to add this as a parameter to the `PDFPageProxy.render` method, such that it could be passed to `PartialEvaluator.getOperatorList`.
However, given all the different code-paths that call `getOperatorList` (there's a bunch only in `annotation.js`), this seemed to very quickly become unwieldy and thus difficult to maintain compared to simply using the existing `evaluatorOptions`.
2017-02-06 22:21:34 +01:00
Jonas Jenwald
bc736fdc7d Adjust the brace-style ESLint rule to disallow single lines (and also enable no-iterator)
See http://eslint.org/docs/rules/brace-style.
Having the opening/closing braces on the same line can often make the code slightly more difficult to read, in particular for `if`/`else if` statements, compared to using new lines.

This patch also, for consistency with `mozilla-central`, enables the [`no-iterator`](http://eslint.org/docs/rules/no-iterator) rule. Note that this rule didn't require a single code change.
2017-02-04 15:53:08 +01:00
Tim van der Meij
6f0cf8c4cb Merge pull request #7972 from Snuffleupagus/eslint_no-unused-vars
Enable the `no-unused-vars` ESLint rule
2017-02-01 23:50:23 +01:00
Jonas Jenwald
f7d99ccc26 Remove the unused isStream property on various Streams
This property was added all the way back in PR 542, but hasn't actually been relied upon ever since PR 692.
Note that there's a `isStream()` utility function which replaced the property years ago, hence the `isStream` property is now dead code.
2017-02-01 11:38:11 +01:00
Jonas Jenwald
52e0f51917 Enable the no-unused-vars ESLint rule
Please see http://eslint.org/docs/rules/no-unused-vars; note that this patch purposely uses the same rule options as in `mozilla-central`, such that it fixes part of issue 7957.

It wasn't, in my opinion, entirely straightforward to enable this rule compared to the already existing rules. In many cases a `var descriptiveName = ...` format was used (more or less) to document the code, and I choose to place the old variable name in a trailing comment to not lose that information.

I welcome feedback on these changes, since it wasn't always entirely easy to know what changes made the most sense in every situation.
2017-01-29 23:23:17 +01:00
Jonas Jenwald
50c2856097 Move EOF/isEOF from core/parser.js to core/primitives.js
Given the nature of `EOF` and `isEOF`, it seems to me that they really ought to be placed in `core/primitives.js` instead.

In general, it doesn't seem great to have to depend on the entire `core/parser.js` file for such simple primitives/helper functions.
In particular, while `core/ps_parser.js` is completely separate from `core/parser.js` with regards to its function, it still depends on the latter for just *one* primitive.

Note that compared to e.g. PR 7389, this will not reduce the number of dependencies for `core/ps_parser`, however the new dependency IMHO makes more sense.
2017-01-27 13:37:48 +01:00
Jonas Jenwald
f000417ce0 [Firefox addon] Stop bundling src/core/network.js into the FIREFOX/MOZCENTRAL builds (PR 7322 follow-up)
PR 7322 added the `PdfJsNetwork.jsm` file, instead of the general `src/core/network.js` file for the Firefox addon. However, `make.js` wasn't updated to actually stop including the now obsolete network file.
2017-01-23 22:23:17 +01:00
Jonas Jenwald
f77c52291e Enable the no-empty-pattern/no-floating-decimal/no-self-compare/no-delete-var/no-new-object ESLint rules
The following rules required no code changes:
http://eslint.org/docs/rules/no-empty-pattern
http://eslint.org/docs/rules/no-floating-decimal
http://eslint.org/docs/rules/no-delete-var
http://eslint.org/docs/rules/no-new-object

There was just one change needed in order to enable:
http://eslint.org/docs/rules/no-self-compare; which I think helps readability a lot, since that comparison makes no sense until you realize that we push `NaN` onto the `stack` in some cases *and* furthermore that `NaN !== NaN`.
2017-01-23 20:30:50 +01:00