Commit Graph

131 Commits

Author SHA1 Message Date
Jonas Jenwald
b773b356af Convert NameOrNumberTree, NameTree, and NumberTree to ES6 classes
Also changes `var` to `let`/`const` in code already touched in the patch.
2018-07-09 21:12:01 +02:00
Jonas Jenwald
56e3648b65 Add basic validation of the 'trailer' dictionary candidates in XRef.indexObjects (issue 9418)
This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
346810e02a Add basic validation of the 'Root' dictionary in XRef.parse and try to recover when possible
Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
e84813e7cc Prevent hard errors if fetching the Encrypt dictionary fails in XRef.parse 2018-06-20 13:41:22 +02:00
Jonas Jenwald
30ad62a86a Use the correct startPos when repeating the search for 'endobj' operators in XRef.indexObjects (PR 9288 follow-up) 2018-06-20 13:41:22 +02:00
Jonas Jenwald
731f2e6dfc Remove manual clamping/rounding from ColorSpace and PDFImage, by having their methods use Uint8ClampedArrays
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.

As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.

*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
2018-06-12 11:01:32 +02:00
Wojciech Maj
ea2850e9a7 Fix typos 2018-04-01 23:20:41 +02:00
Jonas Jenwald
374d074f6e Add stricter validation in Catalog.readPageLabels
The current PageLabel dictionary validation code won't catch some (unlikely) forms of corruption. For example: a `Type`/`S` entry being `null`/`0`/empty string, a `P`/`St` entry being `null`/`0`.

Please note: I'm not aware of any bugs caused by the old code, but I've had this patch sitting locally for some time and figured it couldn't hurt to submit it.
2018-03-21 14:36:05 +01:00
Jonas Jenwald
d431ae069d Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540)
According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.
2018-03-12 14:13:23 +01:00
Jonas Jenwald
1dc54ddb40 Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in XRef.indexObjects (issue 9105)
This patch refactors the searching for 'endobj', to try and find the next occurance of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise.
This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrary long strings.

Fixes 9105.
2017-12-18 13:17:45 +01:00
Naveen Jain
1135674647 Replaced occurence of throw new Error with unreachable where applicable 2017-12-14 12:58:50 +05:30
Jonas Jenwald
1cd1582cb9 [api-major] Change getJavaScript to return null, rather than an empty Array, when no JavaScript exists
Other API methods already return `null`, rather than empty Arrays/Objects, hence it makes sense to change `getJavaScript` to be consistent.
2017-10-15 22:17:14 +02:00
Jonas Jenwald
cfb4955a92 Replace the isArray helper function with the native Array.isArray function
*Follow-up to PR 8813.*
2017-09-01 20:27:13 +02:00
Jonas Jenwald
11408da340 Replace the isInt helper function with the native Number.isInteger function
*Follow-up to PR 8643.*
2017-09-01 16:52:50 +02:00
Jonas Jenwald
772a5412a4 Avoid some redundant type checks in XRef.fetchUncompressed
When looking briefly at using `Number.isInteger`/`Number.isNan` rather than `isInt`/`isNaN`, I noticed that there's a couple of not entirely straightforward cases to consider.

At first I really couldn't understand why `parseInt` is being used like it is in `XRef.fetchUncompressed`, since the `num` and `gen` properties of an object reference should *always* be integers.
However, doing a bit of code archaeology pointed to PR 4348, and it thus seem that this was a very deliberate change. Since I didn't want to inadvertently introduce any regressions, I've kept the `parseInt` calls intact but moved them to occur *only* when actually necessary.[1]

Secondly, I noticed that there's a redundant `isCmd` check for an edge-case of broken operators. Since we're throwing a `FormatError` if `obj3` isn't a command, we don't need to repeat that check.

In practice, this patch could perhaps be considered as a micro-optimization, but considering that `XRef.fetchUncompressed` can be called *many* thousand times when loading larger PDF documents these changes at least cannot hurt.

---
[1] I even ran all tests locally, with an added `assert(Number.isInteger(obj1) && Number.isInteger(obj2));` check, and everything passed with flying colours.
However, since it appears that this was in fact necessary at one point, one possible explanation is that the failing test-case(s) have now been replaced by reduced ones.
2017-08-31 16:49:04 +02:00
Jonas Jenwald
42f2d36d1f Account for broken outlines/annotations, where the destination dictionary contains an invalid /Dest entry
According to the specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=377, a `Dest` entry in an outline item should *not* contain a dictionary.
Unsurprisingly there's PDF generators that completely ignore this, treating is an `A` entry instead.

The patch also adds a little bit more validation code in `Catalog.parseDestDictionary`.
2017-08-26 17:38:15 +02:00
Jonas Jenwald
4660cf8238 Prevent an infinite loop in XRef.readXRef by keeping track of already parsed tables (bug 1393476)
With this patch, not only is the infinite loop prevented, but we're also able to actually render the file (which e.g. Adobe Reader isn't able to).

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1393476.
2017-08-24 19:18:08 +02:00
Tim van der Meij
af71ea7a7d Merge pull request #8673 from Snuffleupagus/api-pageMode
[api-minor] Add support for PageMode in the API and viewer (issue 8657)
2017-07-23 13:17:07 +02:00
Jonas Jenwald
814fa1dee3 Remove most assert() calls (issue 8506)
This replaces `assert` calls with `throw new FormatError()`/`throw new Error()`.
In a few places, throwing an `Error` (which is what `assert` meant) isn't correct since the enclosing function is supposed to return a `Promise`, hence some cases were changed to `Promise.reject(...)` and similarily for `createPromiseCapability` instances.
2017-07-21 18:51:02 +02:00
Jonas Jenwald
15f0963f51 Fix a typo, in the Catalog.numPages getter, than prevents shadowing from working correctly
Looking at the blame, it seems that this typo was present even before PR 700 (almost six years ago).
The result of using `'num'`, rather than the *correct* `'numPages'` string, is that the `Catalog.numPages` getter isn't actually being shadowed.
2017-07-20 12:35:09 +02:00
Jonas Jenwald
16c5d41c5b [api-minor] Add support for PageMode in the API (issue 8657)
Please refer to https://wwwimages2.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=82.
2017-07-19 16:40:03 +02:00
Yury Delendik
d028c26210 Removes error() 2017-07-07 09:40:24 -05:00
Jonas Jenwald
3a20fd165f Refactor ObjectLoader to use Dicts correctly, rather than abusing their internal properties
The `ObjectLoader` currently takes an Object as input, despite actually working with `Dict`s internally. This means that at the (two) existing call-sites, we're passing in the "private" `Dict.map` property directly.

Doing this seems like an anti-pattern, and we could (and even should) simply provide the actual `Dict` when creating an `ObjectLoader` instance.
Accessing properties stored in the `Dict` is now done using the intended methods instead, in particular `getRaw` which (as the name suggests) doesn't do any de-referencing, thus maintaining the current functionality of the code.

The only functional change in this patch is that `ObjectLoader.load` will now ignore empty nodes, such that `ObjectLoader._walk` only needs to deal with nodes that are known to contain data. (This lets us skip, among other checks, meaningless `addChildren` function calls.)
2017-06-16 22:59:32 +02:00
Jonas Jenwald
f2fc9ee281 Slightly refactor and ES6-ify the code in ObjectLoader
This patch changes all `var` to `let`, and caches the array lengths in all loops. Also removes two unnecessary temporary variable assignments.
2017-06-16 22:59:32 +02:00
Jonas Jenwald
a8c87f8019 Fix inconsistent spacing and trailing commas in objects in src/core/ files, so we can enable the comma-dangle and object-curly-spacing ESLint rules later on
*Unfortunately this patch is fairly big, even though it only covers the `src/core` folder, but splitting it even further seemed difficult.*

http://eslint.org/docs/rules/comma-dangle
http://eslint.org/docs/rules/object-curly-spacing

Given that we currently have quite inconsistent object formatting, fixing this in *one* big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead.

Please note: This patch was created automatically, using the ESLint --fix command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.

```diff
diff --git a/src/core/evaluator.js b/src/core/evaluator.js
index abab9027..dcd3594b 100644
--- a/src/core/evaluator.js
+++ b/src/core/evaluator.js
@@ -2785,7 +2785,8 @@ var EvaluatorPreprocessor = (function EvaluatorPreprocessorClosure() {
     t['Tz'] = { id: OPS.setHScale, numArgs: 1, variableArgs: false, };
     t['TL'] = { id: OPS.setLeading, numArgs: 1, variableArgs: false, };
     t['Tf'] = { id: OPS.setFont, numArgs: 2, variableArgs: false, };
-    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false, };
+    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1,
+                variableArgs: false, };
     t['Ts'] = { id: OPS.setTextRise, numArgs: 1, variableArgs: false, };
     t['Td'] = { id: OPS.moveText, numArgs: 2, variableArgs: false, };
     t['TD'] = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false, };
diff --git a/src/core/jbig2.js b/src/core/jbig2.js
index 5a17d482..71671541 100644
--- a/src/core/jbig2.js
+++ b/src/core/jbig2.js
@@ -123,19 +123,22 @@ var Jbig2Image = (function Jbig2ImageClosure() {
      { x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -2, y: 0, },
      { x: -1, y: 0, }],
     [{ x: -3, y: -1, }, { x: -2, y: -1, }, { x: -1, y: -1, }, { x: 0, y: -1, },
-     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, { x: -1, y: 0, }]
+     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, },
+     { x: -1, y: 0, }]
   ];

   var RefinementTemplates = [
     {
       coding: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
-                  { x: 1, y: 0, }, { x: -1, y: 1, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
+      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, },
+                  { x: 0, y: 0, }, { x: 1, y: 0, }, { x: -1, y: 1, },
+                  { x: 0, y: 1, }, { x: 1, y: 1, }],
     },
     {
-      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, { x: 1, y: 0, },
-                  { x: 0, y: 1, }, { x: 1, y: 1, }],
+      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, },
+               { x: -1, y: 0, }],
+      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
+                  { x: 1, y: 0, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
     }
   ];
```
2017-06-02 11:20:19 +02:00
Jonas Jenwald
982b6aa65b Convert the files in the /src/core folder to ES6 modules
Please note that the `glyphlist.js` and `unicode.js` files are converted to CommonJS modules instead, since Babel cannot handle files that large and they are thus excluded from transpilation.
2017-05-30 22:06:21 +02:00
Jonas Jenwald
0ddf52aca5 Remove the special handling for nameddests that look like standard pageNumbers
PR 7341 added special handling for `nameddest`s that look like pageNumbers, to prevent issues since we previously *incorrectly* supported specifying a pageNumber directly in the hash; i.e. `#10` versus the correct `#page=10` format.

Since this behaviour wasn't correct, PR 7757 fixed and deprecated the old format, which means that we no longer need to maintain the `nameddest` hack in multiple files.
2017-05-20 11:29:29 +02:00
Jonas Jenwald
ebaa22478c Replace unnecessary bind(this) and var self = this statements with arrow functions in remaining src/core/ files 2017-05-02 15:47:43 +02:00
Jonas Jenwald
afc74b0178 Enable the object-shorthand ESLint rule in src/shared
Please see http://eslint.org/docs/rules/object-shorthand.

For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into to the patch.
2017-04-27 17:29:40 +02:00
Jonas Jenwald
61ee0de29f Use a simple RefSetCache to significantly improve the performance of Catalog.getPageDict for certain long documents (PR 8105 follow-up)
I found that PR 8105 unfortunately causes a *very serious* performance regression in long PDF documents where the `Pages` tree only has one level; my apologies for this!

Obviously we cannot revert that PR, since that would cause more issues than it solves. Hence it seems to me that the only viable solution here, is to add a simple `RefSetCache` to reduce the amount of redundant lookups.
Previously in PR 8105 caching was thought to be unnecessary, but as it turns out I don't think that we really have a choice in the matter any more.
2017-03-28 21:39:55 +02:00
Rob Wu
49af56f730 Rethrow MissingDataException when needed
In core/document.js: `PDFDocument.prototype.parse` accesses a dictionary
property, which could throw if the underlying data is not yet available.

In core/obj.js: `get Catalog.prototype.metadata` calls
`stream.getBytes`, which can throw MissingDataException too when the
stream is a ChunkedStream.
2017-03-22 14:55:59 +01:00
Jonas Jenwald
9163a6fba4 Merge pull request #8112 from Snuffleupagus/JS-action-newWindow
Support the `newWindow` flag in white-listed `app.launchURL` JavaScript actions (PR 7794 follow-up)
2017-03-01 21:24:34 +01:00
Jonas Jenwald
2a7e5b8a54 Support the newWindow flag in white-listed app.launchURL JavaScript actions (PR 7794 follow-up)
A simple follow-up to PR 7794, which let's us add support for the `newWindow` parameter; refer to https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_api_reference.pdf#G5.1507380.

The patch also fixes an embarrassing oversight regarding the placement of the case-insensitive flag, and also allows arbitrary white-space at the beginning of JS actions.
2017-02-27 15:58:28 +01:00
Jonas Jenwald
14cc6acb90 Ensure that Dicts found in Object Streams are assigned an objId in XRef.fetch
This fixes something that I noticed while working with the code in `Catalog.getPageDict` when debugging issue 8088.

Note that while I don't have an example where this patch really matters, given that e.g. `PartialEvaluator.hasBlendModes` depends on the `objId` to avoid cyclic references this patch could potentially help for some PDF files.
2017-02-25 10:20:19 +01:00
Jonas Jenwald
1ce295541c Always check all Kids nodes, in Catalog.getPageDict, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088)
As discussed on IRC, we need to check all nodes at the *bottom* of the tree to ensure that we find the correct `Page` dict.
Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps.

Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`.

Fixes 8088.
2017-02-24 12:09:46 +01:00
Jonas Jenwald
111419a64a Cache built-in binary CMap files in the worker (issue 4794) 2017-02-16 10:55:39 +01:00
Jonas Jenwald
4046d67fde Enable the no-else-return ESLint rule
Using `else` after `return` is not necessary, and can often lead to unnecessarily cluttered code. By using the `no-else-return` rule in ESLint we can avoid this pattern, see http://eslint.org/docs/rules/no-else-return.
2017-01-09 20:27:39 +01:00
Jonas Jenwald
14b8523314 Refactor the password handling so that it's stored in the PdfManagers, instead of in the XRef
We're already passing in a, currently unused, `PdfManager` instance when initializing the `XRef`. To avoid having to pass a single `password` parameter around, we could thus simply get the `password` through the `PdfManager` instance instead.
2017-01-03 20:29:52 +01:00
Jonas Jenwald
c6008b4d7c Fix the JSDoc comment for Catalog.parseDestDictionary 2016-11-27 11:18:18 +01:00
Jonas Jenwald
6d8a404a9c [api-minor] Add support for a couple of white-listed JavaScript actions that contains valid URLs (issue 3897, bug 843699)
By only allowing very specific type of `JavaScript` actions, and also utilizing the existing `URL` validation, this patch shouldn't pose too much risk.

Fixes one of the points in issue 3897 (with the PDF file taken from issue 3438).
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=843699 (probably, since that bug doesn't contain a test-case).
2016-11-08 16:48:27 +01:00
Jonas Jenwald
0844a72b4d Add a bit more validation to Catalog_readPageLabels, to ensure that the Page Labels are well formed 2016-11-03 20:08:06 +01:00
Jonas Jenwald
2d8d8b5e53 Use stringToPDFString to sanitizing bad "Prefix" entries in Page Label dictionaries
It seems that certain bad PDF generators can create badly encoded "Prefix" entries for Page Labels, one example being http://ukjewishfilm.org/wp-content/uploads/2015/09/Jewish-Film-Festival-Programme-ONLINE.pdf.

Unfortunately I didn't come across such a PDF file while adding the API support for Page Labels, but with them now being used in the viewer I just found this issue. With this patch, we now display the Page Labels in the same way as Adobe Reader.
2016-11-03 19:48:08 +01:00
Tim van der Meij
6e22b32372 Merge pull request #7745 from Snuffleupagus/Launch-actions
[api-minor] Add basic support for `Launch` actions (issue 1778, issue 3897, issue 6616)
2016-11-01 21:12:08 +01:00
Tim van der Meij
5194e68134 Lint: correct code style violations
Manual observations and working with other linting tools found these.
2016-11-01 15:04:21 +01:00
Jonas Jenwald
2b79782377 [api-minor] Add basic support for Launch actions (issue 1778, issue 3897, issue 6616)
In general we neither want, nor can, support arbitrary `Launch` actions. But in practice, all the cases we've seen so far just contains relative URLs to other PDF files. Building on PR 7689, we can thus at least support basic `Launch` actions.
2016-10-21 13:40:32 +02:00
Jonas Jenwald
d284cfd5eb [api-minor] Add support for relative URLs, in both annotations and the outline, by adding a docBaseUrl parameter to PDFJS.getDocument (bug 766086)
Note that in `FIREFOX/MOZCENTRAL/CHROME` builds of the standard viewer the `docBaseUrl` parameter will be set by default, since in that case it makes sense to use the current URL as a base.
For the `GENERIC` viewer, or the API itself, it doesn't make sense to try and set the `docBaseUrl` by default. However, custom deployments/implementations may still find the parameter useful.
2016-10-19 22:20:24 +02:00
Jonas Jenwald
71a781ee5c Deprecate the isValidUrl utility function and replace it with createValidAbsoluteUrl/isValidProtocal functions instead, since the main URL validation is now done using the new URL constructor 2016-10-19 22:11:22 +02:00
Jonas Jenwald
42f07c6262 [api-minor] Use the new URL constructor when validating URLs in annotations and the outline, as a complement to only checking the protocol, and add a bit more validation to Catalog_parseDestDictionary
Note that this will automatically reject any relative URL.
To make the API more useful to consumers, URLs that are rejected will be available via the `unsafeUrl` property in the data object returned by `PDFPageProxy_getAnnotations`.

The patch also adds a bit more validation of the data for `Named` actions.
2016-10-19 22:11:17 +02:00
Jonas Jenwald
e64bc1fd13 Move parsing of destination dictionaries to a helper function
This not only reduces code duplication, but it also allow us to easily support the same kind of URLs we currently do for Link annotations in the Outline as well.
2016-10-18 16:14:07 +02:00
Jonas Jenwald
3e77cf6b32 Prevent an infinite loop in XRef_fetchUncompressed for encrypted PDF files with indirect objects in the /Encrypt dictionary (issue 7665) 2016-09-25 00:18:47 +02:00