Commit Graph

84 Commits

Author SHA1 Message Date
Jonas Jenwald
42f07c6262 [api-minor] Use the new URL constructor when validating URLs in annotations and the outline, as a complement to only checking the protocol, and add a bit more validation to Catalog_parseDestDictionary
Note that this will automatically reject any relative URL.
To make the API more useful to consumers, URLs that are rejected will be available via the `unsafeUrl` property in the data object returned by `PDFPageProxy_getAnnotations`.

The patch also adds a bit more validation of the data for `Named` actions.
2016-10-19 22:11:17 +02:00
Jonas Jenwald
e64bc1fd13 Move parsing of destination dictionaries to a helper function
This not only reduces code duplication, but it also allow us to easily support the same kind of URLs we currently do for Link annotations in the Outline as well.
2016-10-18 16:14:07 +02:00
Jonas Jenwald
3e77cf6b32 Prevent an infinite loop in XRef_fetchUncompressed for encrypted PDF files with indirect objects in the /Encrypt dictionary (issue 7665) 2016-09-25 00:18:47 +02:00
Tim van der Meij
b4c8814fc9 Merge pull request #7534 from Snuffleupagus/isName-name-check
Add a parameter to the `isName` function that enables checking not just that something is a `Name`, but also that the actual `name` properties matches
2016-08-17 15:48:42 +02:00
Jonas Jenwald
544d29f5cb Add a recoveryMode that suppresses errors from the Parser, and utilize it when searching for the main trailer in XRef_indexObjects (bug 1250079)
Instead of having `Parser_getObj` fail unconditionally for the referenced PDF file, this patch attempts to let searching for the main trailer continue even if there are errors.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1250079.
2016-08-17 12:37:35 +02:00
Jonas Jenwald
83ce6f0b6d Adjust the (applicable) existing isName callsites to use the new isName(v, name) version of the function 2016-08-10 11:15:08 +02:00
Jonas Jenwald
01ab15a6f1 [api-minor] Let Catalog_getPageIndex check that the Ref actually points to a /Page dictionary
Currently the `getPageIndex` method will happily return `0`, even if the `Ref` parameter doesn't actually point to a proper /Page dictionary.
Having the API trust that the consumer is doing the right thing seems error-prone, hence this patch which adds a check for this case.

Given that the `Catalog_getPageIndex` method isn't used in any hot part of the codebase, this extra check shouldn't be a problem.
(Note: in the standard viewer, it is only ever used from `PDFLinkService_navigateTo` if a destination needs to be resolved during document loading, which isn't common enough to be an issue IMHO.)
2016-05-21 14:13:41 +02:00
Tim van der Meij
c1c199d702 Merge pull request #7295 from Snuffleupagus/core-getArray
Use `Dict_getArray` in more places in `src/core/` to avoid issues when Arrays contain indirect objects
2016-05-10 23:21:54 +02:00
Jonas Jenwald
182d33800a Ignore 'endobj' commands inside of ObjStm streams (issue 5241, bug 898610, bug 1037816)
According to an example in the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=56, an `ObjStm` stream should not contain 'endobj' commands.

Fixes 5241.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=898610.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1037816.
2016-05-09 09:50:45 +02:00
Jonas Jenwald
6111c17c8a Use Dict_getArray in more places in src/core/ to avoid issues when Arrays contain indirect objects
As evident from e.g. PRs 6485 and 7118, some bad PDF generators unfortunately create Arrays where *some* elements are indirect objects (i.e. `Ref`s). This seems to mostly affect Arrays that contain numbers, such as e.g. `Matrix/FontMatrix/BBox/FontBBox/Rect/Color/...`, and has manifested itself in PDF files that fail to render correctly (some elements are missing).

The problem in both the cases above, besides broken rendering, was that there were *no* errors/warnings that indicated what the problem was, making it difficult to pinpoint the issue.
Hence this patch, where I've audited all usages of `Dict_get` in `src/core/` files, and replaced it with `Dict_getArray` where appropriate to try and prevent unnecessary future bugs.
2016-05-05 19:42:57 +02:00
Jonas Jenwald
e281ef15db Adjust incorrect first obj number of "free" xref entry in XRef_readXRefTable (issue 7229)
Fixes 7229.
2016-04-21 16:36:32 +02:00
Jonas Jenwald
41efb92d3a Merge pull request #6988 from timvandermeij/fileattachment-annotation
Implement support for FileAttachment annotations
2016-02-24 12:58:06 +01:00
Tim van der Meij
6a33dfd13a Implement support for FileAttachment annotations 2016-02-23 22:49:53 +01:00
Jonas Jenwald
7cf9de2c17 [api-minor] Change getOutline to actually return the RGB color of outline items
Currently the `C` entry in an outline item is returned as is, which is neither particularly useful nor what the API documentation claims.

This patch also adds unit-tests for both the color handling, and the `F` entry (bold/italic flags).
2016-02-15 13:41:22 +01:00
Jonas Jenwald
98db068079 Reduce the overall indentation level in Catalog_readDocumentOutline, by using early returns, in order to improve readability 2016-02-14 11:38:43 +01:00
Yury Delendik
825a2225ab Merge pull request #6915 from yurydelendik/lookuptables
Refactor lookup hash tables/objects
2016-01-28 15:01:06 -06:00
Yury Delendik
2edf2792dc Replaces literal {} created lookup tables with Object.create 2016-01-28 12:18:38 -06:00
Jonas Jenwald
1140a34f5c [api-minor] Change getPageLabels to always return the pageLabels, even if they are identical to standard page numbering 2016-01-27 13:36:03 +01:00
Jonas Jenwald
85cf90643f [api-minor] Add support for PageLabels in the API 2016-01-19 22:49:04 +01:00
Jonas Jenwald
8ad18959d7 Add support for NumberTree 2016-01-19 22:47:45 +01:00
Jonas Jenwald
0030a82dc3 [api-minor] Add support for URLs in the document outline
Re: issue 5089.
(Note that since there are other outline features that we currently don't support, e.g. bold/italic text and custom colours, I thus think we can keep the referenced issue open.)
2016-01-19 21:36:27 +01:00
Yury Delendik
6b60c8f4db Adds UMD headers to core, display and shared files. 2015-12-15 13:24:39 -06:00
Manas
a2ba1b8189 Uses editorconfig to maintain consistent coding styles
Removes the following as they unnecessary
/* -*- Mode: Java; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set shiftwidth=2 tabstop=2 autoindent cindent expandtab: */
2015-11-14 07:32:18 +05:30
Yury Delendik
59c13b32aa Adds destroy method to the document loading task.
Also renames PDFPageProxy.destroy method to cleanup.
2015-10-23 08:57:14 -05:00
Jonas Jenwald
49883439a5 Ensure that Dict_getArray doesn't fail if xref in undefined (PR 6485 follow-up)
In PR 6485 I somehow missed to account for the case where `xref` is undefined. Since a dictonary can be initialized without providing a reference to an `xref` instance, `Dict_getArray` can thus fail without this added check.
2015-10-15 11:47:07 +02:00
Jonas Jenwald
9b12c64be5 Cache the regular expression used for finding objs in XRef_indexObjects, to avoid unnecessary allocations 2015-10-02 12:46:58 +02:00
Jonas Jenwald
192907e0d2 Make XRef_indexObjects even more robust against bad PDF files, by checking for the existence of 'trailer' if 'xref' is not found
Fixes http://www.cyjack.com/cognition/Terence%20McKenna%20-%20Lectures%20on%20Alchemy.pdf.
2015-10-01 15:01:25 +02:00
Jonas Jenwald
75557d27d1 Add getArray method to Dict
This method extend `get`, and will fetch all indirect objects (i.e. `Ref`s) when the result is an `Array`.
2015-09-29 10:11:47 +02:00
Jonas Jenwald
56a43a3181 Make XRef_indexObjects more robust against bad PDF files (issue 5752)
This patch improves the detection of `xref` in files where it is followed by an arbitrary whitespace character (not just a line-breaking char).
It also adds a check for missing whitespace, e.g. `1 0 obj<<`, to speed up `readToken` for the PDF file in the referenced issue.
Finally, the patch also replaces a bunch of magic numbers with suitably named constants.

Fixes 5752.

Also improves 6243, but there are still issues.
2015-08-21 20:33:02 +02:00
Rob Wu
c676ecb5a0 Detect scripted auto-print requests
Fixes #6106

To avoid future regressions, two new unit tests were added:
1. A new PDF based on the report from #6106, which contains an
   OpenAction of type JavaScript and a string "this.print({...}".
2. An existing PDF from https://bugzil.la/1001080 (from #4698).

Although it does not matter, since we don't execute the JavaScript code,
I have also changed "print(true)" to "print({})" since the print method
takes an object (not a boolean). See "Printing PDF documents", page 62:
http://adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_developer_guide.pdf
2015-07-20 18:25:02 +02:00
Jonas Jenwald
28f40b1b58 Fetch all indirect objects (i.e. Refs) in NameTree_getAll and NameTree_get (issue 6204) 2015-07-14 10:56:56 +02:00
Tim van der Meij
1416a1b521 Merge pull request #6187 from Snuffleupagus/more-efficient-getDestination
A couple of improvements of `getDestination` (unit-test included)
2015-07-13 23:03:13 +02:00
Rob Wu
fd29bb0c57 Subtract start offset for xrefs in recovery mode
Xref offsets are relative to the start of the PDF data, not to the start
of the PDF file. This is clear if you look at the other code:

- In the XRef's readXRefTable and processXRefTable methods of XRef, the
  offset of a xref entry is set to the bytes as given by a PDF file.
  These values are always relative to the start of the PDF file (%PDF-).

- The XRef's readXRef method adds the start offset of the stream to
  Xref entry's offset: "stream.pos = startXRef + stream.start".
  Clearly, this line assumes that the entry offset excludes the start
  offset.

However, when the PDF is parsed in recovery mode, the xref table is
filled with entries whose offset is relative to the start of the stream
rather than the PDF file. This is incorrect, and the fix is to subtract
the start offset of the stream from the entry's byte offset.

The manually created PDF file serves as a regression test. It is a valid
PDF, except:
- The integer to point to the start of the xref table and the %%EOF
  trailer are missing. This will activate recovery mode in PDF.js
- Some junk was added before the start of the PDF file. This exposes the
  bad offset bug.
2015-07-10 23:33:10 +02:00
Jonas Jenwald
7df78f997e Slightly more efficient getDestination
For named destinations that are contained in a `Dict`, as opposed to a `NameTree`, we currently iterate through the *entire* dictionary just to fetch *one* destination.
This code appears to simply have been copy-pasted from the `get destinations` method, but in its current form it's quite unnecessary/inefficient since can just get the required destination directly instead.
2015-07-08 18:31:51 +02:00
Jonas Jenwald
940bedf75f Add a unit-test that attempts to fetch a non-existent named destination
Doing this helped uncover an issue with the `getDestination` implementation.
Currently if a named destination doesn't exist, the method (in `obj.js`) may return `undefined` which leads to the promise being stuck in a pending state.
*Note:* returning `null` for this case is consistent with other methods, e.g. `getOutline` and `getAttachments`.
2015-07-07 22:05:08 +02:00
Jonas Jenwald
a28ed7c834 Always traverse the entire parent chain in Page_getInheritedPageProp (issue 5954)
This enables us to find resources placed on multiple levels of the tree.

Fixes 5954.
2015-05-30 12:21:05 +02:00
Tim van der Meij
d484ebd492 Merge pull request #5910 from jordan-thoms/fix-concatenated-files
Fix error reading concatenated pdfs
2015-05-13 22:40:55 +02:00
Jonas Jenwald
760222cf0b Handle the Encoding being a dictionary in PartialEvaluator_preEvaluateFont (bug 1157493)
*This is a regression from PR 4423.*

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1157493.
2015-04-25 16:48:14 +02:00
Brendan Dahl
846eb967cc Merge pull request #5655 from Snuffleupagus/issue-5644
Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644)
2015-04-20 11:46:27 -07:00
Jordan Thoms
d0ea772fc6 Fix error reading concatenated pdfs 2015-04-18 20:56:07 +12:00
Jonas Jenwald
888cbe0bde Avoid getting stuck in empty nodes in the Pages tree when calling |Catalog_getPageDict| (issue 5644) 2015-02-22 17:42:15 +01:00
Collin Anderson
54e984c763 cleaned whitespace 2015-02-17 11:07:37 -05:00
Tim van der Meij
aaa1f2cb11 Implemented NameTree.get() using binary search 2014-10-07 00:02:15 +02:00
Tim van der Meij
b215af30d3 Require destinations when they are needed and do not fetch all of them in advance 2014-10-06 22:26:18 +02:00
Jonas Jenwald
06b5d97bc6 Remove two instances of leftover console.log debug statements
The `console.log` statement in evaluator_spec.js is obviously not needed. In obj.js it could have been replaced by `info`, but that seemed unnecessary given the already existing `error`.
2014-08-13 14:29:46 +02:00
Yury Delendik
cc180d7e2b Removes some bind() calls from fetchAsync 2014-08-05 21:22:12 -05:00
Jonas Jenwald
ee0c0dd8a9 Add strict equalities in src/core/obj.js 2014-08-01 21:56:04 +02:00
Nicholas Nethercote
856e1c600b Optimize Ref_toString().
I have a large PDF where this function is called 1.6 million times
during loading. Minimizing the string concatenations reduces the
cumulative allocations done by Firefox within this function from 113 MB
to 48 MB.
2014-07-24 06:49:56 -07:00
Yury Delendik
84157e039d Merge pull request #4973 from nnethercote/better-ref-keys
Factor out repeated Ref key string generation code.
2014-06-19 21:00:09 -05:00
Nicholas Nethercote
1ad3ffbc7b Factor out repeated Ref key string generation code.
In src/core/obj.js, we convert a Ref to a string to index into a table like
this: 'R1.0'.  This conversion is repeated numerous times.

This patch factors out the conversion into a new function.
Ref.prototype.toString().
2014-06-19 18:22:39 -07:00