2852 Commits

Author SHA1 Message Date
Tony Jin
ef667823dd [api-minor] Add an optional param to DocumentInitParameters for specifying the range request chunk size to use. Defaults to 2^16 = 65536. 2015-10-26 17:22:11 -07:00
Jonas Jenwald
1c66d4a106 Add a totalLength getter to OperatorList, since the length is zero after flushing
In the `RenderPageRequest` handler in `worker.js`, we attempt to print an `info` message containing the rendering time and the length of the operator list. The latter is currently broken (and has been for quite some time), since the `length` of an `OperatorList` is reset when flushing occurs.
This patch attempts to rectify this, by adding a getter which keeps track of the total length.
2015-10-26 18:12:14 +01:00
Yury Delendik
58c3ea0820 Adds thread abort capabilities. 2015-10-23 09:06:32 -05:00
Yury Delendik
59c13b32aa Adds destroy method to the document loading task.
Also renames PDFPageProxy.destroy method to cleanup.
2015-10-23 08:57:14 -05:00
Jonas Jenwald
2e751199fb Prevent getOperatorList from failing to correctly parse OPS.paintXObject for TilingPatterns that are missing some /Resources entries (issue 6541)
Fixes 6541.
2015-10-21 21:30:56 +02:00
Rob Wu
50ff2d4c2a Ignore operators that are known to be unsupported
`operatorList.addOp` adds the arguments to the list which is then
passed as-is by postMessage to the main thread. But since we don't
parse these operations, they are raw PDF objects and may therefore
cause a serialization error.

This is a conservative patch, and only affects operators which are
known to be unsupported. We should ignore all unknown operators,
but I haven't really looked into the consequences of doing that.

Fixes #6549
2015-10-21 15:39:25 +02:00
Brendan Dahl
e4f0e6f2a0 Merge pull request #6531 from covlllp/new_merge
Fixes bluebeam password protection issue
2015-10-16 13:47:06 -07:00
Colin VanLang
6d8e883fe6 Fixes bluebeam password protection issue 2015-10-15 21:22:27 -04:00
Jonas Jenwald
49883439a5 Ensure that Dict_getArray doesn't fail if xref in undefined (PR 6485 follow-up)
In PR 6485 I somehow missed to account for the case where `xref` is undefined. Since a dictonary can be initialized without providing a reference to an `xref` instance, `Dict_getArray` can thus fail without this added check.
2015-10-15 11:47:07 +02:00
Brendan Dahl
3eaeacfe19 Merge pull request #6476 from Snuffleupagus/PartialEvaluator_readToUnicode-cmap-length
Right-size the `map` array in PartialEvaluator_readToUnicode
2015-10-09 10:31:28 -07:00
Jonas Jenwald
9b12c64be5 Cache the regular expression used for finding objs in XRef_indexObjects, to avoid unnecessary allocations 2015-10-02 12:46:58 +02:00
Jonas Jenwald
192907e0d2 Make XRef_indexObjects even more robust against bad PDF files, by checking for the existence of 'trailer' if 'xref' is not found
Fixes http://www.cyjack.com/cognition/Terence%20McKenna%20-%20Lectures%20on%20Alchemy.pdf.
2015-10-01 15:01:25 +02:00
Tim van der Meij
1bdfc47de8 Merge pull request #6411 from Snuffleupagus/remove-Parser_fetchIfRef
Remove `Parser_fetchIfRef` since it's obsolete
2015-09-30 00:38:35 +02:00
Jonas Jenwald
1b8cb52555 Prevent PartialEvaluator_buildFormXObject from failing if the Matrix or BBox contains indirect objects
This patch fixes yet another instance of bad PDF data, specifically a case where the `BBox` array contains indirect objects (i.e. `Ref`s).

Fixes the missing image in http://www.int.washington.edu/talks/WorkShops/int_08_37W/People/Franz_M/Franz.pdf#page=24. *Note:* There are missing images on a number of the pages in that file.
2015-09-29 10:11:49 +02:00
Jonas Jenwald
75557d27d1 Add getArray method to Dict
This method extend `get`, and will fetch all indirect objects (i.e. `Ref`s) when the result is an `Array`.
2015-09-29 10:11:47 +02:00
Jonas Jenwald
8d831449ab Right-size the map array in PartialEvaluator_readToUnicode
We can avoid a lot of intermediate resizings, by directly allocating the required number of elements for the `map` array.
2015-09-24 13:08:53 +02:00
Fabian Lange
2564827503 Fix text spacing with vertical fonts (#6387)
According to the PDF spec 5.3.2, a positive value means in horizontal,
that the next glyph is further to the left (so narrower), and in
vertical that it is further down (so wider).
This change fixes the way PDF.js has interpreted the value.
2015-09-15 09:28:45 +02:00
Tim van der Meij
12b0b9744b Merge pull request #6427 from Snuffleupagus/slightly-more-robust-get-fingerprint
Make `get fingerprint` slightly more robust against corrupt PDF files
2015-09-10 22:07:44 +02:00
Jonas Jenwald
5853553455 Make get fingerprint slightly more robust against corrupt PDF files
This patch adjusts `get fingerprint` to also check that the `/ID` entry contains (non-empty) strings, to prevent more possible failures when loading corrupt PDF files (follow-up to PR 5602).

Note that I've not actually encountered such a PDF file in the wild. However given that `stringToBytes` will assert that the input is a string, and that we'll thus fail to load a document unless `get fingerprint` succeeds, making this more robust seems like a good idea to me.
2015-09-08 13:42:53 +02:00
Jonas Jenwald
29a1cdb6a6 Only choose a (3, 1) cmap table for TrueType fonts that have an encoding specified (issue 6410)
For (1, 0) cmaps, we have two different codepaths depending on whether the font has/hasn't got an encoding. But with (3, 1) cmaps we don't have a good fallback when the encoding is missing, hence this patch changes `readCmapTable` to only choose a (3, 1) cmap table if the font is non-symbolic *and* an encoding exists. Without this, we'll not be able to successfully create a working glyph map for some TrueType fonts with (3, 1) cmap tables.

Fixes 6410.
2015-09-07 16:56:05 +02:00
Jonas Jenwald
b1d148a4aa Remove Parser_fetchIfRef since it's obsolete
This code was added in PR 1214, but was made obsolete by PRs 1488/1493. Prior to the latter ones, `Dict_get` retured the raw objects. However, afterwards (and currently) `Dict_get` now resolves indirect objects, which makes `Parser_fetchIfRef` redundant.

*Potential risks with this patch:*
This patch passes all tests locally, but there's a *small* possibility that it could break some weird PDF files.
In the current code, wrapping `Dict_get` inside `Parser_fetchIfRef` will potentially mean two back-to-back call of `XRef_fetch`, if a reference points directly to another reference. I'm not sure if this can actually happen in practice, and I'd think that if that were the case we'd already have run into it elsewhere in the code-base, given that `Parser` is the only place where we try to "double" resolve references.
2015-09-02 23:11:00 +02:00
Jonas Jenwald
0fb31a4a9e Fallback in readCmapTable, instead of using error, for TrueType fonts with unsupported cmap formats (bug 1200096)
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1200096.

The problematic font has a `format 2` cmap, which we've never supported properly. Prior to PR 2606, we were able to fallback to a working state, despite not having proper support for that cmap format.

Obviously the best/correct solution would be to implement actual support for more cmap formats[1]. However, I'm hoping that a simple patch will be OK for now, given that:
 - `format 2` cmaps seem to be quite rare in practice, since this has been broken for 2.5 years before anyone noticed.
 - Having a simple patch will make potential uplifts a lot easier.

[1] See the specification at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
2015-09-01 14:01:19 +02:00
Tim van der Meij
0020f33873 Merge pull request #6357 from Snuffleupagus/bidi-result
Avoid more allocations for RTL text in bidi.js
2015-09-01 00:44:33 +02:00
Tim van der Meij
b42b894570 Merge pull request #6386 from Snuffleupagus/Parser_makeFilter-warn-on-empty-stream
Add a warning when we encounter an empty stream in `Parser_makeFilter`
2015-08-30 23:14:22 +02:00
Rob Wu
582573b96b Merge pull request #6358 from Snuffleupagus/Parser_tryShift-missingDataException
Don't catch `MissingDataException` in `Parser_tryShift`
2015-08-27 14:46:24 +02:00
Jonas Jenwald
f814fdc215 Add a warning when we encounter an empty stream in Parser_makeFilter
Having a warning here would have meant that issue 6360 could have been solved in approximately five minutes, instead of an hour. To avoid that happening again, this patch adds a warning whenever we treat a stream as empty.
2015-08-26 20:14:30 +02:00
Brendan Dahl
88e0326787 Merge pull request #6337 from Snuffleupagus/issue-6336
Adjust which TrueType (3, 1) glyphs we attempt to skip mapping of (issue 6336)
2015-08-25 09:49:46 -07:00
Jonas Jenwald
56a43a3181 Make XRef_indexObjects more robust against bad PDF files (issue 5752)
This patch improves the detection of `xref` in files where it is followed by an arbitrary whitespace character (not just a line-breaking char).
It also adds a check for missing whitespace, e.g. `1 0 obj<<`, to speed up `readToken` for the PDF file in the referenced issue.
Finally, the patch also replaces a bunch of magic numbers with suitably named constants.

Fixes 5752.

Also improves 6243, but there are still issues.
2015-08-21 20:33:02 +02:00
Jonas Jenwald
5128603f64 Also check maybeLength when deciding if a stream is empty in Parser_makeFilter (issue 6360)
The problem with the PDF files in the issue, besides the obviously broken XRef tables which we're able to recover from, is that many/most of the streams have Dictionaries where the `Length` entry is set to `0`. This causes us to return `NullStream`, instead of the appropriate one in `Parser_makeFilter`.

Fixes 6360.
2015-08-20 23:04:18 +02:00
Yury Delendik
c56dc9a093 Merge pull request #6141 from skalnik/fix-font-csp-issues
Provide a fallback for font rendering when not allowed to use `eval`
2015-08-18 18:50:11 -05:00
Jonas Jenwald
3fa5f6cc3b Only take the fast-path in PDFImage_createImageData for un-masked JPEG images with "standard" colour spaces (issue 6364)
Fixes 6364.
2015-08-18 22:25:37 +02:00
Jonas Jenwald
8c3b8238ac Don't catch MissingDataException in Parser_tryShift
I overlooked this while reviewing PR 6197, but I don't think that we should be catching that particular kind of exception here; hence this patch.
2015-08-16 11:35:54 +02:00
Jonas Jenwald
b1cf4d98ad Avoid more allocations for RTL text in bidi.js
Instead of building the resulting string char-by-char for RTL text, which is inefficient, we can just as well `join` the `chars` array.
2015-08-14 21:46:59 +02:00
Mike Skalnik
341c5e9d1f [PATCH] Add fallback for font loading when eval disabled
In some cases, such as in use with a CSP header, constructing a function with a
string of javascript is not allowed. However, compiling the various commands
that need to be done on the canvas element is faster than interpreting them.
This patch changes the font renderer to instead emit commands that are compiled
by the font loader. If, during compilation, we receive an EvalError, we instead
interpret them.
2015-08-13 14:33:18 -07:00
Yury Delendik
20b46aaf88 Fixes supportsMozChunked for node.js 2015-08-12 18:48:59 -05:00
Jonas Jenwald
99d29487ab Adjust which TrueType (3, 1) glyphs we attempt to skip mapping of (issue 6336)
Fixes 6336.
2015-08-09 12:51:43 +02:00
Rob Wu
b0a8c0fa40 cmaps: Use cmap.forEach instead of Array.forEach
CMaps may be sparse. Array.prototype.forEach is terribly slow in Chrome
(and also in Firefox) when the sparse array contains a key with a high
value. E.g.

    console.time('forEach sparse')
    var a = [];
    a[0xFFFFFF] = 1;
    a.forEach(function(){});
    console.timeEnd('forEach sparse');

    // Chrome: 2890ms
    // Firefox: 1345ms

Switching to CMap.prototype.forEach, which is optimized for such
scenarios fixes the problem.
2015-08-08 13:30:30 +02:00
Tilman Hausherr
6d1e0f7e8d fix handling of flags 1-3 in tensor shading
pi is an index in the stream and is explained on page 201 of the 32000-spec (however 1-based there), and ps is an index to something in PDF.js. I used the code from flag 0 (which works) to understand which is which. It is also important to understand that for flags 1,2 and 3, the stream is always assigned to the same coordinates and colors. What changes is which "old" coordinates and colors are assigned to what is "missing" in the stream. This is why for these flags, the code is identical except for the assignments in the first "row". (Same principle as in #6304). Note that this change will not improve the lamp_cairo.pdf file, only the two files mentioned in #6305.
2015-08-04 18:21:29 +02:00
Tilman Hausherr
c85fa00d62 fix handling of flags 1-3 in coons shading
Short story: somebody got lost in two different indices. pi is an index in the stream and is explained on page 198 of the 32000-spec (however 1-based there), and ps is an index to something in PDF.js. I used the code from flag 0 (which works) to understand which is which. It is also important to understand that for flags 1,2 and 3, the stream is always assigned to the same coordinates and colors. What changes is which "old" coordinates and colors are assigned to what is "missing" in the stream. This is why for these flags, the code is identical except for the assignments in the first "row".
2015-08-03 21:15:26 +02:00
Brendan Dahl
977397ebfd Merge pull request #6270 from Snuffleupagus/opentype-cff-2
Adjust the heuristics used to detect OpenType font file with CFF data (bug 1186827, bug 1182130, issue 6264)
2015-08-03 09:43:33 -07:00
Tim van der Meij
72ecbec49d Merge pull request #6292 from Snuffleupagus/issue-6287
Fix various shading pattern regressions (issue 6287)
2015-07-31 22:26:01 +02:00
Jonas Jenwald
1d65daf5e5 Correctly access colorSpace.numComps in MeshStreamReader (issue 6287)
This regressed in f750e35224.
2015-07-31 18:00:58 +02:00
Jonas Jenwald
7fe2442a18 Ensure that we don't use the same typed array for both coords and colors in Mesh figures (issue 6287)
This regressed in 1e8d70af98.
2015-07-31 18:00:23 +02:00
Jonas Jenwald
55bc98a8b0 Rename PatternType to ShadingType to avoid confusion
The current name is somewhat confusing, since the specification calls it `ShadingType`, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.4044105 and http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3882826.

The real problem, however, is that there is actually another property called `PatternType`, which makes the current code very confusing, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1850929.

Since `ShadingType` is only relevant for shading patterns (i.e. `PatternType === 2`), and *not* for tiling patterns (i.e `PatternType === 1`), this patch should help reduce confusion when reading the code.
2015-07-30 20:03:45 +02:00
Tim van der Meij
4f920ad100 Refactor annotation code to use a factory
Currently, `src/core/core.js` uses the `fromRef` method on an `Annotation` object to obtain the right annotation type object (such as `LinkAnnotation` or `TextAnnotation`). That method in turn uses a method `getConstructor` to find out which annotation type object must be returned.

Aside from the fact that there is currently a lot of code to achieve this, these methods should not be part of the base `Annotation` class at all. Creation of annotation object should be done by a factory (as also recommended by @yurydelendik at https://github.com/mozilla/pdf.js/pull/5218#issuecomment-52779659) that handles finding out the correct annotation type object and returning it. This patch implements this separation of concerns.

Doing this allows us to also simplify the code quite a bit and to make it more readable. Additionally, we are now able to get rid of the hardcoded array of supported annotation types. The factory takes care of checking the annotation types and falls back to returning the base annotation type (and issuing a warning, which the current code also does not do well) when an annotation type is unsupported.

I have manually tested this commit with 20 test PDFs with different annotation types, such as /Link, /Text, /Widget, /FileAttachment and /FreeText. All render identically before and after the patch, and unsupported annotation types are now properly indicated with a warning in the console.
2015-07-29 00:31:51 +02:00
Tim van der Meij
d08895d659 Merge pull request #6236 from Rob--W/print-javascript-action
Detect scripted auto-print requests
2015-07-25 19:42:31 +02:00
Jonas Jenwald
0a024b5051 Adjust the heuristics used to detect OpenType font file with CFF data (bug 1186827, bug 1182130, issue 6264)
*This is a tentative patch.*

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1186827.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1182130.
Fixes 6264.
2015-07-25 12:26:36 +02:00
Jonas Jenwald
385e2e5aaf Check if the Decode entry is non-default when deciding if JPEG images are natively supported/decodable (issue 6238)
Tentatively fixes 6238.
2015-07-21 12:23:07 +02:00
Tim van der Meij
980aa10e04 Refactor annotation rectangle code and add unit tests
This patch refactors the code responsible for setting the annotation's rectangle. Its goal is to:

- Actually check that the input array is actually an array, and if so, that it contains exactly four elements.
- Only call `normalizeRect` if the input array is valid, i.e., we do not call it for the default rectangle anymore.

Unit tests are provided just like with the other patches in this series.
2015-07-20 22:01:47 +02:00
Rob Wu
c676ecb5a0 Detect scripted auto-print requests
Fixes #6106

To avoid future regressions, two new unit tests were added:
1. A new PDF based on the report from #6106, which contains an
   OpenAction of type JavaScript and a string "this.print({...}".
2. An existing PDF from https://bugzil.la/1001080 (from #4698).

Although it does not matter, since we don't execute the JavaScript code,
I have also changed "print(true)" to "print({})" since the print method
takes an object (not a boolean). See "Printing PDF documents", page 62:
http://adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_developer_guide.pdf
2015-07-20 18:25:02 +02:00