Commit Graph

4206 Commits

Author SHA1 Message Date
Tim van der Meij
9c8fe3142a
Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException
Ensure that `ReadableStream`s are cancelled with actual Errors
2019-08-02 00:18:44 +02:00
Tim van der Meij
e0b38bed3c
Merge pull request #11029 from brendandahl/pdfjs-telemetry-update
[api-minor] Update telemetry to use 'categorical' histograms.
2019-08-02 00:11:02 +02:00
Brendan Dahl
31d71808e7 [api-minor] Update telemetry to use 'categorical' histograms.
Firefox telemetry supports using string labels now. Convert our integers
that we used for categories to just use strings.

The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
2019-08-01 09:51:02 -07:00
Jonas Jenwald
a3150166ec Ensure that ReadableStreams are cancelled with actual Errors
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
2019-08-01 16:40:46 +02:00
Tim van der Meij
d909b86b28
Merge pull request #11020 from Snuffleupagus/issue-11016
Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)
2019-07-31 23:33:34 +02:00
wangsongyan
c61205d980 decode filename when match an urlencode filename from contentDispositionFilename 2019-07-31 09:33:56 +08:00
Jonas Jenwald
9ad50521b1 Add a work-around, in glyphlist.js, for bad PDF generators which use a non-standard /f_f string in the Encoding dictionary when referring to the ff ligature (issue 11016)
This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare.

Furthermore, this patch purposely does *not* add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.
2019-07-30 17:06:58 +02:00
Jonas Jenwald
38ccb43436 Reduce the number of function calls in EvaluatorPreprocessor.read
For very large and complex PDF files this will help performance slightly, since `EvaluatorPreprocessor.read` is called a lot during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 200,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   200 |         3402 |        3358 | -43 | -1.28 |        faster
Firefox | Page Request |   200 |            1 |           1 |   0 | 26.71 |
Firefox | Rendering    |   200 |         3401 |        3357 | -44 | -1.28 |        faster
```
2019-07-29 08:43:36 +02:00
Tim van der Meij
9114004d5b
[api-minor] Implement quadpoints for annotations in the core layer 2019-07-28 20:36:21 +02:00
Jonas Jenwald
ff90aa4323 Inline the isCmd check in the Parser.shift method
For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called *a lot* during parsing.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over *four million* `Parser.shift` calls for just the one page), using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 100,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   100 |         3386 |        3322 | -65 | -1.92 |        faster
Firefox | Page Request |   100 |            1 |           1 |   0 | -8.08 |
Firefox | Rendering    |   100 |         3385 |        3321 | -65 | -1.92 |        faster
```
2019-07-22 12:07:36 +02:00
Jonas Jenwald
b5254f2745 Attempt to significantly reduce the number of ChunkedStream.{ensureByte, ensureRange} calls by inlining the this.progressiveDataLength checks at the call-sites
The number of in particular `ChunkedStream.ensureByte` calls is often absolutely *huge* (on the order of million calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing.

The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in *several* orders of magnitude fewer function calls.
Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur *one* additional check before calling `ensureByte`/`ensureRange`).

This patch was inspired by the *first* commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea.
Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`.

This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file):
```
[
    {  "id": "tracemonkey-eq",
       "file": "pdfs/tracemonkey.pdf",
       "md5": "9a192d8b1a7dc652a19835f6f08098bd",
       "rounds": 250,
       "type": "eq"
    }
]
```

I get the following complete results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |  3500 |          140 |         134 |  -6 | -4.46 |        faster
Firefox | Page Request |  3500 |            2 |           2 |   0 | -0.10 |
Firefox | Rendering    |  3500 |          138 |         131 |  -6 | -4.54 |        faster
```

Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     %  | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ------ | -------------
0    | Overall      |   250 |           74 |          75 |   1 |   0.69 |
0    | Page Request |   250 |            1 |           1 |   0 |  33.20 |
0    | Rendering    |   250 |           73 |          74 |   0 |   0.25 |
1    | Overall      |   250 |          123 |         121 |  -2 |  -1.87 |        faster
1    | Page Request |   250 |            3 |           2 |   0 | -11.73 |
1    | Rendering    |   250 |          121 |         119 |  -2 |  -1.67 |
2    | Overall      |   250 |           64 |          63 |  -1 |  -1.91 |
2    | Page Request |   250 |            1 |           1 |   0 |   8.81 |
2    | Rendering    |   250 |           63 |          62 |  -1 |  -2.13 |        faster
3    | Overall      |   250 |           97 |          97 |   0 |  -0.06 |
3    | Page Request |   250 |            1 |           1 |   0 |  25.37 |
3    | Rendering    |   250 |           96 |          95 |   0 |  -0.34 |
4    | Overall      |   250 |           97 |          97 |   0 |  -0.38 |
4    | Page Request |   250 |            1 |           1 |   0 |  -5.97 |
4    | Rendering    |   250 |           96 |          96 |   0 |  -0.27 |
5    | Overall      |   250 |           99 |          97 |  -3 |  -2.92 |
5    | Page Request |   250 |            2 |           1 |   0 | -17.20 |
5    | Rendering    |   250 |           98 |          95 |  -3 |  -2.68 |
6    | Overall      |   250 |           99 |          99 |   0 |  -0.14 |
6    | Page Request |   250 |            2 |           2 |   0 | -16.49 |
6    | Rendering    |   250 |           97 |          98 |   0 |   0.16 |
7    | Overall      |   250 |           96 |          95 |  -1 |  -0.55 |
7    | Page Request |   250 |            1 |           2 |   1 |  66.67 |        slower
7    | Rendering    |   250 |           95 |          94 |  -1 |  -1.19 |
8    | Overall      |   250 |           92 |          92 |  -1 |  -0.69 |
8    | Page Request |   250 |            1 |           1 |   0 | -17.60 |
8    | Rendering    |   250 |           91 |          91 |   0 |  -0.52 |
9    | Overall      |   250 |          112 |         112 |   0 |   0.29 |
9    | Page Request |   250 |            2 |           1 |   0 |  -7.92 |
9    | Rendering    |   250 |          110 |         111 |   0 |   0.37 |
10   | Overall      |   250 |          589 |         522 | -67 | -11.38 |        faster
10   | Page Request |   250 |           14 |          13 |   0 |  -1.26 |
10   | Rendering    |   250 |          575 |         508 | -67 | -11.62 |        faster
11   | Overall      |   250 |           66 |          66 |  -1 |  -0.86 |
11   | Page Request |   250 |            1 |           1 |   0 | -16.48 |
11   | Rendering    |   250 |           65 |          65 |   0 |  -0.62 |
12   | Overall      |   250 |          303 |         291 | -12 |  -4.07 |        faster
12   | Page Request |   250 |            2 |           2 |   0 |  12.93 |
12   | Rendering    |   250 |          301 |         289 | -13 |  -4.19 |        faster
13   | Overall      |   250 |           48 |          47 |   0 |  -0.45 |
13   | Page Request |   250 |            1 |           1 |   0 |   1.59 |
13   | Rendering    |   250 |           47 |          46 |   0 |  -0.52 |
```

Here it's clear that this patch *significantly* improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial).
There's obviously the question if this is *slightly* regressing simpler pages, but given just how short the times are in most cases it's not inconceivable that the page results above are simply caused be e.g. limited `Date.now()` and/or limited numerical precision.
2019-07-18 17:30:22 +02:00
Tim van der Meij
6e96a158f4
Merge pull request #10820 from vlastimilmaca/annot-irt-rt-states
Annotations - Added parsing of IRT, RT, State and StateModel
2019-07-17 23:34:31 +02:00
vlastimilmaca
fe49f0f766 Annotations - Implement parsing of IRT, RT, State and StateModel 2019-07-16 23:33:07 +02:00
Jonas Jenwald
bea15b6ce5 Simplify the PDFDocument.fingerprint method slightly
The way that this method handles documents without an `ID` entry in the Trailer dictionary feels overly complicated to me. Hence this patch adds `getByteRange` methods to the various Stream implementations[1], and utilize that rather than manually calling `ensureRange` when computing a fallback `fingerprint`.

---
[1] Note that `PDFDocument` is only ever initialized with either a `Stream` or a `ChunkedStream`, hence why the `DecodeStream.getByteRange` method isn't implemented.
2019-07-15 13:26:08 +02:00
Tim van der Meij
13ebfec903
Merge pull request #10969 from Snuffleupagus/api-test-stopAtErrors
Add an API unit-test for the `stopAtErrors` option (PRs 8240 and 8922 follow-up)
2019-07-14 14:47:57 +02:00
Jonas Jenwald
b548bafef7 Simplify, and inline, the finalize function in the MessageHandler class
The `finalize` helper function has only a *single* call-site, and furthermore it's just a one-liner too. Furthermore it's only ever called with a `Promise` as its argument, meaning that it's unnecessarily convoluted as well (i.e. the `Promise.resolve()` part shouldn't be necessary).
Hence this code can be both simplified *and* inlined at its only call-site instead.
2019-07-13 17:54:32 +02:00
Jonas Jenwald
c7fb7116d6 Add an API unit-test for the stopAtErrors option (PRs 8240 and 8922 follow-up)
Also fixes an inconsistency in the 'PageError' handler, for `getOperatorList`, in the API.
2019-07-13 16:06:05 +02:00
Jonas Jenwald
17116917f7 Remove useless wrapReason calls in the MessageHandler class
Currently `wrapReason` is manually called at *every* `resolveOrReject` call-site, despite it being completely unnecessary unless there's an actual error being handled. This is obviously inefficient, and it's easy enough to avoid by having `resolveOrReject` handle this only when actually needed.
2019-07-13 13:08:29 +02:00
Tim van der Meij
ed3954fc7a
Merge pull request #10851 from brendandahl/shading-bbox
Apply bounding box before using shading patterns.
2019-07-12 22:52:07 +02:00
Tim van der Meij
87f36e3520
Merge pull request #10850 from brendandahl/scale-line-width
Scale stroking line width when using a tiling pattern.
2019-07-12 22:50:32 +02:00
Tim van der Meij
28326165ff
Merge pull request #10958 from Snuffleupagus/api-rm-receivingOperatorList
Remove the `intentState.receivingOperatorList` boolean since it's redundant
2019-07-11 23:55:00 +02:00
Jonas Jenwald
9a4d14bf36 Prevent "Uncaught promise" messages in the console when cancelling TextLayer tasks (PR 10601 follow-up)
Since `finally` won't stop error propagation, this causes unnecessary messages to be printed in the console whenever a `TextLayer` task is cancelled.
2019-07-11 11:48:33 +02:00
Jonas Jenwald
ef48a9a713 Update the PageError handler, in the API, to always mark the operatorList as done and finalize any pending renderTasks
Note that, in the old code, there was a code-path which could prevent this from happening thus affecting future cleanup.
Furthermore, ensure that we'll always attempt to cleanup when handling the 'PageError' message, similar to the code in e.g. the `PDFPageProxy._renderPageChunk` method.
2019-07-10 14:23:59 +02:00
Jonas Jenwald
c6fcdf474b Remove the intentState.receivingOperatorList boolean since it's redundant
The `receivingOperatorList` property is currently tracked *twice* in the rendering code, both directly and inversely through the `intentState.operatorList.lastChunk` boolean. This type of double bookkeeping is never a good idea, since it's just too easy for the properties to accidentally fall out of sync.

In this case there's even a `cleanup`-related bug caused by this, which means that `PDFPageProxy._tryCleanup` will never be able to discard any data if there's an error on the worker-thread (as handled through the 'PageError' message).

Hence the simplest solution seems, at least to me, to update `PDFPageProxy._tryCleanup` to replace the `intentState.receivingOperatorList` check with a `!intentState.operatorList.lastChunk` check and completely remove the former property.
2019-07-10 14:23:10 +02:00
Brendan Dahl
6fab0a0dac Apply bounding box before using shading patterns.
Fixes #8092
2019-07-08 14:05:48 -07:00
Brendan Dahl
446efab707 Scale stroking line width when using a tiling pattern. 2019-07-08 13:47:54 -07:00
Jonas Jenwald
bdc31f8b50 Make the find helper function, in src/core/document.js, more efficient by using peekBytes rather reading the stream one byte at a time
*Please note:* A a similar change was attempted in PR 5005, but it was subsequently backed out in PR 5069.

Unfortunately I don't think anyone ever tried to debug *exactly* why it didn't work, since it ought to have worked, and having re-tested this now I'm not able to reproduce the problem any more. However, given just how inefficient the current code is, with thousands of strictly unnecessary function calls for each `find` invocation, I'd really like to try fixing this again.
2019-07-06 11:44:17 +02:00
Jonas Jenwald
41745a5996 Reduce the number of isCmd calls slightly in the XRef class
This reduces the total number of function calls, when reading the XRef table respectively when fetching uncompressed XRef entries.
Note in particular the `XRef.readXRefTable` method, where there're *two* back-to-back `isCmd` checks rather than just one.
2019-06-29 16:28:45 +02:00
Tim van der Meij
7fc329d6e3
Merge pull request #10902 from ahuglajbclajep/tiling-pattern-support
Implement tiling patterns for the SVG back-end
2019-06-28 12:45:02 +02:00
Tim van der Meij
f1867de492
Merge pull request #10925 from Snuffleupagus/eslint_no-unsanitized
Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML`
2019-06-27 20:32:24 +02:00
ahuglajbclajep
77940dbd86 Implement tiling patterns for the SVG back-end 2019-06-25 16:25:25 +09:00
Jonas Jenwald
f710eb56e4 Change the signature of the Parser constructor to take a parameter object
A lot of the `new Parser()` call-sites look quite unwieldy/ugly as-is, with a bunch of somewhat randomly ordered arguments, which we can avoid by changing the constructor to accept an object instead. As an added bonus, this provides better documentation without having to add inline argument comments in the code.
2019-06-23 16:01:45 +02:00
Jonas Jenwald
5bb5e7741d Enable the eslint-plugin-no-unsanitized ESLint plugin to disallow unsafe usage of e.g. innerHTML
See https://github.com/mozilla/eslint-plugin-no-unsanitized

Since we've generally never allowed e.g. `innerHTML`, which is enforced during review, there's only one linting failure with this patch. (Which is white-listed, according to the existing comment and the fact that it's test-only code.)
2019-06-23 13:50:30 +02:00
Jonas Jenwald
021e5ffb88 Move PDFWorkerStream and related code to its own file
Since all other `IPDFStream` implementations live in their own files, it seems reasonable for these to do so as well.

Furthermore, converts all of the relevant code to ES6 classes and updates the interface definitions to mark a couple of methods `async`.
2019-06-15 13:05:25 +02:00
Tim van der Meij
06b253d609
Merge pull request #10890 from Snuffleupagus/outline-items-hidden
Add support for outline items, in the default viewer, which default to collapsed when the outline is built
2019-06-09 11:35:49 +02:00
Jonas Jenwald
26bc630e19 Add support for outline items, in the default viewer, which default to collapsed when the outline is built
The PDF specification supports this feature, which is commonly used in large/long documents (such as the spec itself), and it seems reasonably straightforward to implement; see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2095911
2019-06-07 12:26:23 +02:00
Jonas Jenwald
625af8d2ad [api-minor] Attempt to reduce memory usage during printing, by always running cleanup once rendering has finished
Given that `cleanupAfterRender` is already set for large images, when handling 'obj' messages, this patch *should* thus be safe in general (since otherwise there ought be existing bugs related to cleanup and printing).
2019-06-03 00:29:17 +02:00
Jonas Jenwald
876c962235 Ignore Annotations with too large border widths, to prevent the annotationLayer from rendering it over the surrounding document (bug 1552113)
The border `width` will instead fallback to the default value of `1`, rather than ignoring it altoghether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113
2019-06-01 15:51:22 +02:00
Tim van der Meij
209e42043a
Merge pull request #10873 from Snuffleupagus/worker-terminate-clearPrimitiveCaches
Ensure that the `Cmd`/`Name`/`Ref` caches are cleared when terminating the worker (PR 10863 follow-up)
2019-05-31 12:56:53 +02:00
Jonas Jenwald
a3742a9f83 Ensure that the Cmd/Name/Ref caches are cleared when terminating the worker (PR 10863 follow-up)
Usually when the worker is terminated it will also be completely destroyed/removed, which means that any global caches (such as the ones in `src/core/primitive.js`) should be automatically cleared in the process.

However, for certain ways of loading the `pdf.worker.js` file, e.g. passing in a re-usable worker to `getDocument`, using the `workerPort` functionality, or even disabling workers completely  (even though this is never a good idea), the worker file may be kept in memory and these caches will not be cleared as expected.
2019-05-30 20:57:28 +02:00
Jonas Jenwald
8857a81c8d Re-use, rather than re-creating, some Arrays when resetting them in src/display/api.js
Calling `someArray = []` will create a new Array, which seems completely unnecessary when it's sufficient to just call `someArray.length = 0` to achieve the same effect.

Even though I cannot imagine these particular cases having any noticeable performance impact, similar changes were made in `core/` code years ago since it's apparently more efficient memory wise.
2019-05-30 16:33:05 +02:00
Jani Pehkonen
343b1381a2 Don't clip if path is undefined in SVG back-end 2019-05-28 18:37:15 +03:00
Jonas Jenwald
5e045bcdba Ensure that the Cmd/Name/Ref caches are cleared when running other cleanup code
The purpose of these caches is to reduce peak memory usage, by only ever having *a single* instance of a particular object.
However, as-is these caches are never cleared and they will thus remain until the worker is destroyed. This could very well have a negative effect on total memory usage, particularly for large/long documents, hence it seems to make sense to clear out these caches together with various other ones.
2019-05-26 14:29:59 +02:00
Jonas Jenwald
2fe9f3ff8f Add caching to reduce the number of Ref objects
This is similar to the existing caching used to reduced the number of `Cmd` and `Name` objects.
With the `tracemonkey.pdf` file, this patch changes the number of `Ref` objects as follows (in the default viewer):

|          | Loading the first page | Loading *all* the pages |
|----------|------------------------|-------------------------|
| `master` | 332                    | 3265                    |
| `patch`  | 163                    | 996                     |
2019-05-26 12:23:37 +02:00
Tim van der Meij
bc1eb49a77
Implement creation date only for markup annotations
The specification states that `CreationDate` is only available for
markup annotations instead of for all annotation types.

Moreover, popup annotations are not markup annotations according to the
specification, so the creation date inheritance from the parent
annotation is also removed there (note that only the modification date
is used in e.g., the viewer).
2019-05-25 15:31:06 +02:00
Tim van der Meij
cf07918ccb
Implement contents for every annotation type
The specification states that `Contents` can be available for every
annotation types instead of only for markup annotations.
2019-05-18 15:52:17 +02:00
Tim van der Meij
1421b2f205
Merge pull request #10827 from Snuffleupagus/network-streams-class
Convert the (remaining) network streams to ES6 classes
2019-05-16 22:04:29 +02:00
Jonas Jenwald
f9769af365 Convert network.js to use ES6 classes 2019-05-16 10:08:51 +02:00
Jonas Jenwald
cc661a4d38 Update fetch_stream.js to use const in more places 2019-05-16 09:15:43 +02:00
Jonas Jenwald
737705264b Convert transport_stream.js to use ES6 classes 2019-05-16 09:15:39 +02:00
Jonas Jenwald
0784c98172 Remove unused ref property from the parameters object used when creating annotations in AnnotationFactory._create
The only use-cases for this property was removed in PRs 7570 and 7775, and it's been completely unused ever since the latter one.
2019-05-16 08:33:38 +02:00
Tim van der Meij
c8c937c257
Merge pull request #10794 from janpe2/cidtogidmap-zero
Fix glyph at index zero in CIDFontType2 that has a CIDToGIDMap stream
2019-05-15 00:04:39 +02:00
Jonas Jenwald
173fbef05b Enable the consistent-return ESLint rule
This rule is already enabled in mozilla-central, and helps ensure more consistent functions/methods, see https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#119-120

Please see https://eslint.org/docs/rules/consistent-return for additional information.
2019-05-11 14:27:21 +02:00
Jani Pehkonen
05c527f035 Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream 2019-05-07 18:44:37 +03:00
Tim van der Meij
be1d6626a7
Implement creation/modification date for annotations
This includes the information in the core and display layers. The
date parsing logic from the document properties is rewritten according
to the specification and now includes unit tests.

Moreover, missing unit tests for the color of a popup annotation have
been added.

Finally the styling of the popup is changed slightly to make the text a
bit smaller (it's currently quite large in comparison to other viewers)
and to make the drop shadow a bit more subtle. The former is done to be
able to easily include the modification date in the popup similar to how
other viewers do this.
2019-05-05 14:51:03 +02:00
Jonas Jenwald
007fab6ab5 Change PartialEvaluator.handleColorN to throw when no valid pattern is found
Currently `handleColorN` will fallback to add a completely unparsed/unvalidated operator when no valid pattern was found. This is unfortunate, since it could very easily lead to a couple of different errors:
 - `DataCloneError`s when attempting to send the data to the main-thread, e.g. when `args` is `Dict`/`Stream`.
 - Errors in `getShadingPatternFromIR` on the main-thread, unless `args` just happens to have the expected format.
 - Errors when actually attempting to render the pattern on the main-thread, since the `args` will most likely not have the expected format.

Hence it probably makes sense to error in `PartialEvaluator.handleColorN`, and having invalid patterns fail gracefully via the existing `ignoreErrors` code-paths instead.
2019-05-04 12:53:18 +02:00
Tim van der Meij
155304a0c1
Merge pull request #10756 from Snuffleupagus/issue-10542
Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)
2019-05-02 22:29:24 +02:00
Jonas Jenwald
96942d4f7f Ensure that the OperatorList constructor actually initializes a NullOptimizer when intended (PR 9089 follow-up)
It appears that this has been broken ever since PR 9089, which also introduced this code, since the `QueueOptimizer`/`NullOptimizer` choice was made based on the still undefined `this.intent` property.

Furthermore, fixing this also uncovered the fact that the `NullOptimizer.reset` method was missing.
2019-05-02 17:37:05 +02:00
Jonas Jenwald
5335285cda Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)
First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to).
Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit *very* small) to the parsing of path operators in PDF documents just for a handful of *corrupt* ones.
2019-04-30 23:35:33 +02:00
Tim van der Meij
762c58e0fc
Merge pull request #10738 from Snuffleupagus/ViewerPreferences-api
[api-minor] Add support for ViewerPreferences in the API (issue 10736)
2019-04-20 18:39:32 +02:00
Jonas Jenwald
34952b732e Add a getDocId method to the idFactory, in Page instances, to avoid passing around PDFManager instances unnecessarily (PR 7941 follow-up)
This way we can avoid manually building a "document id" in multiple places in `evaluator.js`, and it also let's us avoid passing in an otherwise unnecessary `PDFManager` instance when creating a `PartialEvaluator`.
2019-04-20 13:11:17 +02:00
Tim van der Meij
55d9b35d37
Merge pull request #10727 from Snuffleupagus/type3-image-resources
Support (rare) Type3 fonts which contains image resources (issue 10717)
2019-04-18 23:07:26 +02:00
Jonas Jenwald
5e9b606e7b [Firefox] Avoid displaying the indeterminate loadingBar when disableStream=true is set (PR 10714 follow-up)
While PR 10714 did address the `disableRange=true` case, it also managed to "break" the `disableStream=true` case instead since the indeterminate loadingBar is now displayed when it shouldn't; sorry about that!
The solution is simple enough though, don't attempt to fallback to `_fullRequestReader.onProgress` when handling "incomplete" loading information.
2019-04-16 15:35:42 +02:00
Jonas Jenwald
311bac3ebb [api-minor] Add support for ViewerPreferences in the API (issue 10736)
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences

Furthermore, note that this patch *only* adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736).
The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated?
There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the *entire* viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales.
Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc.

Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)
2019-04-14 14:20:52 +02:00
Tim van der Meij
ae2a4dc3dd
Implement free text annotations 2019-04-13 18:45:22 +02:00
Jonas Jenwald
be604bd195 Support (rare) Type3 fonts which contains image resources (issue 10717)
The Type3 font type is not commonly used in PDF documents, as can be seen from telemetry data such as: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-04-09&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F68&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F57&processType=*&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2019-03-18&table=0&trim=1&use_submission_date=0 (see also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types).

Type3 fonts containing image resources are *very* rare in practice, usually they only contain path rendering operators, but as the issue shows they unfortunately do exist.
Currently these Type3-related image resources are not handled in any special way, and given that fonts are document rather than page specific rendering breaks since the image resources are thus not available to the *entire* document.
Fortunately fixing this isn't too difficult, but it does require adding a couple of Type3-specific code-paths to the `PartialEvaluator`. In order to keep the implementation simple, particularily on the main-thread, these Type3 image resources are completely decoded on the worker-thread to avoid adding too many special cases. This should not cause any issues, only marginally less efficient code, but given how rare this kind of Type3 font is adding premature optimizations didn't seem at all warranted at this point.
2019-04-13 18:27:50 +02:00
Tim van der Meij
17de90b88a
Merge pull request #10694 from Snuffleupagus/main-thread-progressiveDataLength
Avoid dispatching range requests to fetch PDF data that's already loaded with streaming (PR 10675 follow-up)
2019-04-13 17:15:01 +02:00
Tim van der Meij
2d0c38d626
Merge pull request #10696 from Snuffleupagus/makeSubStream-ensureByte
Update `ChunkedStream.makeSubStream` to actually check if (some) data exists when the `length` parameter is undefined
2019-04-13 17:12:20 +02:00
Jonas Jenwald
a7273c8efe Avoid dispatching range requests to fetch PDF data that's already loaded with streaming (PR 10675 follow-up)
*Please note:* This patch purposely ignores `src/display/network.js`, since its support for progressive reading depends on the non-standard `moz-chunked-arraybuffer` responseType which is currently in the process of being removed.
2019-04-13 00:26:13 +02:00
Vlastimil Máca
d96267c30c Annotations - _preparePopup method replaced with MarkupAnnotation base class. This is just refactoring, so it shouldn't break anything. It should move annotation API closer to PDF spec and enable future expansion. 2019-04-12 11:24:21 +02:00
Tim van der Meij
4055d0a302
Implement caret annotations
The file `test/pdfs/annotation-caret-ink.pdf` is already available in
the repository as a reference test for this since I supplied it for
another patch that implemented ink annotations.
2019-04-09 23:39:56 +02:00
Tim van der Meij
ce62373db3
Merge pull request #10674 from timvandermeij/svg-backend-es6
Convert `src/display/svg.js` to ES6 syntax and implement `setRenderingIntent` and `setFlatness` for the SVG backend
2019-04-06 17:15:14 +02:00
Tim van der Meij
5a03b1c0d7
Optimize convertOpList in svg.js by computing the operator ID mapping only once
There is no need to recompute this for every operator list we encounter.
2019-04-06 16:57:31 +02:00
Tim van der Meij
2b18e5a355
Implement setRenderingIntent and setFlatness for the SVG backend
This mirrors the canvas implementation where we ignore these operators.
This avoids console spam regarding unimplemented operators we're not
interested in.

For the Tracemonkey paper, we're now down to one warning about tiling
patterns which is in fact a valid one.
2019-04-06 16:57:30 +02:00
Tim van der Meij
47d3620d5a
Convert src/display/svg.js to ES6 syntax
In particular, this should reduce intermediate string creation by using
template strings and reduce variable lookup times by removing unneeded
variables and caching `this.current` in more places.
2019-04-06 16:57:30 +02:00
Jonas Jenwald
f0a28b3c0d [Firefox] Ensure that loading progress is reported, and the loadingBar updated, when disableRange=true is set
With PR 10675 having fixed the completely broken `disableRange=true` setting in the Firefox version of PDF.js, I couldn't help but noticing that loading progress is never reported properly in that case.
Currently loading progress is only reported for the `rangeProgress` chrome-event, which obviously isn't dispatched with `disableRange=true` set. However, the `progressiveRead` chrome-event includes loading progress as well, but this information isn't being used in any way.
Furthermore, the `PDFDataRangeTransport.onDataProgress` method wasn't able to handle "complete" loading information, and neither was `PDFDataTransportStream._onProgress` since that method would only ever attempt to report it through a RangeReader (which won't exist when `disableRange=true` is set).
2019-04-06 12:53:33 +02:00
Tim van der Meij
b161050df4
Merge pull request #10709 from Snuffleupagus/pageLayout
[api-minor] Add basic support for PageLayout in the API and the viewer
2019-04-05 23:07:32 +02:00
Tim van der Meij
8c8738ea47
Merge pull request #10678 from Snuffleupagus/rm-moz-chunked-arraybuffer
Remove `moz-chunked-arraybuffer` support, and related code, from `src/display/network.js`
2019-04-05 22:52:28 +02:00
Jonas Jenwald
7a999d1d67 [api-minor] Add basic support for PageLayout in the API and the viewer
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2393749, and refer to the inline comments for additional details.
2019-04-05 11:32:01 +02:00
Tim van der Meij
57abddc9ca
Merge pull request #10713 from Snuffleupagus/rm-JSDoc-annotation
Remove `src/core/annotation.js` from the `gulp jsdoc` build target
2019-04-04 23:15:02 +02:00
Tim van der Meij
072c5864fb
Merge pull request #10675 from Snuffleupagus/PDFDataTransportStream-disableRange
[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream`
2019-04-04 23:07:45 +02:00
Jonas Jenwald
f666395c24 Remove src/core/annotation.js from the gulp jsdoc build target
Note how at https://mozilla.github.io/pdf.js/api/ it's being described as API docs, however `src/core/annotation.js` is not part of the public API.
Furthermore, given that the code residing in the `src/core/` folder is run in a worker-thread, it's not even accessible on the main-thread (since `postMessage` is being used to transfer the data).
Hence the different API methods simply returns a "proxy" to the underlying data, but not actually the same objects and data structures as in the worker-thread itself; thus it doesn't make a whole lot of sense to expose this in API docs as far as I'm concerned.

Finally, the patch fixes a small JSDoc related typo in `src/display/api.js` when referring to the `TextStyle` typedef.
2019-04-04 18:03:08 +02:00
Jonas Jenwald
b40e6723be Remove moz-chunked-arraybuffer support, and related code, from src/display/network.js
The `moz-chunked-arraybuffer` responseType is a non-standard property, which has been subsumed by the Fetch API, and it's in the process of being removed from Firefox; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1120171 and https://bugzilla.mozilla.org/show_bug.cgi?id=1411865

*Please note:* Rather than waiting for both `Fetch` *and* `ReadableStream` to be available in e.g. a Firefox ESR version (which is probably going to be 68 at the earliest), let's just decide that PDF.js release `2.1.266` will be the last one with `moz-chunked-arraybuffer` support and land this patch (since nothing should outright break without it anyway).
2019-04-01 20:48:51 +02:00
Jonas Jenwald
c6ddbd55e2 Add a progressiveDataLength fast-path to ChunkedStream.ensureByte
This is *similar* to the existing check using in `ChunkedStream.ensureRange`.
2019-03-29 20:00:28 +01:00
Jonas Jenwald
49e8a270c4 Update ChunkedStream.makeSubStream to actually check if (some) data exists when the length parameter is undefined
Note how `XRef.fetchUncompressed`, which is used *a lot* for most PDF documents, is calling the `makeSubStream` method without providing a `length` argument.
In practice this results in the `makeSubStream` method, on the `ChunkedStream` instance, calling the `ensureRange` method with `NaN` as the end position,  thus resulting in no data being requested despite it possibly being necessary.

This may be quite bad, since in this particular case it will lead to a new `ChunkedStream` being created *and* also a new `Parser`/`Lexer` instance. Given that it's quite possible that even the very first `Parser.getObj` call could throw `MissingDataException`, this could thus lead to wasted time/resources (since re-parsing is necessary once the data finally arrives).

You obviously need to be very careful to not have `ChunkedStream.makeSubStream` accidentally requesting the *entire* file, hence its `this.end` property is of no use here, but it should be possible to at least check that the `start` of the data is present before any potentially expensive parsing occurs.
2019-03-29 17:20:31 +01:00
Tim van der Meij
b4c3b94592
Merge pull request #6606 from Rob--W/pattern-scaling
Improve performance and correctness of Tiling Patterns
2019-03-29 00:01:38 +01:00
Tim van der Meij
f9c58115fc
Merge pull request #10683 from janpe2/type0-noncid-cmap
Use CMap in Type0 fonts when CFF is not a CID font
2019-03-28 00:07:08 +01:00
Rob Wu
5985d4069a TilingPattern: Add comment to explain the implementation 2019-03-27 17:50:46 +01:00
Rob Wu
d3dc8f16b5 TilingPattern: Reverse transform after painting
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.

The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
2019-03-27 17:50:35 +01:00
Rob Wu
a72a8e921f Avoid extreme sizing / scaling in tiling pattern
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
  surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp

Fixes #6496
Fixes #5698
Fixes #1434
Fixes #2825
2019-03-27 17:44:04 +01:00
Jonas Jenwald
9077abc263 Take the FirstChar/LastChar properties into account when computing the hash in PartialEvaluator.preEvaluateFont (issue 10665)
Without this some fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.
2019-03-27 16:27:10 +01:00
Jonas Jenwald
a2a824ed01 Don't accidentally use an empty hash value when comparing preEvaluatedFonts in PartialEvaluator.loadFont
Note that `PartialEvaluator.preEvaluateFont` will return an empty string when no hash was computed. This will complete short-circuit the `fontAlias` comparison in `PartialEvaluator.loadFont`, since fonts which are totally different will then match if their `hash`es are empty.
2019-03-27 00:54:39 +01:00
Jani Pehkonen
49c6233fbc Use CMap in Type0 fonts when CFF is not a CID font 2019-03-26 19:38:44 +02:00
Rob Wu
60d4685c10 Refactor TilingPattern
- Deduplicate size/scale calculation, by introducing `getSizeAndScale`.
- Eliminate unnecessary calculations / variables.
2019-03-26 17:35:23 +01:00
Jonas Jenwald
bb384dd5ed [Firefox regression] Fix disableRange=true bug in PDFDataTransportStream
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.

Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
 - In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
 - (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*

In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
2019-03-26 16:34:13 +01:00
wuhao.daraw
1472c10bab fix: electron enviroment detection 2019-03-26 20:52:49 +08:00
Tim van der Meij
33bfbef6ba
Merge pull request #10635 from timvandermeij/lexer-parser
Convert `src/core/parser.js` to ES6 syntax and write more unit tests for the lexer and the parser
2019-03-19 23:17:34 +01:00
Tim van der Meij
7d3cb19571
Convert the Linearization class in src/core/parser.js to ES6 syntax
Moreover, disable `var` usage for this file.
2019-03-17 13:27:45 +01:00
Tim van der Meij
ee3cfb7986
Merge pull request #10646 from terurou/svg-fill
Implement linear-gradient, radial-gradient and dummy-pattern in SVGGraphics.
2019-03-17 13:13:45 +01:00
terurou
9c70a3831c Fix to use radicalGradient. 2019-03-17 10:57:16 +09:00
Tim van der Meij
7c9f1cc518
Merge pull request #10644 from Snuffleupagus/revokeObjectURL
Ensure that `blob:` URLs will be revoked when pages are cleaned-up/destroyed (JPEG memory usage)
2019-03-16 19:29:23 +01:00
terurou
c970a4b6ae Fix copy-paste mistake. 2019-03-16 23:21:56 +09:00
Jonas Jenwald
56eeeea1dc Re-factor the getTransfers helper function into a "private" getter method on the OperatorList
This function is currently called with the `OperatorList` instance as its argument, hence I cannot think of any good reason for not just moving it into the `OperatorList` properly. (This will also help with other planned changes regarding the `ImageCache` functionality.)
2019-03-16 13:06:51 +01:00
Jonas Jenwald
7273795eb6 Actually transfer eligible ImageMask data, rather than always copying it
By transfering `ArrayBuffer`s you can avoid having two copies of the same data, i.e. one copy on each of the worker/main-thread, for data that's used only *once* on the worker-thread.

Note how the code in [`PDFImage.createMask`](80135378ca/src/core/image.js (L284-L285)) goes to great lengths to actually enable tranfering of the image data. However in [`PartialEvaluator.buildPaintImageXObject`](80135378ca/src/core/evaluator.js (L336)) the `cached` property is always set to `true`, which disqualifies the image data from being transfered; see [`getTransfers`](80135378ca/src/core/operator_list.js (L552-L554)).

For most ImageMask data this patch won't matter, since images found in the `/Resources -> /XObject` dictionary will always be indexed by name. However for *inline* images which contains ImageMask data, where only "small" images are cached (in both `parser.js` and `evaluator.js`), the current code will result in some unnecessary memory usage.
2019-03-16 13:06:32 +01:00
terurou
fc0f844539 Implement linear-gradient, radial-gradient and dummy-pattern in SVGGraphics. 2019-03-16 13:56:29 +09:00
Jonas Jenwald
88d5750030 Remove the src attribute from Image objects used with natively supported JPEG images, when pages are cleaned-up/destroyed
This will further help reduce the amount of image data that's currently being held alive, by explicitly removing the `src` attribute.

Please note that this is mostly relevant for browsers which do not support `URL.createObjectURL`, or where `disableCreateObjectURL` was manually set by the user, since `blob:` URLs will be revoked (see the previous patch).
However, using `about:memory` (in Firefox) it does seem that this may also be generally helpful, given that calling `URL.revokeObjectURL` won't invalidate the image data itself (as far as I can tell).
2019-03-15 15:25:48 +01:00
Jonas Jenwald
983b25f863 Ensure that blob: URLs will be revoked when pages are cleaned-up/destroyed
Natively supported JPEG images are sent as-is, using a `blob:` or possibly a `data` URL, to the main-thread for loading/decoding.
However there's currently no attempt at releasing these resources, which are held alive by `blob:` URLs, which seems unfortunately given that images can be arbitrarily large.

As mentioned in https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL the lifetime of these URLs are tied to the document, hence they are not being removed when a page is cleaned-up/destroyed (e.g. when being removed from the `PDFPageViewBuffer` in the viewer).
This is easy to test with the help of `about:memory` (in Firefox), which clearly shows the number of `blob:` URLs becomming arbitrarily large *without* this patch. With this patch however the `blob:` URLs are immediately release upon clean-up as expected, and the memory consumption should thus be considerably reduced for long documents with (simple) JPEG images.
2019-03-15 10:40:58 +01:00
Tim van der Meij
80135378ca
Merge pull request #10636 from Snuffleupagus/PDFDocumentProxy-destroy
Small clean-up of the `PDFDocumentProxy.destroy` method and related code
2019-03-13 23:46:41 +01:00
Jonas Jenwald
24fc4f83ca Small clean-up of the PDFDocumentProxy.destroy method and related code
Note how `PDFDocumentProxy.destroy` is a nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here.

The main changes in this patch are:
 - Stop providing a `PDFDocumentLoadingTask` instance *separately* when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance.
 - Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused.
 - Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying *once*, rather than twice, for each document.
2019-03-12 13:25:29 +01:00
Jonas Jenwald
88f9e633dd Try to improve text-selection for Type3 fonts that utilize a non-default /FontMatrix (bug 1513120)
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simple extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.

The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
2019-03-12 10:32:08 +01:00
Tim van der Meij
8d4d7dbf58
Convert the Lexer class in src/core/parser.js to ES6 syntax 2019-03-10 19:04:36 +01:00
Tim van der Meij
7d0ecee771
Convert the Parser class in src/core/parser.js to ES6 syntax 2019-03-10 19:04:35 +01:00
Tim van der Meij
d587abbceb
Merge pull request #10633 from Snuffleupagus/murmurhash-class
Convert `MurmurHash3_64` to an ES6 class
2019-03-09 21:07:12 +01:00
Jonas Jenwald
6b1ac44aea Convert MurmurHash3_64 to an ES6 class
Notable changes:
 - Remove the `return this;` from the `MurmurHash3_64.update` method, since it's completely unused and doesn't make a lot of sense.
 - Remove the loop(s) from the `MurmurHash3_64.hexdigest` method, since creating a temporary array and then looping over it is wasteful given how simple this can be written with modern JavaScript.
2019-03-09 17:03:06 +01:00
Jonas Jenwald
2665502055 Move NativeImageDecoder into a separate file, and convert it to a class
Given the size of the `src/core/evaluator.js` file, it cannot hurt to move some of its (image related) helper functionality into a separate file.
2019-03-09 15:59:04 +01:00
Tim van der Meij
e41c4aece4
Merge pull request #10621 from janpe2/svg-Tm-stroke
Don't scale SVG stroke width by text matrix
2019-03-08 23:16:10 +01:00
Tim van der Meij
8b149b818e
Merge pull request #10615 from Snuffleupagus/corrupt-inline-ASCII85Decode
Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)
2019-03-08 23:06:01 +01:00
Tim van der Meij
e1b01a601c
Merge pull request #10605 from timvandermeij/display-utils
Convert `let` to `const` if possible in, and improve unit test coverage for, `src/display/display_utils.js`
2019-03-06 23:46:53 +01:00
Tim van der Meij
87a70f3359
Convert let to const if possible in src/display/display_utils.js
Finally, `var` usage is removed.
2019-03-06 23:41:54 +01:00
Jani Pehkonen
d9e30b3452 Don't scale SVG stroke width by text matrix 2019-03-05 22:54:25 +02:00
Jonas Jenwald
3ce8fe7927 Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)
There's a number of things wrong with the PDF document, since its inline images are first all *a lot* larger than the 4 KB limit (as mandated by the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1852045).

Furthermore the actual ASCII85Decode data is interspersed with *a lot* of needless whitespace, in particular also "inside" of the EOD (end-of-data) marker which thus completely breaks the detection.
Note that according to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130, this patch should be safe since it explicitly mentions that *all* whitespace should be ignored.
2019-03-04 23:41:36 +01:00
Jonas Jenwald
7caf769a66 Move the deprecated helper function to the src/display/display_utils.js file
Given that the function is (purposely) independent of the verbosity level and that its message is worded to only apply on the main-thread, there's no reason to duplicate this across the built `pdf.js`/`pdf.worker.js` files.
2019-03-02 20:23:56 +01:00
Jonas Jenwald
4170c414fa Reduce usage of Date.now() in src/core/worker.js
Currently for every single parsed/rendered page there's no less than *four* `Date.now()` calls being made on the worker-side. This seems totally unnecessary, since the result of these calls are, by default, not used for anything *unless* the verbosity level is set to `INFO`.
2019-03-02 20:23:52 +01:00
Tim van der Meij
c43396c2b7
Merge pull request #10590 from janpe2/svg-missing-moveto
Fix missing moveTos in SVG paths
2019-03-02 14:43:53 +01:00
Tim van der Meij
4f13eb00d0
Merge pull request #10604 from brendandahl/fix-type1-charset
Put the string name of the glyph in the charset array.
2019-03-02 13:03:16 +01:00
Brendan Dahl
7d6ab081eb Put the string name of the glyph in the charset array.
Also, only warn once per font when missing a glyph name.
2019-03-01 18:03:51 -08:00
Jonas Jenwald
d7d1f23826 Zero the width/height of the temporary canvas used during TextLayer rendering
The default size of these canvases seem to be `300 x 150` (two orders of magnitude larger than the ones in PR 10597), which probably is sufficient enough to matter since there's one such canvas for each textLayer that's rendered in the viewer.

Also fixes the incorrect rejection reason, i.e. one using a string rather than an `Error`, in the `TextLayerRenderTask.cancel` method.
2019-03-01 04:05:37 +01:00
Brendan Dahl
34022d2fd1
Merge pull request #10591 from brendandahl/fix-charset
Add unique glyph names for CFF fonts.
2019-02-28 17:22:29 -08:00
Tim van der Meij
9559d57636
Merge pull request #10595 from Snuffleupagus/JpegDecode-zero-tmpCanvas
Zero the width/height of the temporary canvas used during `JpegDecode` (issue 10594)
2019-02-28 23:41:22 +01:00
Tim van der Meij
39fa26ea33
Merge pull request #10597 from Snuffleupagus/isFontSubpixelAAEnabled-canvas-cleanup
Ensure that the temporary canvas created in `CanvasGraphics.isFontSubpixelAAEnabled` will be cleared
2019-02-28 23:37:24 +01:00
Tim van der Meij
af5597b7e5
Merge pull request #10573 from Snuffleupagus/type3-avoid-truncation
Avoid truncating/breaking some Type3 glyphs in `compileType3Glyph` (bug 1245391, issue 10568)
2019-02-28 23:25:45 +01:00
Jonas Jenwald
b61b4d3229 Ensure that the temporary canvas created in CanvasGraphics.isFontSubpixelAAEnabled will be cleared
While this particular canvas may be small, there can still be an arbitrarily large number of them (one per page rendered), which can/will eventually add up memory wise. This can be easily avoided by using the `cachedCanvases` abstraction instead, which will ensure that the `isFontSubpixelAAEnabled` canvas is removed together with other temporary canvases in `CanvasGraphics.endDrawing`.
2019-02-28 14:18:38 +01:00
Jonas Jenwald
4687cc85ac Zero the width/height of the temporary canvas used during JpegDecode (issue 10594) 2019-02-28 12:23:34 +01:00
Brendan Dahl
8a596ef5d5 Add unique glyph names for CFF fonts.
Printing on MacOS was broken with the previous approach of just mapping
all the glyphs to notdef.
2019-02-27 15:00:29 -08:00
Jonas Jenwald
f664e074c9 Avoid using the Fetch API, in GENERIC builds, for unsupported protocols (issue 10587) 2019-02-27 13:04:20 +01:00
Jonas Jenwald
cbc07f985b Load built-in CMap files using the Fetch API when possible 2019-02-27 13:04:19 +01:00
Jani Pehkonen
52e8e9b059 Fix missing moveTos in SVG paths 2019-02-26 20:00:35 +02:00
Jonas Jenwald
3a09a2f7a5 Update the year in the license_header files 2019-02-24 00:35:42 +01:00
Jonas Jenwald
db5dc14158 Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file
The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated.
Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used *only* on the worker-thread.

Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus *cannot* possibly ever be used on the main-thread.
2019-02-24 00:35:39 +01:00
Jonas Jenwald
a1f7517996 Rename the src/display/dom_utils.js file to src/display/display_utils.js
This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.
2019-02-23 16:30:16 +01:00
Jonas Jenwald
fb774a65b0 Avoid truncating/breaking some Type3 glyphs in compileType3Glyph (bug 1245391, issue 10568)
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*

With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
2019-02-21 23:29:43 +01:00
Jonas Jenwald
60f6d49ff7 [api-minor] Expose the existence of a Collection dictionary via the getMetadata API method (issue 10555)
Given the complexity of this functionality, and the fact that it doesn't seem widely used, I highly doubt that it'd ever make sense to support Collections; see also https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.39646.2Heading.824.Collections
2019-02-15 15:40:31 +01:00
Jonas Jenwald
b6d090cc14 Fallback to the built-in font renderer when font loading fails
After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].

Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].

The solution implemented in this patch does *not* in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load.

*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:

[Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888)
Issue 10175
Issue 10232

---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.

[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.

[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
2019-02-11 10:27:08 +01:00
Jonas Jenwald
13230a1123 Remove the ability to pass in more than one font to BaseFontLoader.bind
- The only existing call-site, of this method, is never passing more than *one* font at a time anyway.
 - As far as I can remember, this functionality has never actually been used (caveat: I didn't check the git history).
 - This allows simplification of the method, especially by making use of the fact that it's now asynchronous.
 - It should be just as easy to call `BaseFontLoader.bind` from within a loop, rather than having the loop in the method itself.
2019-02-10 21:09:57 +01:00
Jonas Jenwald
af3fcca88d Convert BaseFontLoader.bind to be async, and only utilize BaseFontLoader._queueLoadingCallback when actually necessary
Currently all fonts are using the `_queueLoadingCallback` method to determine when they have been loaded[1]. However in most cases this is just adding unnecessary overhead, especially with `BaseFontLoader.bind` now being asynchronous, given how fonts are loaded:
 - For fonts loaded using the Font Loading API, it's already possible to easily tell when a font has been loaded simply by checking the `loaded` promise on the FontFace object itself.
 - For browsers, e.g. Firefox, which support synchronous font loading it's already assumed that fonts are immediately available.

Hence the `_queueLoadingCallback` method is moved into the `GenericFontLoader`, such that it's only utilized for fonts which are loaded using CSS.

---
[1] In the "fonts loaded using CSS" case, this is already a hack anyway as outlined in the comments.
2019-02-10 21:09:57 +01:00
Tsukasa OI
96ba6afd47 Fix copying on supplementary plane characters
pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero).  This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.

Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
2019-02-10 18:14:53 +09:00
Jonas Jenwald
3bcf9187ec Add a polyfill for classList.{add, remove} with more than one parameter
Unsurprisingly IE11 doesn't support this, so a polyfill is needed since otherwise the sidebar can no longer be opened.

Also, simplifies the existing `classList.toggle` polyfill.
2019-02-08 13:35:01 +01:00
Jonas Jenwald
614e502227 [api-minor] Remove the document.currentScript polyfill
This polyfill is currently used in only *one* file, i.e. `src/display/api.js`, and only when trying to build a *fallback* `workerSrc` path.

Given that the global `workerSrc` should *always* be set[1] when using the PDF.js library[2], and that the fallback `workerSrc` should only be regarded as a best-effort solution anyway, there isn't a particularily strong reason to keep the compatibility code in my opinion.

---
[1] Other supported options include setting the global `workerPort`, or passing in a `PDFWorker` instance as part of the `getDocument` call.

[2] Which is clearly mentioned in the JSDocs in `src/display/worker_options.js`.
2019-02-03 14:09:24 +01:00
Jonas Jenwald
22468817e1 Add a settled property, tracking the fulfilled/rejected stated of the Promise, to createPromiseCapability
This allows cleaning-up code which is currently manually tracking the state of the Promise of a `createPromiseCapability` instance.
2019-02-02 15:18:56 +01:00
Jonas Jenwald
6f94a05a29 Do the final text scaling correctly in flushTextContentItem (issue 8276)
It's necessary to take into account whether or not the text is vertical, to avoid either the textContent `width` or `height` becoming incorrect.
2019-01-29 15:24:04 +01:00
Jonas Jenwald
5081063b9e Attempt to clean-up/restore pending rendering operations when errors occurs while a RenderTask runs (PR 10202 follow-up)
This piggybacks of the existing `cancel` functionality, to ensure that any pending operations are closed *and* that any temporary canvases are actually being removed.

Also simplifies `finishPaintTask` in `PDFPageView.draw` slightly, by converting it to an async function.
2019-01-26 16:02:51 +01:00
Jonas Jenwald
29f36d7a1b Reduce unnecessary duplication of the isDefaultDecode methods on ColorSpace instances
The recent PR 10482 made me realize that I missed an opportunity for simplification when doing the class conversion of this code in PR 10007.
2019-01-25 08:53:08 +01:00
Tim van der Meij
e2701d5422
Merge pull request #10482 from janpe2/indexed-decode
Implement Decode entry in Indexed images
2019-01-24 23:46:55 +01:00
Jonas Jenwald
41fbc71ef9 Ensure that XRef.indexObjects can handle object numbers with zero-padding (issue 10491)
All objects in the PDF document follow this pattern:
```
0000000001 0 obj
<<
% Some content here...
>>
endobj
0000000002 0 obj
<<
% More content here...
endobj

```
2019-01-24 22:37:18 +01:00
Jonas Jenwald
249b199ff1 Stop bundling the ReadableStream polyfill in MOZCENTRAL builds (PR 10470 follow-up)
Based on the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1521413, this patch simply removes the `ReadableStream` polyfill completely from MOZCENTRAL builds.

With this patch, the size of the `gulp mozcentral` build target is thus further reduced (building on PR 10470):

|       | `build/mozcentral`
|-------|-------------------
|master |   3 339 666
|patch  |   3 209 572
2019-01-23 20:33:20 +01:00
Jani Pehkonen
26121177ab Implement Decode entry in Indexed images 2019-01-22 22:51:04 +02:00
Jonas Jenwald
01d624f6a0 Add an Array.from polyfill, using core-js, and remove some compatibility hacks from the src/display/content_disposition.js file 2019-01-20 08:49:20 +01:00
Tim van der Meij
66acc7397f
Merge pull request #10470 from Snuffleupagus/mozcentral-streams
Try to, completely, avoid loading the `ReadableStream` polyfill in MOZCENTRAL builds
2019-01-19 21:22:18 +01:00
Jonas Jenwald
480110625a Try to, completely, avoid loading the ReadableStream polyfill in MOZCENTRAL builds
With https://bugzilla.mozilla.org/show_bug.cgi?id=1505122 landing in Firefox 65, the native `ReadableStream` implementation is now enabled by default in Firefox.

Obviously it would be nice to simply stop bundling the polyfill in MOZCENTRAL builds altogether, however given that it's still possible to disable[1] `ReadableStream` this is probably not a good idea just yet.
Nonetheless, now that native support is available, it seems unnecessary (and wasteful) to keep bundling the polyfill twice[2] in MOZCENTRAL builds. Hence this patch, which contains a suggest approach for packing the polyfill in a *separate* file which is then *only* loaded if/when needed.

With this patch, the size of the `gulp mozcentral` build target is thus reduced accordingly:

|       | `build/mozcentral`
|-------|-------------------
|master |   3 461 089
|patch  |   3 340 268

Besides the PDF.js files taking up less space in Firefox this way, the additional benefit is that there's (by default) less code that needs to be loaded and parsed when the PDF Viewer is used which also cannot hurt.

---
[1] In `about:config`, by toggling the `javascript.options.streams` preference.

[2] Once in the `build/pdf.js` file, and once in the `build/pdf.worker.js` file.
2019-01-19 09:05:01 +01:00
Jonas Jenwald
24a688d6c6 Convert some usage of indexOf to startsWith/includes where applicable
In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.
2019-01-18 17:57:41 +01:00
Tim van der Meij
cdbc33ba06
Merge pull request #10457 from Snuffleupagus/metadata-tests
When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up)
2019-01-16 23:03:39 +01:00
Jonas Jenwald
68ad3e8e9d Tweak the DOMTokenList.toggle polyfill (issue 10460) 2019-01-16 20:15:44 +01:00
Jonas Jenwald
9f45f8dfda When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up)
This will allow the Metadata to be successfully extracted from the PDF file in issue 10395.
Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).
2019-01-16 12:44:27 +01:00
Jonas Jenwald
b531fc4106 Avoid truncating inline images, where the data and the "EI" marker is glued together (issue 10388) (#10436)
Thanks to the *excellent* debugging done by @janpe2, this was easy to fix!
2019-01-12 20:31:23 +01:00
Jonas Jenwald
d4a3858ed5 Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up) 2019-01-10 17:55:28 +01:00
Jonas Jenwald
358cd0c096 Add a few more String polyfills (startsWith, endsWith, padStart, padEnd) 2019-01-06 20:10:55 +01:00
Tim van der Meij
f162fed6b9
Convert src/core/charsets.js and src/core/standard_fonts.js to ES6 syntax
Moreover, include the "no var" ESLint comment to
`src/core/annotation.js` and `src/core/ps_parser.js` since they are
already converted.
2019-01-06 15:04:01 +01:00
Tim van der Meij
3b637e71d4
Convert src/core/arithmetic_decoder.js to ES6 syntax 2019-01-06 15:04:01 +01:00
Tim van der Meij
b81984f0cb
Merge pull request #10417 from brendandahl/metric-length
Fix reading number of HTMX metrics.
2019-01-05 13:35:16 +01:00
Jonas Jenwald
e8f4b47d59 Prevent errors, in SimpleXMLParser.onEndElement, when the stack has already been completely parsed (issue 10410)
The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data.
(The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)
2019-01-05 11:15:34 +01:00
Brendan Dahl
32eace043b Fix reading number of HTMX metrics.
The length of the HHEA table can be incorrect, so it is better to
read the number of metrics offset from beginning of table instead.
2019-01-04 15:13:13 -08:00
Tim van der Meij
b39ec7af96
Merge pull request #10408 from Snuffleupagus/issue-10407
Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)
2019-01-04 23:45:26 +01:00
Jonas Jenwald
66fccd860b Adjust how AnnotationBorderStyle.setWidth handles the input being a Name (issue 10385)
In order to be consistent with the behaviour in Adobe Reader, the width will now always be set to zero when the input is a `Name`.
2019-01-04 10:38:10 +01:00
Jonas Jenwald
6cd9ff48f3 Prevent errors, because of incorrect scope, in the XMLParserBase._resolveEntities method (issue 10407) 2019-01-04 10:13:32 +01:00
Tim van der Meij
2d00bb098b
Merge pull request #10404 from Snuffleupagus/issue-10401
Remove the `for ... of` loop from the `PDFDocument.fingerprint` getter (issue 10401)
2019-01-03 22:46:51 +01:00
Brendan Dahl
e2686db49b
Merge pull request #10277 from janpe2/cff-stems
Repair CFF fonts if stem hints are in wrong order
2019-01-03 10:30:43 -08:00
Jonas Jenwald
8c278530dd Remove the for ... of loop from the PDFDocument.fingerprint getter (issue 10401)
It appears that the `Symbol` polyfill doesn't work well in conjunction with `TypedArray`s, and that part of PR 10393 is thus reverted.
2019-01-03 11:17:45 +01:00
Tim van der Meij
1b84b2ed60
Merge pull request #10398 from Snuffleupagus/issue-10395
Prevent errors in various methods in `SimpleDOMNode` when the `childNodes` property is not defined (issue 10395)
2019-01-01 16:22:11 +01:00
Jonas Jenwald
d371d23382 Prevent errors in various methods in SimpleDOMNode when the childNodes property is not defined (issue 10395)
Given that the issue, as filed, is incomplete since no PDF file was provided for debugging, this patch is really the best that we can do here. *Please note:* This patch will *not* enable the Metadata to be successfully parsed, but it should at least prevent the errors.
2018-12-31 13:07:15 +01:00
Tim van der Meij
d8f201ea2a
Merge pull request #10397 from Snuffleupagus/issue-10385
Ensure that `AnnotationBorderStyle.setWidth` is able to handle the input being a `Name`, to correctly deal with corrupt PDF documents (issue 10385)
2018-12-31 12:58:28 +01:00
Jonas Jenwald
76a9580aeb Ensure that AnnotationBorderStyle.setWidth is able to handle the input being a Name, to correctly deal with corrupt PDF documents (issue 10385) 2018-12-31 12:21:28 +01:00
Jonas Jenwald
15b3806937 Actually validate the input in AnnotationBorderStyle.setStyle 2018-12-31 12:15:15 +01:00
Tim van der Meij
5b57e69da2
Optimize CanvasGraphics.setFont to avoid intermediate string creation
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
14126 strings being created in this method. With this commit applied,
this is reduced to 2018 strings.
2018-12-30 14:58:32 +01:00
Tim van der Meij
95f9075565
Optimize TextLayerRenderTask._layoutText to avoid intermediate string creation
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
12936 strings being created in this method. With this commit applied,
this is reduced to 3610 strings.
2018-12-30 14:39:08 +01:00
Tim van der Meij
d5e5d18430
Convert the PDFDocument class in src/core/document.js to ES6 syntax 2018-12-30 13:54:43 +01:00
Tim van der Meij
612fc9fcc2
Convert the Page class in src/core/document.js to ES6 syntax 2018-12-30 13:54:43 +01:00
Tim van der Meij
aad27ff9a0
Optimize the Ref class in src/core/primitives.js
The `toString` method always creates two string objects (for the 'R'
character and for the `num` concatenation) and in the worst case
creates three string objects (one more for the `gen` concatenation).
For the Tracemonkey paper alone, this resulted in 12000 string
objects when scrolling from the top to the bottom of the document.
Since this is a hot function, it's worth minimizing the number of string
objects, especially for large documents, to reduce peak memory usage.

This commit refactors the `toString` method to always create only one
string object.
2018-12-29 17:48:41 +01:00
Jonas Jenwald
60bcce184e Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)
For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].

Since there's generally a real effort being in made PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.

Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't a very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].

Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:

```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
    console.timeEnd('second');
  });
});
```

The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.

---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.

[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.

[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.
2018-12-29 12:47:25 +01:00
Tim van der Meij
360c3d3813
Remove the unused url argument for the ChunkedStreamManager class 2018-12-24 13:14:42 +01:00
Tim van der Meij
47344197f4
Convert src/core/chunked_stream.js to ES6 syntax 2018-12-24 13:14:42 +01:00
Tim van der Meij
103f4616ac
Merge pull request #10334 from Snuffleupagus/OpenAction-dest
[api-minor] Add support for OpenAction destinations (issue 10332)
2018-12-23 20:49:50 +01:00
Jonas Jenwald
f0719ed565 [api-minor] Change the getViewport method, on PDFPageProxy, to take a parameter object rather than a bunch of (randomly) ordered parameters
If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature *first* to avoid needlessly unwieldy call-sites.

To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning *and* with a working fallback[1] for the old method signature.

---
[1] This is limited to `GENERIC` builds, which should be sufficient.
2018-12-21 11:55:20 +01:00
Jonas Jenwald
b05f053287 [api-minor] Add support for OpenAction destinations (issue 10332)
Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1].
Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2].

---
[1] This information is, currently in PDF.js, being included through the `getJavaScript` API method.

[2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).
2018-12-19 11:45:16 +01:00
Jonas Jenwald
ba2edeae18 [api-minor] Add support, in getMetadata, for custom information dictionary entries (issue 5970, issue 10344) (#10346)
The custom entries, provided that they exist *and* that their types are safe to include, are exposed through a new `Custom` infoDict entry to clearly separate them from the standard ones.

Fixes 5970.
Fixes 10344.
2018-12-18 23:26:02 +01:00
Jonas Jenwald
437fb8a8a7 Ignore the fieldValue for Signature annotations, since they're currently unsupported (issue 10374)
Given that Signature (Widget) annotations are currently not supported, since they cannot be validated, simply ignoring the `fieldValue` seems OK for now considering that attempting to blindly include unparsed/unvalidated data isn't very useful.

Fixes 10347.
2018-12-12 18:01:43 +01:00
Tim van der Meij
45c0197465
Merge pull request #10330 from janpe2/svg-line-width-zero
Handle line width of zero in SVG
2018-12-07 23:34:27 +01:00
Jani Pehkonen
ddabeb0645 Handle line width of zero in SVG 2018-12-04 16:05:32 +02:00
Jonas Jenwald
d0fec7c6fb Fix NameOrNumberTree.get to actually perform a binary search to find the requested key
The intent of the code, based on existing comments, is to perform a binary search. However, because of what appears to be a typo in the code responsible for computing the current search index, this code is always checking *every* entry (albeit only at the "final" node) starting from the last one.
2018-11-23 23:52:33 +01:00
Jonas Jenwald
fdad0a0b0b Fallback to an exhaustive search, in corrupt PDF files, for NameTrees/NumberTrees that are not correctly ordered (issue 10272)
According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2384179, the keys of NameTree/NumberTree should be ordered.
For corrupt PDF files, which violate this assumption, we thus need to fallback to an exhaustive search in order to e.g. find all destinations.

*Please note:* Given that this only implements a fallback for the "final" node of the Tree, there's obviously a risk that the patch isn't sufficient for dealing with all kinds of out-of-order corruption. However, this kind of problem should be rare in practice, and without a real-world test-case it's difficult to implement a completely general solution (and there's obviously a question if you'd even want to).
2018-11-20 17:50:47 +01:00
Jani Pehkonen
9e990f6f3e Repair CFF fonts if stem hints are in wrong order 2018-11-20 18:50:37 +02:00
Jonas Jenwald
ac6b94c9dd Replace the remaining occurences, in src/display/api.js, of var with let/const 2018-11-18 19:08:27 +01:00
Jonas Jenwald
061f7bd2f3 Convert PDFWorker, in src/display/api.js, to an ES6 class
Also changes all occurrences of `var` to `let`/`const` in this code.
2018-11-18 19:08:27 +01:00
Jonas Jenwald
02e77a39ec Convert InternalRenderTask, in src/display/api.js, to an ES6 class
This changes all occurrences of `var` to `let`/`const` in this code, and updates the signature of the constructor to use object destructuring for better readability (and self documentation).
Also, `useRequestAnimationFrame` is changed to a parameter and the `typeof window` check is now done *once* rather than at every `_scheduleNext` call.
2018-11-18 19:08:27 +01:00
Jonas Jenwald
5a0d64a6de Convert PDFPageProxy, in src/display/api.js, to an ES6 class
This changes all occurrences of `var` to `let`/`const` in this code, and updates the signatures of a couple of methods to use object destructuring.
Finally, when creating `InternalRenderTask` instances *only* the necessary parameter are now provided, since passing through the `RenderParameters` as-is seems completely unnecessary.
2018-11-18 19:08:25 +01:00
Jonas Jenwald
2c003a82d5 Convert RenderTask, in src/display/api.js, to an ES6 class
Also deprecates the `then` method, in favour of the `promise` getter.
2018-11-18 19:08:00 +01:00
Jonas Jenwald
ef8e5fd77c Convert PDFDocumentLoadingTask, in src/display/api.js, to an ES6 class
Also deprecates the `then` method, in favour of the `promise` getter.
2018-11-18 19:07:57 +01:00
PalmerAL
5f15dc2023 Use span instead of div in the text layer
This improves copy/pasting text content since it reduces the amount of unnecessary newlines.
2018-11-18 15:54:08 +01:00
Tim van der Meij
2194aef03e
Merge pull request #10238 from Snuffleupagus/interfaces
Move the `interface` definitions out of `src/core/worker.js` and into their own file
2018-11-08 23:28:46 +01:00
Jonas Jenwald
4829f567c1 Move the interface definitions out of src/core/worker.js and into their own file
These interfaces are already used in different files, in both the `src/core/` and `src/display/` folders, and having them reside in their own file seems a lot clearer and is also similar to the existing viewer interfaces.

As part of moving the `interface` definitions, they're also converted to ES6 classes.
2018-11-08 13:21:37 +01:00
Jonas Jenwald
60da2d882b [api-minor] Refactor/simplify the PDFObject class
First of all, note how there's currently *two* methods for checking if a certain object exists, which seems completely unwarranted.
Furthermore, the rarely used `getData` method was removed and its only callsite changed to use a combination of `PDFObjects.{has, get}` instead.
Finally, the methods were rearranged slightly, to bring the most important ones (for an API user) to the top of the class.
2018-11-08 10:13:39 +01:00
Jonas Jenwald
d32321d84f Convert PDFObjects, in src/display/api.js, to an ES6 class
Also changes all occurrences of `var` to `const`, and marks internal properties/methods as "private".
2018-11-08 10:11:40 +01:00
Tim van der Meij
3e342554d1
Merge pull request #10228 from morille/patch-2
Don't detect nw.js as node.js
2018-11-07 23:51:01 +01:00
Romain Petit
13b0ca6b2a Don't detect nw.js as node.js
nw.js is chrome plus nodejs.
It will succeed everywhere chrome succeeds, but fail in many cases where nodejs succeeds (see issue 9071).
So it's safer to consider it as a browser context rather than a nodejs context.

Make travis happy again

CS

Readability + Explanation

The relevant portion of the NW.js documentation:
http://docs.nwjs.io/en/latest/For%20Users/Advanced/JavaScript%20Contexts%20in%20NW.js/#access-nodejs-and-nwjs-api-in-browser-context

Added full link to relevant doc.
2018-11-07 11:14:22 +01:00
Jonas Jenwald
a963d139dc Convert src/core/ps_parser.js to use ES6 classes
Besides being a fairly small and self-contained file, this code also shows a possible way of defining static constants on classes.
2018-11-03 17:43:06 +01:00
Tim van der Meij
ec76aa531e
Merge pull request #10202 from Snuffleupagus/issue-10200
Attempt to clean-up/restore pending rendering operations on `RenderTask.cancel` (issue 10200)
2018-11-02 23:11:47 +01:00
Jonas Jenwald
f23dba1c10 Change canvasInRendering to a WeakSet instead of a WeakMap
Note how nowhere in the code `canvasInRendering.get()` is ever called, and that this structure is really only used to store references to `<canvas>` DOM elements.
The reason for this being a `WeakMap` is probably because at the time we weren't using `core-js` polyfills yet, and since there already existed a manually implemented `WeakMap` polyfill it was probably simpler to use that.
2018-10-31 18:15:23 +01:00
Jonas Jenwald
f77b463339 Attempt to clean-up/restore pending rendering operations on RenderTask.cancel (issue 10200)
Please note that, given the lack of a runnable example, I'm not totally sure if this first of all is enough to *completely* address the issue as filed and second of all if we actually want this new behaviour.
2018-10-31 16:22:17 +01:00
Tim van der Meij
ed4ac1bc67
Merge pull request #10162 from janpe2/svg-normalize-bbox
Normalize BBox of form XObjects in SVG back-end
2018-10-28 13:18:48 +01:00
Jani Pehkonen
9cd5f94f03 Normalize the BBox of form XObjects on the /core side 2018-10-22 14:17:05 +03:00
Jonas Jenwald
5bb7f4b615 Convert PDFDataRangeTransport to an ES6 class 2018-10-20 17:15:27 +02:00
Tim van der Meij
d21892933d
Merge pull request #10161 from Snuffleupagus/DataLoaded-onProgress
Ensure that `onProgress` is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160)
2018-10-20 15:22:05 +02:00
Jonas Jenwald
54f9883c51 Export CMapCompressionType and PermissionFlag on the pdfjsLib object (issue 10148, PR 10033 follow-up)
`CMapCompressionType` makes a lot of sense to export, for anyone attempting to implement a custom `CMapReaderFactory`; fixes 10148.

`PermissionFlag` likewise needs to be exported, since otherwise the result of the `getPermissions` API method becomes difficult to interpret; follow-up to 10033.
2018-10-20 11:38:00 +02:00
Jonas Jenwald
327f2eb588 Ensure that onProgress is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160)
*Please note:* I'm totally fine with this patch being rejected, and the issue closed as WONTFIX; however these changes should address the issue if that's desired.

From a conceptual point of view, reporting loading progress doesn't really make a lot of sense for PDF files opened by passing raw binary data directly to `getDocument` (since obviously *all* data was loaded).
This is compared to PDF files loaded via e.g. `XMLHttpRequest` or the Fetch API, where the entire PDF file isn't available from the start and knowing the loading progress makes total sense.

However I can certainly see why the current API could be considered inconsistent, which isn't great, since a registered `onProgress` callback will never be called for certain `getDocument` calls.
The simplest solution to this inconsistency thus seem to be to ensure that `onProgress` is always called when handling the `DataLoaded` message, since that will *always* be dispatched[1] from the worker-thread.

---
[1] Note that this isn't guaranteed to happen, since setting `disableAutoFetch = true` often prevents the *entire* file from ever loading. However, this isn't relevant for the issue at hand, and is a well-known consequence of using `disableAutoFetch = true`; note how the default viewer even has a specialized code-path for hiding the loadingBar.
2018-10-16 13:51:12 +02:00
Jonas Jenwald
4cde844ffe Add a DOMTokenList.toggle polyfill for the second, optional, "force" parameter
This is based on the polyfill available at https://developer.mozilla.org/en-US/docs/Web/API/Element/classList#Polyfill
2018-10-12 15:41:09 +02:00
Tim van der Meij
9e9426c354
Merge pull request #10143 from Snuffleupagus/getMainThreadWorkerMessageHandler-catch-errors
Ensure that `getMainThreadWorkerMessageHandler` won't accidentally break `getDocument` (PR 10139 follow-up)
2018-10-11 00:05:01 +02:00
Jonas Jenwald
0e2c6047e4 Ensure that getMainThreadWorkerMessageHandler won't accidentally break getDocument (PR 10139 follow-up)
*This should have been part of PR 10139.*

In the event that a user has attempted to manually load the worker file on the main-thread, but somehow failed to do that correctly, there's a possibility that `getMainThreadWorkerMessageHandler` could throw. Considering how/where that helper function is being called, an error could still prevent `PDFDocumentLoadingTask` from completing (regardless if it's being resolved/rejected).
2018-10-09 15:44:31 +02:00
Jonas Jenwald
21c8dd4842 Combine the pdfjsFilePath and fallback workerSrc handling in src/display/api.js
With the way that the `getWorkerSrc()` helper function is implemented now, there's no longer a particularly strong reason for keeping the global `pdfjsFilePath` variable around.
With this patch the fallback `workerSrc` will thus, assuming is wasn't already set, be set to the "pdfjsFilePath" which simplifies the `getWorkerSrc()` function and reduces the amount of global state.

Finally, the global `workerSrc` variable was renamed to prevent shadowing.
2018-10-09 13:47:48 +02:00
Tim van der Meij
f45e46d7ad
Merge pull request #10133 from kevinleedrum/fix-content-length
Set returnValues.suggestedLength to Content-Length if integer
2018-10-09 00:05:57 +02:00
Kevin Lee Drum
4cf10ac79d set returnValues.suggestedLength to Content-Length if integer 2018-10-07 13:26:29 -04:00
Jonas Jenwald
755c6edc5e Ensure that the PDFDocumentLoadingTask is rejected when "setting up fake worker" failed (issue 10135)
This should, hopefully, cover all the possible ways[1] in which "fake workers" are loaded. Given the different code-paths, adding unit-tests might not be that simple.
Note that in order to make this work, the various `fakeWorkerFilesLoader` functions were converted to return `Promises`.

---
[1] Unfortunately there's lots of them, for various build targets and configurations.
2018-10-06 13:18:51 +02:00
Simon Leblanc
b5806735d8 Add support of Ink annotation 2018-10-03 00:28:49 +02:00
Tim van der Meij
138324502c
Merge pull request #10119 from Snuffleupagus/rm-onFileAttachmentAnnotation
Attempt to simplify the `fileattachmentannotation` event dispatching
2018-10-02 23:25:22 +02:00
Jonas Jenwald
d60ce998f1 Attempt to simplify the fileattachmentannotation event dispatching
This attempts to reduced the level of indirection, and the amount of code, when dispatching `fileattachmentannotation` events, by removing the `PDFLinkService.onFileAttachmentAnnotation` method and just accessing `PDFLinkService.eventBus` directly in the `FileAttachmentAnnotationElement` constructor.
Given that other properties, such as `externalLinkTarget`/`externalLinkRel`, are already being accessed directly this pattern seems fine here as well.
2018-10-01 15:09:08 +02:00
Jonas Jenwald
d6f4d2ff33 Add a Symbol polyfill, using core-js, to allow using for...of loops
https://github.com/zloirock/core-js#ecmascript-symbol
2018-09-29 16:05:00 +02:00
Jonas Jenwald
435ec6a0d5 Use the Font Loading API in MOZCENTRAL builds, and GENERIC builds for Firefox version 63 and above (issue 9945) 2018-09-29 16:05:00 +02:00
Jonas Jenwald
05b021bcce Refactor the FontLoader into proper, build-specific, ES6 classes
Also changes `var` to `let`/`const` in code already touched in the patch, and makes use of template strings in a few spots.
2018-09-29 16:05:00 +02:00
Jonas Jenwald
45d6651976 Refactor unused Date.now() calls in FontLoader.queueLoadingCallback
The `started` timestamp is completely usused, and the `end` timestamp is currently[1] being used essentially like a boolean value.
Hence this code can be simplified to use an actual boolean value instead, which avoids potentially hundreds (or even thousands) of unnecessary `Date.now()` calls.

---
[1] Looking briefly at the history of this code, I cannot tell if the timestamps themselves were ever used for anything (except for tracking "boolean" state).
2018-09-29 15:57:04 +02:00
Jonas Jenwald
ad3e937816 Replace the Font.loading property with, the already existing, Font.missingFile property
The `Font.loading` property is only ever used *once* in the code, whereas `Font.missingFile` is more widely used. Furthermore the name `loading` feels, at least to me, slight less clear than `missingFile`. Finally, note that these two properties are the inverse of each other.
2018-09-29 15:57:04 +02:00
Jonas Jenwald
caf90ff6ee Convert FontFaceObject to an ES6 class
Also changes `var` to `let`/`const` in code already touched in the patch, and makes use of template strings in a few spots.
2018-09-29 15:57:04 +02:00
Jonas Jenwald
842e9206c0 Replace String.prototype.substr() occurrences with String.prototype.substring()
As outlined in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr, which refers to the ECMA-262 specification, using the `substr` function is advised against.

Hence this PR, which replaces all remaining `substr` occurrences with `substring` instead. Please refer to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr#Syntax respectively https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substring#Syntax for the differences between the two functions.

Note that in most cases in the code-base there's only one argument passed to `substr`, and those require no other changes except replacing "substr" with "substring". For the other cases, the `substr(start, length)` calls are changed to `substring(start, start + length)` instead.
2018-09-28 11:41:07 +02:00
Brendan Dahl
ae7dcae27e Fix abbreviation. 2018-09-13 13:10:38 -07:00
Brendan Dahl
6adeabbb66 Add Glyph & Cog's XPDF copyright/license information. 2018-09-12 13:59:56 -07:00
Jonas Jenwald
5181172498 Slightly improve the isSourcePDF parameter handling in JpegImage (PR 10031 follow-up)
Currently there's only a single spot in the code-base where `JpegImage.getData` is called, however it nonetheless seem like a good idea to ensure during tests that the `isSourcePDF` parameter is correctly set. (Especially considering that the PDF use-cases will break without it.)

Additionally, in `JpegImage._getLinearizedBlockData`, the code can be made a tiny bit more efficient by checking the value of `isSourcePDF` *first* to avoid useless checks (for the default PDF use-cases).
2018-09-12 11:30:59 +02:00
Tim van der Meij
bf13c8a50b
Use the const keyword for constants in src/shared/util.js
Moreover, move general constants to the top of the file, i.e., those
that are not closely tied to a function in the file.
2018-09-11 16:17:45 +02:00
Tim van der Meij
99de25d6cc
Implement unit tests for the isSameOrigin and createValidAbsoluteUrl utility functions
Moreover, mark the `isValidProtocol` function as private since it's only
used in the utilities file and is not (meant to be) exported.
2018-09-11 16:17:45 +02:00
Tim van der Meij
9a115b41de
Merge pull request #10034 from timvandermeij/canvas-workaround
Remove `getSinglePixelWidth` workaround
2018-09-09 17:36:04 +02:00
Tim van der Meij
66422eb83e
Merge pull request #9340 from brendandahl/private-use
Map all glyphs to the private use area and duplicate the first glyph.
2018-09-08 17:51:04 +02:00
Romain Petit
8671081001
Fix font-string variable name typo
The font-string rebuild condition is always satisfied because the concerned variables are never set.
2018-09-07 09:55:45 +02:00
Brendan Dahl
b76cf665ec Map all glyphs to the private use area and duplicate the first glyph.
There have been lots of problems with trying to map glyphs to their unicode
values. It's more reliable to just use the private use areas so the browser's
font renderer doesn't mess with the glyphs.

Using the private use area for all glyphs did highlight other issues that this
patch also had to fix:

  * small private use area - Previously, only the BMP private use area was used
    which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be
    used.

  * glyph zero not shown - Browsers will not use the glyph from a font if it is
    glyph id = 0. This issue was less prevalent when we mapped to unicode values
    since the fallback font would be used. However, when using the private use
    area, the glyph would not be drawn at all. This is illustrated in one of the
    current test cases (issue #8234) where there's an "ä" glyph at position
    zero. The PDF looked like it rendered correctly, but it was actually not
    using the glyph from the font. To properly show the first glyph it is always
    duplicated and appended to the glyphs and the maps are adjusted.

  * supplementary characters - The private use area PUP 16 is 4 bytes, so
    String.fromCodePoint must be used where we previously used
    String.fromCharCode. This is actually an issue that should have been fixed
    regardless of this patch.

  * charset - Freetype fails to load fonts when the charset size doesn't match
    number of glyphs in the font. We now write out a fake charset with the
    correct length. This also brought up the issue that glyphs with seac/endchar
    should only ever write a standard charset, but we now write a custom one.
    To get around this the seac analysis is permanently enabled so those glyphs
    are instead always drawn as two glyphs.
2018-09-05 14:04:54 -07:00
Jonas Jenwald
e5a6d892b4
Revert "Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)" 2018-09-05 18:01:33 +02:00
Tim van der Meij
959ed3705b
Implement a permissions API 2018-09-02 21:23:09 +02:00
Tim van der Meij
4874e9ace0
Convert the WorkerTransport class, in src/display/api.js, to ES6 syntax 2018-09-02 21:06:57 +02:00
Tim van der Meij
9c37599fd3
Convert the PDFDocumentProxy class, in src/display/api.js, to ES6 syntax
Moreover, indicate that a member are private and improve the comments to
be more consistent.
2018-09-02 21:06:57 +02:00
Tim van der Meij
1a3e842dc4
Remove getSinglePixelWidth workaround
It's no longer necessary since https://bugzilla.mozilla.org/show_bug.cgi?id=1305963 is fixed quite some time ago.

While we're here, mark the `cachedGetSinglePixelWidth` member as being
private and use ES6 syntax in the `getSinglePixelWidth` method.
2018-09-02 20:36:06 +02:00
Jonas Jenwald
663922f93f Add a new parameter to JpegImage.getData to indicate the source of the image data (issue 9513)
The purpose of this patch is to provide a better default behaviour when `JpegImage` is used to parse standalone JPEG images with CMYK colour spaces.
Since the issue that the patch concerns is somewhat of a special-case, the implementation utilizes the already existing decode support in an attempt to minimize the impact w.r.t. code size.

*Please note:* It's always possible for the user of `JpegImage` to control image inversion, and thus override the new behaviour, by simply passing a custom `decodeTransform` array upon initialization.
2018-09-02 14:15:22 +02:00
Jonas Jenwald
47bf12cbac Change JpegImage._isColorConversionNeeded into a getter, rather than a regular function
Given how `_isColorConversionNeeded` is used, and that it always returns a boolean value, having it be a getter seems more appropriate.
2018-09-02 13:06:28 +02:00
Tim van der Meij
c94df0fef3
Merge pull request #9986 from Snuffleupagus/issue-9984
Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984)
2018-09-01 21:21:29 +02:00
Tim van der Meij
66bd088948
Merge pull request #10010 from Snuffleupagus/issue-10004
Attempt to find truncated endstream commands, in the fallback code-path, in `Parser.makeStream` (issue 10004)
2018-09-01 18:44:08 +02:00
Tim van der Meij
283f2dfcc3
Merge pull request #10022 from janpe2/svg-Tr
Implement text rendering modes in SVG backend
2018-08-29 23:51:07 +02:00
Tim van der Meij
27ebb41b8f
Merge pull request #10020 from Snuffleupagus/addon-prefs-no-eslint
Ensure that the built `PdfJsDefaultPreferences.jsm` file won't be affected/touched during tree-wide ESLint rule changes in `mozilla-central` (PR 9571 follow-up)
2018-08-29 22:40:56 +02:00
Jonas Jenwald
d8aaa2f978 Update to the current year, i.e. 2018, in the bundle license headers 2018-08-28 23:46:56 +02:00
Jani Pehkonen
c426ea376c Implement text rendering modes in SVG backend 2018-08-29 00:42:07 +03:00
cheryly279
29c0ea159d Adding chunkname to async loaded code
Better name
2018-08-27 17:17:32 -04:00
Jonas Jenwald
95e5bad4c4 Attempt to find truncated endstream commands, in the fallback code-path, in Parser.makeStream (issue 10004)
Apparently there's some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated.

*Please note:* The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contains a *mixture* of correct and truncated endstream commands.
However, considering that this particular mode of corruption *fortunately* doesn't seem very common[2], a slightly less complex solution ought to suffice for now.

Fixes 10004.

---
[1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect.

[2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.
2018-08-26 11:51:11 +02:00
Jonas Jenwald
c81cbe113c Extract the "scanning for endstream command" part of Parser.makeStream into a helper method
With this code now living in a separate method, it can be simplified slightly (e.g. by using early returns).
2018-08-26 11:51:09 +02:00
Tim van der Meij
436d2efa8a
Merge pull request #10007 from Snuffleupagus/ColorSpace-class
Convert the code in `src/core/colorspace.js to use ES6 classes
2018-08-25 18:45:40 +02:00
Tim van der Meij
4a0d15aa0e
Slightly simplify the catalog code 2018-08-25 16:40:59 +02:00
Tim van der Meij
aec236f6d8
Convert the Catalog class, in src/core/obj.js, to ES6 syntax 2018-08-25 16:38:22 +02:00
Jonas Jenwald
a182907592 Replace all occurences of var with let/const in src/core/colorspace.js 2018-08-25 03:20:21 +02:00
Jonas Jenwald
ce9a38c536 Convert the code in `src/core/colorspace.js to use ES6 classes
Reduces the amount of boilerplate code when defining the the sub-classes.

Please note that a couple of the closures were kept, since it's not (yet) possible to include helper functions inside of `class`es.
2018-08-25 03:20:19 +02:00
Jonas Jenwald
45b7b861b8 Remove the unused defaultColor property on ColorSpace instances
This property is not only completely unused now, it never actually appears to have been used. Even though the memory savings, from not initializing these extra typed arrays, won't be significant in the grand scheme of things it still seems completely unnecessary to keep allocating this data.

As far as I can tell, the main reason for the existence of `defaultColor` seem to be for documentation purposes. Hence the code is changed into comments instead, to keep the information around (but without the unnecessary allocations).
2018-08-23 11:16:52 +02:00
Jonas Jenwald
099ed08852 Add support for async/await using Babel
For proof-of-concept, this patch converts a couple of `Promise` returning methods to use `async` instead.
Please note that the `generic` build, based on this patch, has been successfully testing in IE11 (i.e. the viewer loads and nothing is obviously broken).

Being able to use modern JavaScript features like `async`/`await` is a huge plus, but there's one (obvious) side-effect: The size of the built files will increase slightly (unless `SKIP_BABEL == true`). That's unavoidable, but seems like a small price to pay in the grand scheme of things.

Finally, note that the `chromium` build target was changed to no longer skip Babel translation, since the Chrome extension still supports version `49` of the browser (where native `async` support isn't available).
2018-08-19 16:54:11 +02:00
Tim van der Meij
4ea663aa8a
Merge pull request #9987 from Snuffleupagus/rm-createBlob
[api-minor] Remove the obsolete `createBlob` helper function
2018-08-19 16:43:36 +02:00
Jonas Jenwald
75923ea515 Remove the unused PDFDocument.mainXRefEntriesOffset method
Not only is this method completely unused *now*, looking through the history of the code it never appears to have been used for anything either.
Years ago `mainXRefEntriesOffset` was included when creating `XRef` instances, however it wasn't actually used for anything (the parameter was never checked, nor assigned to a property on `XRef`).

If this method ever becomes useful (again) it's easy enough to restore it thanks to version control, but including dead code in the builds just seems wasteful.
2018-08-19 14:08:39 +02:00
Jonas Jenwald
50a47be190 [api-minor] Remove the obsolete createBlob helper function
At this point in time, all supported browsers have native support for `Blob`; please see https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob#Browser_compatibility.
Furthermore, note how the helper function was throwing an error if `Blob` isn't available anyway.
2018-08-19 13:37:19 +02:00
Jonas Jenwald
497b765ede Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)
Please note that while this *improves* issue 9984 slightly (and likely others too), it's not a complete solution.
The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.
2018-08-18 13:45:32 +02:00
Jonas Jenwald
bc89edb8f0 Ensure that Uint8ClampedArray is used for image data transfered by getTransfers (PR 9802 follow-up)
One of the `QueueOptimizer` cases wasn't updated to use `Uint8ClampedArray`s, which leads to inconsistent image data on the API side (but no actual rendering bugs, as far as I can tell).
To prevent future errors, a non-production/test-only `assert` was added to ensure that the relevant image data only uses `Uint8ClampedArray`s.
2018-08-16 10:29:44 +02:00
Tim van der Meij
1268aea2b6
Merge pull request #9975 from Snuffleupagus/getDestination-refactor
Re-factor `destinations`/`getDestination` to reduce unnecessary duplication, and reject non-string inputs
2018-08-12 15:51:58 +02:00
Tim van der Meij
af19ed6ee9
Merge pull request #9822 from timvandermeij/annotations
[api-minor] Refactor the annotation code to be asynchronous
2018-08-11 20:39:50 +02:00
dmitryskey
3741becb9b
[api-minor] Refactor the annotation code to be asynchronous
This commit is the first step towards implementing parsing for the
appearance streams of annotations.

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>
2018-08-11 19:00:29 +02:00
Jonas Jenwald
1179584fd6 Reject getDestination, in the API, for non-string inputs
Note how e.g. the `getPage` method does basic validation of the input.
2018-08-11 16:06:35 +02:00
Jonas Jenwald
b74c813353 Re-factor destinations/getDestination, in the Catalog, to reduce unnecessary duplication
Currently, these two methods contain the same boilerplate code for getting the /Dests data.
2018-08-11 16:04:58 +02:00
Jonas Jenwald
06d1ff5af4 Tweak the MMType1 font detection in getFontFileType to improve font telemetry (PR 9961 follow-up)
Please note that this patch does *not* affect rendering in any way, however it's relevant for font telemetry[1].

According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904956, Type1C is a valid subtype for *both* Type1 and MMType1 fonts.

---
[1] Refer to the font telemetry results in https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-06-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F62&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F59&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-05-07&table=0&trim=1&use_submission_date=0

See also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types for help with interpreting the data.
2018-08-08 12:18:37 +02:00
Jonas Jenwald
f78efd883e Attempt to throw MissingPDFException when applicable in node_stream.js (issue 9791) 2018-08-06 10:00:03 +02:00
Tim van der Meij
4111871ac5
Merge pull request #9958 from brendandahl/always-fallback
Always fallback to system font on font failure.
2018-08-05 19:58:48 +02:00
Jonas Jenwald
3177f6aa55 Parse the font file to determine the correct type/subtype, rather than relying on the (often incorrect) data in the font dictionary
The current font type/subtype detection code is quite inconsistent/unwieldy. In some cases it will simply assume that the font dictionary is correct, in others it will somewhat "arbitrarily" check the actual font file (more of these cases have been added over the years to fix specific bugs).

As is evident from e.g. issue 9949, the font type/subtype detection code is continuing to cause issues. In an attempt to get rid of these hacks once and for all, this patch instead re-factors the type/subtype detection to *always* parse the font file.

Please note that, as far as I can tell, we still appear to need to rely on the composite font detection based on the font dictionary. However, even if the composite/non-composite detection would get it wrong, that shouldn't really matter too much given that there's basically only two different code-paths (for "TrueType-like" vs "Type1-like" fonts).
2018-08-05 11:13:16 +02:00
Jonas Jenwald
9bbca04579 Add a (basic) isCFFFile helper function to detect CFF font files
Compared to most other font formats, the CFF doesn't have a constant header which makes is slightly more difficult to detect such font files.

Please refer to the Compact Font Format specification: https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5176.CFF.pdf#G3.32094
2018-08-05 11:13:14 +02:00
Jonas Jenwald
f4db38aadf Update the TrueType font file detection to also recognize the Mac specific header 'true'
Please refer to the TrueType specification: https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#ScalerTypeNote
2018-08-05 10:33:56 +02:00
Brendan Dahl
5f67a6a237 Always fallback to system font on font failure.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
2018-08-03 16:49:22 -07:00
Tim van der Meij
f19ee127a3
Merge pull request #9874 from boundlesshq/master
[api-minor] Include export value for checkboxes
2018-08-03 23:43:23 +02:00
Jonas Jenwald
a504befc76 Stop warning for non-Name /Filter entries in the PDFImage constructor (PR 9897 follow-up)
Fixes a stupid oversight on my part, since /Filter may (obviously) contain an Array, which resulted in unnecessary console warning spam in perfectly valid PDF files.
Note that it still makes sense to check that /Filter is actually a Name, before attempting to access its `name` property, but the warning should definitely be removed.
2018-08-03 10:23:08 +02:00
Brian
2a665ebad4 Removed Extraneous Matrix Check in CalRGB Conversion 2018-08-02 10:16:42 -07:00
Tim van der Meij
716acf63d4
Merge pull request #9938 from Snuffleupagus/issue-9915
Ensure that Type0, i.e. composite, OpenType fonts with `CFF ` tables are *not* treated as CFF fonts if their glyph mapping is non-default (issue 9915)
2018-08-02 00:11:18 +02:00
Jonas Jenwald
3ce420131f Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650)
According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45
> When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):
>
>  - Width and Height shall match the corresponding width and height values in the JPEG2000 data.
>
>  - . . .

Hence it seems reasonable to use the Width/Height of the image data *itself*, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.
2018-08-01 16:42:26 +02:00
Jonas Jenwald
17f65908ae Add more validation of the /Filter entry, in image dictionaries, to the PDFImage constructor
Given that the code is currently assuming that the /Filter entry is a `Name`, it cannot hurt to actually ensure that's the case.

Also fixes an error message, for JPEG 2000 images with unsupported ColorSpaces, since `this.numComps` hasn't been initialized when it's accessed during the `throw new Error()` invocation.
2018-08-01 16:41:15 +02:00
Jonas Jenwald
17eac2d48a Ensure that Type0, i.e. composite, OpenType fonts with CFF tables are *not* treated as CFF fonts if their glyph mapping is non-default (issue 9915)
This particular code-path has been the source of *numerous* regressions to date, so hopefully this patch won't cause any more of those.

Fixes 9915.
2018-07-29 23:06:15 +02:00
Jonas Jenwald
cfdb597e4a Ensure that the CIDSystemInfo strings, in Type0 fonts, are correctly decoded
This isn't directly related to the subsequent patch, but just something that I happened to notice while poking around in the font code.
2018-07-29 23:06:15 +02:00
Tim van der Meij
3521424576
Merge pull request #9920 from Snuffleupagus/getMetadata-linearization
[api-minor] Add an `IsLinearized` property to the `PDFDocument.documentInfo` getter, to allow accessing the linearization status through the API (via `PDFDocumentProxy.getMetadata`)
2018-07-29 20:23:22 +02:00
Tim van der Meij
f45450bd78
Merge pull request #9931 from Snuffleupagus/refactor-getPage
Refactor `getPage` (in the worker), and attempt to use the `Linearization` dictionary to lookup the first Page
2018-07-29 19:33:46 +02:00
Tim van der Meij
a2c317f12b
Merge pull request #9925 from Snuffleupagus/StreamsSequenceStream-maybeLength
Attempt to estimate the minimum required `buffer` length when initializing `StreamsSequenceStream` instances
2018-07-29 16:52:34 +02:00
Jonas Jenwald
ec3728b540 Use the Linearization dictionary, if it exists, when fetching the first Page
Since PDF.js already supports range requests and streaming, not to mention chunked rendering, attempting to use the `Linearization` dictionary in `PDFDocument.getPage` probably isn't going to improve performance in any noticeable way.
Nonetheless, when `Linearization` data is available, it will allow looking up the first Page *directly* without having to descend into the `Pages` tree to find the correct object.
2018-07-28 22:23:36 +02:00
Jonas Jenwald
fbb25ff4e2 Move getPage, on the worker side, from Catalog and into PDFDocument instead
Addresses an existing TODO, and avoids having to pass in a `pageFactory` when creating `Catalog` instances.
2018-07-28 22:23:36 +02:00
Jonas Jenwald
81b471c781 [Regression] Convert Catalog.builtInCMapCache into a Map, instead of an Object, to ensure that it's correctly reset (PR 8064 follow-up)
With the `builtInCMapCache` being a simple Object, it unfortunately means that the `Catalog.cleanup` method isn't resetting it as intended.
By just replacing the `builtInCMapCache` with an empty Object, existing references to it will not actually be updated. The result is that e.g. `Page` instances still keeps references to, what should have been removed, CMap data.

To fix these problems, the `builtInCMapCache` is converted into a `Map` instead (since it can be easily reset).
2018-07-28 22:20:43 +02:00
bion
c31ddf7edc [api-minor] Include export value for checkboxes 2018-07-28 00:30:41 -07:00
Jonas Jenwald
928b89382e [api-minor] Add an IsLinearized property to the PDFDocument.documentInfo getter, to allow accessing the linearization status through the API (via PDFDocumentProxy.getMetadata)
There was a (somewhat) recent question on IRC about accessing the linearization status of a PDF document, and this patch contains a simple way to expose that through already existing API methods.
Please note that during setup/parsing in `PDFDocument` the linearization data is already being fetched and parsed, provided of course that it exists. Hence this patch will *not* cause any additional data to be loaded.
2018-07-26 15:54:19 +02:00
Jonas Jenwald
8a4466139b Simplify the DocumentInfoValidators definition
With this file now being a proper (ES6) module, it's no longer (technically) necessary for this structure to be lazily initialized. Considering its size, and simplicity, I therefore cannot see the harm in letting `DocumentInfoValidators` just be simple Object instead.

While I'm not aware of any bugs caused by the current code, it cannot hurt to add an `isDict` check in `PDFDocument.documentInfo` (since the current code assumes that `infoDict` being defined implies it also being a Dictionary).

Finally, the patch also converts a couple of `var` to `let`/`const`.
2018-07-26 15:54:01 +02:00
Jonas Jenwald
2d51bce941 Remove unnecessary stream.length check from PDFDocument.linearization
Note first of all that `PDFDocument` will be initialized with either a `Stream` or a `ChunkedStream`, and that both of these have `length` getters. Secondly, the `PDFDocument` constructor will assert that the `stream` has a non-zero (and positive) length. Hence there's no point in checking `stream.length` in the `linearization` getter.
2018-07-26 15:54:01 +02:00
Jonas Jenwald
32bfa55d98 Attempt to estimate the minimum required buffer length when initializing StreamsSequenceStream instances
For most other `DecodeStream` based streams, we'll attempt to estimate the minimum `buffer` length based on the raw stream data. The purpose of this is to avoid having to unnecessarily re-size the `buffer`, thus reducing the number of *intermediate* allocations necessary when decoding the stream data.
However, currently no such optimization is attempted for `StreamsSequenceStream`, and given that they can often be quite large that seems unfortunate. To improve this, at least somewhat, this patch utilizes the raw sizes of the `StreamsSequenceStream` sub-streams to estimate the minimum required `buffer` length.

Most likely this patch won't have a huge effect on memory consumption, however for pathological cases it should help reduce peak memory usage slightly.
One example is the PDF file in issue 2813, where currently the `StreamsSequenceStream` instances would grow their `buffer`s as `2 MiB -> 4 MiB -> 8 MiB -> 16 MiB -> 32 MiB`. With this patch, the same stream `buffers`s grow as `8 MiB -> 16 MiB -> 32 MiB`, thus avoiding a total of `12 MiB` of *intermediate* allocations (since there's two `StreamsSequenceStream` used, for rendering/text-extraction).
2018-07-26 13:42:59 +02:00
Jonas Jenwald
36b683ca55 Provide custom messages for the no-restricted-globals ESLint rule, and refactor the .eslintrc files (PR 9868 follow-up)
Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended.

To reduced duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.
2018-07-23 14:10:13 +02:00
Jonas Jenwald
8ec99b200c Prevent Metadata/XML parsing from breaking PDFDocumentProxy.getMetadata when no XML root document is found (issue 8884)
With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs.

(Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)
2018-07-18 11:37:40 +02:00
Tim van der Meij
61db85ab64
Merge pull request #9886 from Snuffleupagus/bug-1473809
Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
2018-07-15 17:23:52 +02:00
Jonas Jenwald
8e76d26e5b Move the toRoman helper function out of the Util scope
Compared to all the other (static) methods in `Util`, the `toRoman` one looks slightly out of place. Even more so considering that `Util` is being exposed through `pdfjsLib`, where access to a Roman numerals conversion method doesn't make much sense.
2018-07-10 10:45:25 +02:00
Jonas Jenwald
c1c49badff Remove the, now unused, Util.inherit helper function 2018-07-10 10:29:47 +02:00
Jonas Jenwald
2b25deb84c Prevent errors in sanitizeTTProgram, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
*I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug.*

The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a *negative* length when parsing the CALL functions.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.
2018-07-10 09:45:55 +02:00
Jonas Jenwald
bf6d45f85a Convert CMap and IdentityCMap to ES6 classes
Also changes `var` to `let`/`const` in code already touched in the patch.
2018-07-09 21:12:01 +02:00
Jonas Jenwald
b773b356af Convert NameOrNumberTree, NameTree, and NumberTree to ES6 classes
Also changes `var` to `let`/`const` in code already touched in the patch.
2018-07-09 21:12:01 +02:00
Jonas Jenwald
ba1af46709 Convert CompiledFont, TrueTypeCompiled, and Type2Compiled to ES6 classes
Also changes `var` to `let`/`const` in code already touched in the patch.
2018-07-09 21:12:01 +02:00
Jonas Jenwald
775763a091 Ensure that CompiledFont.compileGlyph always returns an Array (PR 6141 follow-up)
PR 6141 changed `CompiledFont.compileGlyph` to, in the general case, return an Array. However, that PR apparenly forgot to update the no-glyph, empty-glyph, and endchar-glyph code-path and a String was still being (incorrectly) returned.

Given the way that `FontFaceObject.getPathGenerator` (on the API side) is implemented, this shouldn't have caused any bugs despite the Worker possible returning unexpected data.
2018-07-09 21:12:01 +02:00
Tim van der Meij
646d81cd09
Merge pull request #9837 from timvandermeij/unreachable
Replace `NotImplementedException` with `unreachable`
2018-07-09 21:10:36 +02:00
Tim van der Meij
907c7f190b
Convert src/code/pdf_manager.js to ES6 classes/syntax 2018-07-08 16:43:46 +02:00
Jonas Jenwald
a9ce4e8417 Stop exposing the URL polyfill in the global scope
This moves/exposes the `URL` polyfill similarily to the existing `ReadableStream` polyfill, rather than exposing it globally, to avoid interfering with any "outside" code.
Both the `URL` and `ReadableStream` polyfills are now exposed on the `pdfjsLib` object, such that they are accessible to the viewer components.
Furthermore, the `no-restricted-globals` ESLint rule is also enabled to prevent accidental usage of the native `URL`/`ReadableStream` implementations directly in the `src/` and `web/` folders; see also https://eslint.org/docs/rules/no-restricted-globals

Addresses the remaining TODO in https://github.com/mozilla/pdf.js/projects/6
2018-07-04 09:16:28 +02:00
Tim van der Meij
99f8f2c275
Merge pull request #9853 from Snuffleupagus/re-render-after-cancel
Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up)
2018-06-29 23:25:43 +02:00
Tim van der Meij
6fa2c779b5
Merge pull request #9838 from Snuffleupagus/invalid-path-OPS
Error, rather than warn, once a number of invalid path operators are encountered in `EvaluatorPreprocessor.read` (bug 1443140)
2018-06-28 23:15:25 +02:00
Jonas Jenwald
bf0aca86d7 Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up)
Currently if `RenderTask.cancel` is called *immediately* after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled.
However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it *after* rendering was cancelled (it was likely just a simple oversight in PR 8519).

Fixes 9456.
2018-06-28 22:56:37 +02:00
Tim van der Meij
14b69a4c1c
Merge pull request #9729 from Snuffleupagus/gulp-image_decoders
Add a `gulp image_decoders` command to package the image decoders (i.e. jpg.js, jpx.js, jbig2.js) separately, and publish them in pdfjs-dist
2018-06-26 23:27:32 +02:00
Jonas Jenwald
74e9999044 Add unit-tests for PDFPageProxy.stats (PR 9245 follow-up)
This wasn't included in PR 9245, since all the API options were still global at that time.

Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.
2018-06-25 14:20:49 +02:00
Jonas Jenwald
7f21e38787 Error, rather than warn, once a number of invalid path operators are encountered in EvaluatorPreprocessor.read (bug 1443140)
Incomplete path operators, in particular, can result in fairly chaotic rendering artifacts, as can be observed on page four of the referenced PDF file.

The initial (naive) solution that was attempted, was to simply throw a `FormatError` as soon as any invalid (i.e. too short) operator was found and rely on the existing `ignoreErrors` code-paths. However, doing so would have caused regressions in some files; see the existing `issue2391-1` test-case, which was promoted to an `eq` test to help prevent future bugs.
Hence this patch, which adds special handling for invalid path operators since those may cause quite bad rendering artifacts.

You could, in all fairness, argue that the patch is a handwavy solution and I wouldn't object. However, given that this only concerns *corrupt* PDF files, the way that PDF viewers (PDF.js included) try to gracefully deal with those could probably be described as a best-effort solution anyway.

This patch also adjusts the existing `warn`/`info` messages to print the command name according to the PDF specification, rather than an internal PDF.js enumeration value. The former should be much more useful for debugging purposes.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1443140.
2018-06-24 16:05:08 +02:00
Tim van der Meij
2907827d31
Replace NotImplementedException with unreachable 2018-06-23 21:20:53 +02:00
Jonas Jenwald
275834ae66 Clean-up, and add JSDocs to, the PDFDocumentProxy.loadingParams method (PR 9830 follow-up) 2018-06-23 13:33:22 +02:00
Tim van der Meij
34594a5b02
Merge pull request #9830 from EugeneSqr/9824
Removed safari compatibility check (issue #9824)
2018-06-23 02:21:06 +02:00
Tim van der Meij
98ea39f9d0
Merge pull request #9827 from Snuffleupagus/misc-corrupt-pdf-fixes
Fix various corrupt PDF files (issue 9252, issue 9418)
2018-06-21 22:35:00 +02:00
eugenesqr
331ac8ae74 removed safari compatibility check 2018-06-21 12:57:56 +03:00
Brendan Dahl
a278c5a8dc
Merge pull request #9795 from timvandermeij/object-assign
Replace `Util.extendObj` by `Object.assign`
2018-06-20 10:50:40 -07:00
Jonas Jenwald
56e3648b65 Add basic validation of the 'trailer' dictionary candidates in XRef.indexObjects (issue 9418)
This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
346810e02a Add basic validation of the 'Root' dictionary in XRef.parse and try to recover when possible
Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
e84813e7cc Prevent hard errors if fetching the Encrypt dictionary fails in XRef.parse 2018-06-20 13:41:22 +02:00
Jonas Jenwald
30ad62a86a Use the correct startPos when repeating the search for 'endobj' operators in XRef.indexObjects (PR 9288 follow-up) 2018-06-20 13:41:22 +02:00
Jonas Jenwald
6bbcafcd26 Let Lexer.getNumber treat a single decimal point as zero (issue 9252)
This is consistent with the behaviour in Adobe Reader.
2018-06-20 13:41:21 +02:00
Jonas Jenwald
df4799a12a Ensure that line-breaks are *only* skipped after operators in Lexer.getNumber (PR 8359 follow-up)
With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches.

Hence this is code is adjusted to actually do what the original commit message says, and nothing more.
2018-06-20 13:41:15 +02:00
Jonas Jenwald
303537bcb1 Add a gulp image_decoders command to allow packaging/distributing the image decoders (i.e. jpg.js, jpx.js, jbig2.js) separately from the main PDF.js library
Please note that the standalone `pdf.image_decoders.js` file will be including the complete `src/shared/util.js` file, despite only using parts of it.[1] This was done *purposely*, to not negatively impact the readability/maintainability of the core PDF.js code.

Furthermore, to ensure that the compatibility is the same in the regular PDF.js library *and* in the the standalone image decoders, `src/shared/compatibility.js` was included as well.

To (hopefully) prevent future complaints about the size of the built `pdf.image_decoders.js` file, a few existing async-related polyfills are being skipped (since all of the image decoders are completely synchronous).
Obviously this required adding a couple of pre-processor statements, but given that these are all limited to "compatibility" code, I think this might be OK!?

---
[1] However, please note that previous commits moved `PageViewport` and `MessageHandler` out of `src/shared/util.js` which reduced its size.
2018-06-16 17:56:54 +02:00
Jonas Jenwald
bfc88ead66 Expose a Jbig2Image.parse method, by re-instating the parseJbig2 function
The purpose of this patch is to hopefully provide *slightly* better user ergonomics, if/when the PDF.js image decoders are used standalone.

This implementation is (basically) reverting the changes in PR 9386, in conjunction with code from the `parse` method found at https://github.com/notmasteryet/jpgjs/blob/master/src/pdfjs.js
2018-06-16 17:56:54 +02:00
Jonas Jenwald
682672db8e Change the signature of the JpegImage constructor, to allow passing in various options directly 2018-06-16 17:56:54 +02:00
Tim van der Meij
620da6f4df
Merge pull request #9802 from Snuffleupagus/ColorSpace-PDFImage-Uint8ClampedArray
Update `ColorSpace` and `PDFImage` to use `Uint8ClampedArray`s and remove manual clamping/rounding
2018-06-16 17:55:10 +02:00
Jonas Jenwald
0958006713 Send UnsupportedFeature notification when errors are ignored in FontFaceObject.getPathGenerator 2018-06-13 11:02:10 +02:00
Jonas Jenwald
bf0db0fb72 Pass the ignoreErrors API option to the FontFaceObject constructor, and utilize it in getPathGenerator to ignore missing glyphs
Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering.

*Please note:* Including the 14 standard fonts in PDF.js probably wouldn't be *that* difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.)
However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)
2018-06-13 11:02:06 +02:00
Jonas Jenwald
fe288bb872 Refactor the FontFaceObject.getPathGenerator method
- Reduce the overall indentation level, by making use of early returns.

 - Replace `var` with `let`.
2018-06-13 11:02:02 +02:00
Jonas Jenwald
778981ec89 Catch, and propagate, errors in the requestAnimationFrame branch of InternalRenderTask._scheduleNext
To support these changes, `InternalRenderTask._next` now returns a Promise.
2018-06-13 11:01:58 +02:00
Jonas Jenwald
d4ff541b78 Enforce the use, in non-production/test-only mode, of Uint8ClampedArray in all relevant methods in ColorSpace and PDFImage
Since `ColorSpace` now depends on the native clamping of `Uint8ClampedArray`, this patch adds non-production/test-only `assert`s to enforce that the expected TypedArray is used for the output.

These `assert`s are purposely *not* included in PRODUCTION builds since that would break rendering completely, as opposed to "only" displaying some weird colours, when a `Uint8Array` was used. Furthermore, these are mostly added to help catch explicit developer errors when working with the `ColorSpace` and `PDFImage` code.
2018-06-12 11:01:32 +02:00
Jonas Jenwald
4b69bb7fe9 Add a TESTING build option, to enable using non-production/test-only code-paths
Since the tests (currently) run with the `pdf.worker.js` file built, i.e. with `PRODUCTION = true` set, there's no simple way to add e.g. `assert` calls for both non-production *and* test-only builds without also affecting PRODUCTION builds.
2018-06-12 11:01:32 +02:00
Jonas Jenwald
f01e54eae1 Improve the warning messages printed by `PartialEvaluator.{getOperatorList, getTextContent} when errors are being ignored
Currently the actual errors aren't printed, which can make debugging harder than necessary.
2018-06-12 11:01:32 +02:00
Jonas Jenwald
731f2e6dfc Remove manual clamping/rounding from ColorSpace and PDFImage, by having their methods use Uint8ClampedArrays
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.

As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.

*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
2018-06-12 11:01:32 +02:00
Jonas Jenwald
55199aa281 Remove the unused bpc parameter from, and update the signature of, the resizeRgbImage function in src/core/colorspace.js 2018-06-12 11:01:32 +02:00
Jonas Jenwald
d1637056b3 Use shorthand method signatures in src/core/colorspace.js 2018-06-12 11:01:32 +02:00
Jonas Jenwald
32367c5968 Make the getBytes/peekBytes methods of Stream/DecodeStream/ChunkedStream able to return Uint8ClampedArrays
The built-in image decoders are already returning data as `Uint8ClampedArray`, and subsequently the JPEG/JBIG2/JPX streams are as well. However, for general streams we obviously don't want to force the use of `Uint8ClampedArray` unless an "Image" is actually being decoded.
Hence this patch, which adds a parameter that allows the caller of the `getBytes`/`peekBytes` methods to force a `Uint8ClampedArray` (rather than a `Uint8Array`) to be returned.
2018-06-12 11:01:32 +02:00
Brendan Dahl
3ac638fad3
Merge pull request #9689 from RafaPolit/master
Fixed critical unhandled promise that prevented error catching using API
2018-06-11 15:40:30 -06:00
Tim van der Meij
af8e88d00b
Replace Util.extendObj by Object.assign 2018-06-10 20:11:03 +02:00
Tim van der Meij
903bad1906
Remove Util.appendToArray and Util.prependToArray
The former may be replaced by regular JavaScript array concatenation and
the latter is unused. This avoids unnecessary function calls/imports.
2018-06-10 15:24:09 +02:00
Jonas Jenwald
07d610615c Move, and modernize, Util.loadScript from src/shared/util.js to src/display/dom_utils.js
Not only is the `Util.loadScript` helper function unused on the Worker side, even trying to use it there would throw an Error (since `document` isn't defined/available in Workers).
Hence this helper function is moved, and its code modernized slightly by having it return a Promise rather than needing a callback function.

Finally, to reduced code duplication, the "new" loadScript function is exported and used in the viewer.
2018-06-07 13:52:40 +02:00
Jonas Jenwald
547f119be6 Simplify the error handling slightly in the src/display/node_stream.js file
The various classes have `this._errored` and `this._reason` properties, where the first one is a boolean indicating if an error was encountered and the second one contains the actual `Error` (or `null` initially).

In practice this means that errors are basically tracked *twice*, rather than just once. This kind of double-bookkeeping is generally a bad idea, since it's quite easy for the properties to (accidentally) get into an inconsistent state whenever the relevant code is modified.

Rather than using a separate boolean, we can just as well check the "error" property directly (since `null` is falsy).

---

Somewhat unrelated to this patch, but `src/display/node_stream.js` is currently *not* handling errors in a consistent or even correct way; compared with `src/display/network.js` and `src/display/fetch_stream.js`.
Obviously using the `createResponseStatusError` utility function, from `src/display/network_utils.js`, might not make much sense in a Node.js environment. However at the *very* least it seems that `MissingPDFException`, or `UnknownErrorException` when one cannot tell that the PDF file is "missing", should be manually thrown.

As is, the API (i.e. `getDocument`) is not returning the *expected* errors when loading fails in Node.js environments (as evident from the `pending` API unit-test).
2018-06-06 09:05:45 +02:00
Jonas Jenwald
2fdaa3d54c Update the postMessageTransfers comment in createDocumentHandler in the src/core/worker.js file
Since the old comment mentions a now unsupported browser, let's update it such that someone won't accidentally conclude that the code in question can be removed.
2018-06-06 08:52:43 +02:00
Jonas Jenwald
871bf5c68b Remove the, now obsolete, handling of the CMapReaderFactory parameter in getDocument
This special handling was added in PR 8567, but was made redundant in PR 8721 which stopped sending everything but the kitchen sink to the Worker side.
2018-06-06 08:52:43 +02:00
Jonas Jenwald
c8e2163bbc Remove incorrect/unnecessary validation of the verbosity parameter in the PDFWorker constructor (PR 9480 follow-up) 2018-06-06 08:52:43 +02:00
Jonas Jenwald
b263b702e8 Rename PDFPageProxy.pageInfo to PDFPageProxy._pageInfo to indicate that the property should be considered "private"
Since `PDFPageProxy` already provide getters for all the data returned by `GetPage` (in the Worker), there isn't any compelling reason for accessing the `pageInfo` directly on `PDFPageProxy`.

The patch also changes the `GetPage` handler, in `src/core/worker.js`, to use modern JavaScript features.
2018-06-06 08:52:42 +02:00
Jonas Jenwald
4f4b50e01e Rename PDFDocumentProxy.pdfInfo to PDFDocumentProxy._pdfInfo to indicate that the property should be considered "private"
Since `PDFDocumentProxy` already provide getters for all the data returned by `GetDoc` (in the Worker), there isn't any compelling reason for accessing the `pdfInfo` directly on `PDFDocumentProxy`.
2018-06-06 08:52:42 +02:00
Jonas Jenwald
e89afa5899 Stop sending the PDFManagerReady message from the Worker, since it's unused in the API
After PR 8617 the `PDFManagerReady` message handler function, in `src/display/api.js`, is now a no-op. Hence it seems completely unnecessary to keep sending this message from `src/core/worker.js`.
2018-06-06 08:52:42 +02:00
Jonas Jenwald
eef53347fe Ensure that the correct data is sent, with the test message, from the worker if typed arrays aren't properly supported
With native typed array support now being mandatory in PDF.js, since version 2.0, this probably isn't a huge problem even though the current code seems wrong (it was changed in PR 6571).

Note how in the `!(data instanceof Uint8Array)` case we're currently attempting to send `handler.send('test', 'main', false);` to the main-thread, which doesn't really make any sense since the signature of the method reads `send(actionName, data, transfers) {`.
Hence the data that's *actually* being sent here is `'main'`, with `false` as the transferList, which just seems weird. On the main-thread, this means that we're in this case checking `data && data.supportTypedArray`, where `data` contains the string `'main'` rather than being falsy. Since a string doesn't have a `supportTypedArray` property, that check still fails as expected but it doesn't seem great nonetheless.
2018-06-06 08:52:42 +02:00
Jonas Jenwald
dc6e1b4176 Use Uint8ClampedArray for the image data returned by JpegDecode, in src/display/api.js
Since all the built-in PDF.js image decoders now return their data as `Uint8ClampedArray`, for consistency `JpegDecode` on the main-thread should be doing the same thing; follow-up to PR 8778.
2018-06-06 08:52:41 +02:00
Jonas Jenwald
47a9d38280 Add more validation in PDFWorker.fromPort
The signature of the `PDFWorker.fromPort` method, in addition to the `PDFWorker` constructor, was changed in PR 9480.
Hence it's probably a good idea to add a bit more validation to `PDFWorker.fromPort`, to ensure that it won't fail silently for an API consumer that updates to version 2.0 of the PDF.js library.
2018-06-06 08:52:41 +02:00
Jonas Jenwald
3c5c8d2a0b Remove the typed array check when calling LoopbackPort in PDFWorker._setupFakeWorker
With version 2.0, native support for typed arrays is now a requirement for using the PDF.js library; see PR 9094 where the old polyfills were removed.
Hence the `isTypedArraysPresent` check, when setting up fake workers, no longer serves any purpose here and can thus be removed.
2018-06-06 08:52:33 +02:00
Jonas Jenwald
89caaf4071 Use LoopbackPort in the "message_handler" unit-tests
There's no good reason, as far as I can tell, to duplicate the functionality of the `LoopbackPort` in the unit-tests. The only difference between the implementations is that `LoopbackPort` mimics the (native) structured cloning, however that shouldn't matter here since the tests are only sending "simple" data (strings respectively arrays with numbers).

Furthermore the patch also changes `LoopbackPort` to default to using "structured cloning" and deferred invocation of the listeners, since native typed array support is now a requirement for using the PDF.js library.
2018-06-04 12:53:08 +02:00
Jonas Jenwald
44d8afd46b Move MessageHandler into a separate src/shared/message_handler.js file
The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.
2018-06-04 12:53:08 +02:00
Jonas Jenwald
69f2a77543 Update the signature of the PageViewport constructor, and improve the JSDoc comments for the class
This changes the constructor to take a parameter object, rather than a string of parameters.
2018-06-04 12:53:07 +02:00
Jonas Jenwald
51673dbc5a Convert the PageViewport to a proper ES6 class
Also converts all `var` to `let` for good measure.
2018-06-04 12:53:07 +02:00
Jonas Jenwald
5917b21702 Remove completely unused fontScale property from PageViewport
The `fontScale` property was added in PR 1531, see commit b312719d7e in particular, apparently for the sole purpose of supporting the "acroforms" example.

However, the `fontScale` property was never used anywhere else in the code-base, and after the modernization of the "acroforms" example in PR 8030 it's been completely unused.

Finally, note that there's also a (more suitably named) `scale` property on `PageViewport` instances, which contains the exact same information as the property being removed here.
2018-06-04 12:53:07 +02:00
Jonas Jenwald
08c8f8733d Move PageViewport from src/shared/util.js to src/display/dom_utils.js
Since the `PageViewport` is not used in the worker, duplicating this code on both the main and worker sides seems completely unnecessary.
2018-06-04 12:53:07 +02:00
Tim van der Meij
8ce24744f2
Merge pull request #9769 from Snuffleupagus/node-unittest-rm-console-errors
Reduce the amount of errors logged, primarily in Node.js/Travis, when running the unit-tests
2018-06-03 19:50:42 +02:00
Rob Wu
0e4e79169b Fall back to ISO-8859-1 in content_disposition.js
Updates content_disposition.js to include
9b789d9b3b
2018-06-03 16:17:28 +02:00
Rob Wu
e992480baa Fix multibyte decoding in content_disposition.js
I made some mistakes when trying to make the content_disposition.js
compatible with non-modern browsers (IE/Edge).
Notably, text decoding was usually skipped because of the inverted
logical check at the top of `textdecode`.

I verified that this new version works as expected, as follows:

1. Visit 55c71eb44e/test/
   and get  test-content-disposition.js
   also get test-content-disposition.node.js if using Node.js,
     or get test-content-disposition.html if you use a browser.
2. Modify `test-content-disposition.node.js` (or the HTML file) and
   change `../extension/content-disposition.js` to `PDFJS-content_disposition.js`
3. Copy the `getFilenameFromContentDispositionHeader` function from
   `content_disposition.js` (i.e. the file without the trailing exports)
   and save it as `PDFJS-content_disposition.js`.
4. Run the tests (`node test-content-disposition.node.js` or by opening
   `test-content-disposition.html` in a browser).
5. Confirm that there are no failures: "Finished all tests (0 failures)"

The code has a best-efforts fallback for Microsoft Edge, which lacks the
TextDecoder API. The fallback only supports the common UTF-8 encoding.
To simulate this in a test, modify `PDFJS-content_disposition.js` and
deliberately throw an error before `new TextDecoder`. There will be two
failures because we don't want to include too much code to support text
decoding for non-UTF-8 encodings in Edge

```
test-content-disposition.js:265 Assertion failed: Input: attachment; filename*=ISO-8859-1''%c3%a4
Expected: "ä"
Actual  : "ä"
test-content-disposition.js:268 Assertion failed: Input: attachment; filename*=ISO-8859-1''%e2%82%ac
Expected: "€"
Actual  : "€"
```
2018-06-03 15:28:22 +02:00
Jonas Jenwald
ef081a0531 Ensure that the WorkerTransport._passwordCapability is always rejected, even when errors are thrown in PDFDocumentLoadingTask.onPassword callback
Please note that while the current code works, both in the viewer and the unit-tests, it can leave the `WorkerTransport._passwordCapability` Promise in a pending state.
In the `PasswordRequest` handler, in src/display/api.js, we're returning the Promise from a `capability` object (rather than just a "plain" Promise). While an error thrown anywhere within this handler was fortunately enough to propagate it to the Worker side, it won't cause the Promise (in `WorkerTransport._passwordCapability`) to actually be rejected.
Finally note that while we're now catching errors in the `PasswordRequest` handler, those errors are still propagated to the Worker side via the (now) rejected Promise and the existing `return this._passwordCapability.promise;` line.

This prevents warnings about uncaught Promises, with messages such as "Error: Worker was destroyed during onPassword callback", when running the unit-tests both in browsers *and* in Node.js/Travis.
2018-06-03 00:28:40 +02:00
Jonas Jenwald
0ecc22cb04 Attempt to provide better default values for the disableFontFace/nativeImageDecoderSupport API options in Node.js
This should provide a better out-of-the-box experience when using PDF.js in a Node.js environment, since it's missing native support for both `@font-face` and `Image`.
Please note that this change *only* affects the default values, hence it's still possible for an API consumer to override those values when calling `getDocument`.

Also, prevents "ReferenceError: document is not defined" errors, when running the unit-tests in Node.js/Travis.
2018-06-03 00:28:37 +02:00
Tim van der Meij
36af85db92
Merge pull request #9740 from pedrotp/replace-get-getArray
Use Dict.getArray, instead of Dict.get, when getting the 'Size' in constructSampled in src/core/function.js
2018-06-02 19:50:09 +02:00
pedrotp
a190d21dd7 Use Dict.getArray, instead of Dict.get, when getting the 'Size' in constructSampled in src/core/function.js (PR 7295 follow-up) 2018-06-02 11:16:05 -04:00
Jonas Jenwald
83ff7d9de9 Simplify the DNL (Define Number of Lines) marker warning in JpegImage.parse 2018-05-30 22:40:11 +02:00
Jonas Jenwald
620f65488b Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679) 2018-05-30 22:40:11 +02:00
Tim van der Meij
c5d5d29b03
Merge pull request #9756 from Snuffleupagus/Type1Parser-rm-makeSubStream
Remove usage of `makeSubStream` from `Type1Parser.extractFontProgram` in src/core/type1_parser.js (issue 9735)
2018-05-28 23:34:57 +02:00
Jonas Jenwald
f68f60099e Remove usage of makeSubStream from Type1Parser.extractFontProgram in src/core/type1_parser.js (issue 9735)
This avoids the initialization of, potentially thousands of, unnecessary `Stream` objects, by getting the required number of bytes directly instead.
Given the special behaviour, when `length === 0`, of the `getBytes`/`skip` methods, it's also necessary to handle that particular case to prevent errors when encountering empty CharStrings.
2018-05-28 14:32:20 +02:00
Mukul Mishra
949c3e9417 Add abort functionality in fetch stream 2018-05-22 12:46:59 +05:30
RafaPolit
d63b17dbe3 Fixed critical unhandled promise that prevented error catching using API 2018-04-24 13:10:00 -05:00
Jani Pehkonen
fe2cf2f73f SVG clip intersections and operators 2018-04-17 19:20:29 +03:00
Brendan Dahl
2dc4af525d
Merge pull request #9659 from yurydelendik/rm-createFromIR
Remove createFromIR from PDFFunctionFactory
2018-04-12 14:22:43 -07:00
Yury Delendik
20085aaa5e Remove createFromIR from PDFFunctionFactory; forgive invalid Dict values. 2018-04-10 18:49:31 -05:00
Jani Pehkonen
8ea505545a Use FDSelect and FDArray when converting CFF CID font to paths 2018-04-10 16:44:42 +03:00
Brendan Dahl
e8cf7fd512
Merge pull request #9624 from wojtekmaj/no-warning-on-dependency-operator
Prevent warning on unimplemented operator thrown for OPS.dependency
2018-04-03 10:55:29 -07:00
Wojciech Maj
acc0a0fe95 Prevent warning on unimplemented operator thrown for OPS.dependency 2018-04-02 14:29:34 +02:00
Wojciech Maj
ea2850e9a7 Fix typos 2018-04-01 23:20:41 +02:00
Tim van der Meij
8887a09e8f
Merge pull request #9588 from swftvsn/patch-1
Improve node.js support
2018-04-01 12:26:39 +02:00
Jonas Jenwald
8b09f7c34e Clean-up getMainThreadWorkerMessageHandler for non-PRODUCTION mode
*This is a final piece of clean-up of code that I recently wrote, after which I'm done :-)*

When the `getMainThreadWorkerMessageHandler` function was added, in PR 9385, it did so by basically introducing a `web/app.js` dependency in `src/display/api.js` through the `window.pdfjsNonProductionPdfWorker` property[1]. Even though this is limited to non-`PRODUCTION` mode, i.e. `gulp server`, it still seems unfortunate to have that sort of viewer dependency in the API code itself.

With the new, much nicer and shorter, names introduced in PR 9565 we can remove this non-`PRODUCTION` hack and just use `window.pdfjsWorker` in both the viewer and the API regardless of the build mode.

---

[1] It didn't seem correct to piggy-back on the `window.pdfjsDistBuildPdfWorker` property in non-`PRODUCTION` mode.
2018-03-29 11:03:47 +02:00
Tim van der Meij
5c1a16ba6e
Merge pull request #9586 from Snuffleupagus/pageSize-api-rotate
Ensure that `PDFPageProxy.pageSizeInches` handles non-default /Rotate entries correctly
2018-03-25 18:03:32 +02:00
Jonas Jenwald
d547936827 Ensure that PDFPageProxy.pageSizeInches handles non-default /Rotate entries correctly
Without this patch, the pageSize will be incorrectly reported for some PDF files.

---

Move pageSizeInches to ui_utils
2018-03-25 16:48:29 +02:00
Tim van der Meij
115fbc47fe
Merge pull request #9594 from Snuffleupagus/pageLabel-validation
Add stricter validation in `Catalog.readPageLabels`
2018-03-24 19:40:49 +01:00
Brendan Dahl
24f766b14d
Merge pull request #9573 from yurydelendik/xml_parser
New XML parser
2018-03-21 17:00:00 -07:00
Jonas Jenwald
374d074f6e Add stricter validation in Catalog.readPageLabels
The current PageLabel dictionary validation code won't catch some (unlikely) forms of corruption. For example: a `Type`/`S` entry being `null`/`0`/empty string, a `P`/`St` entry being `null`/`0`.

Please note: I'm not aware of any bugs caused by the old code, but I've had this patch sitting locally for some time and figured it couldn't hurt to submit it.
2018-03-21 14:36:05 +01:00
swftvsn
c20426efef Improve node.js support
This change fixes "Unhandled rejection ReferenceError: HTMLElement is not defined" issue that is discussed in more detail in #8489.
2018-03-21 13:43:53 +02:00
Brendan Dahl
63c7aee112
Merge pull request #9565 from brendandahl/new-name
Rename the globals to shorter names.
2018-03-20 13:49:04 -07:00
Yury Delendik
655c8d34d0 New XML parser 2018-03-19 20:51:41 -05:00
Jonas Jenwald
e0ae157582 [api-minor] Fix various issues related to the pageSize information
The `getPageSizeInches` method was implemented on `PDFDocumentProxy`, which seems conceptually wrong since the size property isn't global to the document but rather specific to each page. Hence the method is moved into `PDFPageProxy`, as `get pageSizeInches` instead to address this.

Despite the fact that new API functionality was implemented, no unit-tests were added. To prevent issues later on, we should *always* ensure that new functionality has at least some test-coverage; something that this patch also takes care of.

The new `PDFDocumentProperties._parsePageSize` method seemed unnecessary convoluted. Furthermore, in the "no data provided"-case it even returned incorrect data (an array, rather than the expected object).
Finally, the fallback strings didn't actually agree with the `en-US` locale. This inconsistency doesn't look too great, and it's thus addressed here as well.
2018-03-18 09:10:19 +01:00
Brendan Dahl
01bff1a81d Rename the globals to shorter names.
pdfjsDistBuildPdf=pdfjsLib
pdfjsDistWebPdfViewer=pdfjsViewer
pdfjsDistBuildPdfWorker=pdfjsWorker
2018-03-16 11:08:56 -07:00
Jonas Jenwald
d431ae069d Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540)
According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.
2018-03-12 14:13:23 +01:00
Tim van der Meij
6662985a20
Merge pull request #9541 from Rob--W/crx-fetch-chrome61plus
[CRX] Disable fetch in Chrome 60-
2018-03-10 00:15:47 +01:00
Tim van der Meij
fded22c618
Merge pull request #9509 from timvandermeij/document
Implement a single `getInheritableProperty` utility function
2018-03-08 22:40:27 +01:00
Yury Delendik
e0fb18a339
Merge pull request #9508 from pal03377/file-info-page-size
Add paper size to document information/properties
2018-03-08 11:57:38 -06:00
Rob Wu
db428004f4 [CRX] Disable fetch in Chrome 60-
Chrome 60 and earlier does not include credentials (cookies) in requests
made with fetch, regardless of extension permissions. This was fixed in
61.0.3138.0 by
2e231cf052

This patch disables the fetch backend in all affected Chrome versions.
The browser detection is done by checking for a change that coincides
with the release of Chrome 61.

Test case:
1. Copy the `isChromeWithFetchCredentials` function from the patch.
2. Run it in the JS console of Chrome and verify the return value.

Verified results:
- 49.0.2623.75 - false (earliest supported version by us)
- 60.0.3112.90 - false (last major version affected by bug)
- 61.0.3163.100 - true (first major version without bug)
- 65.0.3325.146 - true (current stable)

Test case 2:
1. Build the extension (`gulp chromium`) and load it in Chrome.
2. Open the developer tools, and open any PDF file.
3. In the "Network tab" of the developer tools, look at "request type".
   In Chrome 60-: Should be "xhr"
   In Chrome 61+: Should be "fetch"
2018-03-08 18:27:30 +01:00
palsch
8558c5b1d9 Add page size to the document properties dialog 2018-03-08 18:23:47 +01:00
Tim van der Meij
f308d73d40
Implement a single getInheritableProperty utility function
This function combines the logic of two separate methods into one.
The loop limit is also a good thing to have for the calls in
`src/core/annotation.js`.

Moreover, since this is important functionality, a set of unit tests and
documentation is added.
2018-03-03 19:19:39 +01:00
Tim van der Meij
4e5eb59a33
Remove the getPageProp method in src/core/document.js
It's only used in two places in the class and those callsites can
directly get the information from the dictionary, which is more readable
and avoids an additional method call.
2018-03-03 14:57:42 +01:00
Jonas Jenwald
b8606abbc1 [api-major] Completely remove the global PDFJS object 2018-03-01 18:13:27 +01:00
Jonas Jenwald
4b4fcecf70 Ensure that we only pass in the necessary parameters when initializing PDFDataTransportStream/PDFNetworkStream in src/display/api.js
With options being moved from the global `PDFJS` object and into `getDocument`, a side-effect is that we're now passing in a fair number of useless parameters to the various transport/network streams.
Even though this doesn't *currently* cause any problems, it nonetheless seem like a good idea to explicitly provide the parameters that are actually necessary.
2018-03-01 18:11:17 +01:00
Jonas Jenwald
212553840f Move the pdfBug option from the global PDFJS object and into getDocument instead
Also removes the now unused `getDefaultSetting` helper function.
2018-03-01 18:11:17 +01:00
Jonas Jenwald
1d03ad0060 Move the disableCreateObjectURL option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:17 +01:00
Jonas Jenwald
05c05bdef5 Move the disableStream option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
b69abf1111 Move the disableRange option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
69d7191034 Move the disableAutoFetch option from the global PDFJS object and into getDocument instead
One additional complication with removing this option from the global `PDFJS` object, is that the viewer currently needs to check `disableAutoFetch` in a couple of places. To address this I'm thus proposing adding a getter in `PDFDocumentProxy`, to allow checking the *actually* used values for a particular `getDocument` invocation.
2018-03-01 18:11:16 +01:00
Jonas Jenwald
c7c583583b Move the disableFontFace option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
f3900c4e57 Move the isEvalSupported option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
3c2fbdffe6 Move the cMapUrl and cMapPacked options from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
b674409397 Move the maxImageSize option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
81c550903f Move various viewer components options from PDFJS/PDFViewerApplication.viewerPrefs and into AppOptions instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
5894bfa449 Move API specific compatibility options from src/shared/compatibility.js and into a separate file
Unfortunately, as far as I can tell, we still need the ability to adjust certain API options depending on the browser environment in PDF.js version `2.0`. However, we should be able to separate this from the general compatibility code in the `src/shared/compatibility.js` file.
2018-03-01 18:11:16 +01:00
Rob Wu
401f3a9d97
Merge pull request #9520 from Snuffleupagus/fix-includes-polyfills
Attempt to fix the `Array.prototype.includes` and `String.prototype.includes` polyfills (issue 9514, 9516)
2018-03-01 18:05:14 +01:00
Jonas Jenwald
cd12a177a9 Attempt to fix the Array.prototype.includes and String.prototype.includes polyfills (issue 9514, 9516)
I don't understand why the previous way importing the polyfills didn't work, and I don't have time to try and figure it out, however this patch seems to fix things.

Fixes 9514.
Fixes 9516.
2018-03-01 17:38:14 +01:00
Rob Wu
f893bcd41b
Merge pull request #9494 from pardeepshokeen/windows-drive-letter
Windows drive letter
2018-03-01 16:39:44 +01:00
pardeepshokeen
7ebbdd2579 helper function to parse url 2018-02-25 01:25:32 +05:30
Jonas Jenwald
a97901efb6 Move the verbosity option from the global PDFJS object and into getDocument/PDFWorker instead
Given the purpose of this option, it doesn't seem necessary to make it available through `GlobalWorkerOptions`.
2018-02-16 13:22:35 +01:00
Jonas Jenwald
fdd2376170 Move the postMessageTransfers option from the global PDFJS object and into getDocument/PDFWorker instead
Given the purpose of this option, it doesn't seem necessary to make it available through `GlobalWorkerOptions`.
2018-02-16 13:22:35 +01:00
Jonas Jenwald
83d52518da [api-major] Refactor PDFWorker to be initialized with a parameter object, rather than a bunch of regular parameters 2018-02-16 13:22:35 +01:00
Jonas Jenwald
c3c1fc511d Move the workerSrc option from the global PDFJS object and into GlobalWorkerOptions instead 2018-02-16 13:22:35 +01:00
Jonas Jenwald
45adf33187 Move the workerPort from the global PDFJS object and into GlobalWorkerOptions instead 2018-02-16 13:22:35 +01:00
Jonas Jenwald
003bd4044b Introduce a GlobalWorkerOptions object for (basic) Worker configuration
Compared to most other options currently/previously residing on the global `PDFJS` object, some of the Worker specific ones (e.g. `workerPort`/`workerSrc`) probably cannot be moved into options provided directly when initializing e.g. `PDFWorker`.
The reason is that in some cases, e.g. the Webpack examples, we try to provide Worker auto-configuration and I cannot see a good solution for that use-case if we completely remove the globally available Worker configuration.

However inline with previous patches for PDF.js version `2.0`, it does seem like a worthwhile goal to move away from storing options directly on the global `PDFJS` object, since that is a pattern we should avoid going forward. Especially since one of the (eventual) goals is to attempt to *completely* remove the global `PDFJS` object, and rely solely on exporting/importing the needed functionality.

By introducing the `GlobalWorkerOptions` we thus have larger flexibility in the future, if/when the global `PDFJS` object will be removed.
2018-02-16 13:22:35 +01:00
Rob Wu
a89071bdef
Merge pull request #9470 from Snuffleupagus/issue-4888
Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888)
2018-02-16 13:14:21 +01:00
Tim van der Meij
538dda1096
Merge pull request #9479 from Snuffleupagus/refactor-viewer-options
[api-major] Refactor viewer components initialization to reduce their dependency on the global `PDFJS` object
2018-02-14 22:47:33 +01:00
Jonas Jenwald
11ab3b5c00 Ensure that JpegImage.getData returns the correct data length when forceRGBoutput == true (issue 4888)
With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now.

Fixes 4888.
2018-02-13 20:44:21 +01:00
Jonas Jenwald
e95c11a7f0 Remove the undocumented PDFJS.enableStats option
In order to simplify things, the undocumented `enableStats` option was removed and `pdfBug` is now instead used to enabled general debugging *and* page request/rendering stats.
Considering that in the default viewer the `stats` was only used when debugging was also enabled, this simplification (code wise) definitely seem worthwhile to me.
2018-02-13 16:56:57 +01:00
Jonas Jenwald
6bc3e1fb93 Remove version/build from the global PDFJS object
There's no need to expose these properties multiple times, since they're already exported by `src/display/api.js`.
2018-02-13 16:56:57 +01:00
Jonas Jenwald
77efed6626 Replace the PDFJS.disableWebGL option with a enableWebGL option passed, via BaseViewer/PDFPageView, to PDFPageProxy.render
Please note that the, pre-existing, viewer preference is already named `enableWebGL`; fixes 4919.
2018-02-13 16:56:56 +01:00
Jonas Jenwald
3a6f6d23d6 Move the externalLinkTarget and externalLinkRel options to PDFLinkService options
This removes the `PDFJS.externalLinkTarget`/`PDFJS.externalLinkRel` dependency from the viewer components, but please note that as a *temporary* solution the default viewer still uses it.
2018-02-13 14:28:40 +01:00
Jonas Jenwald
c45c394364 Move the imageResourcesPath option to a BaseViewer/PDFPageView/AnnotationLayerBuilder option
This removes the `PDFJS.imageResourcesPath` dependency from the viewer components and the test-suite, but please note that as a *temporary* solution the default viewer still uses it.
2018-02-13 14:28:38 +01:00
Jonas Jenwald
9e0a31f662 Move viewer specific compatibility options from src/shared/compatibility.js and into a separate file
Unfortunately, as far as I can tell, we still need the ability to adjust certain viewer options depending on the browser environment in PDF.js version `2.0`. However, we should be able to separate this from the general compatibility code in the `src/shared/compatibility.js` file.
2018-02-13 13:41:59 +01:00
Jonas Jenwald
f05e5c5460 Take the dictionary, and not just the image data, into account when caching inline images (issue 9398)
The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces.

There's obviously a couple of different ways that we could compute a hash/checksum of the dictionary.
Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be *way* too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regresses quite badly with this solution.

The solution that is instead implemented in this patch, is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution there's one drawback associated with it:
If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of *not* caching given that too aggressive caching can easily lead to rendering bugs.

One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the *exact* stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1]

With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618.

Fixes 9398; also improves/fixes the `issue8823` reference test.

---

[1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.
2018-02-12 16:43:47 +01:00
Tim van der Meij
b3557483cb
Merge pull request #9465 from Snuffleupagus/streams-unify-disableRange-contentLength
Attempt to unify the `disableRange`/`contentLength` handling in the various network streams
2018-02-11 16:21:34 +01:00
Jonas Jenwald
1cf116ab88 Enable the mozilla/use-includes-instead-of-indexOf ESLint rule globally
This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have the necessary `Array`/`String` polyfills and that most cases have already been fixed, see PRs 9032 and 9434.
2018-02-10 23:24:50 +01:00
Jonas Jenwald
2eb29409bc Enable the mozilla/avoid-removeChild ESLint rule globally
This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have a polyfill for `ChildNode.remove()` and that most cases have already been fixed, see PRs 8056 and 8138.
2018-02-10 23:24:50 +01:00
Tim van der Meij
7bb066494f
Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback
Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)
2018-02-09 21:36:08 +01:00
Jonas Jenwald
ad06979cca Attempt to unify the disableRange/contentLength handling in the various network streams
First of all, note how in both `fetch_stream.js` and `node_stream.js` we always overwrite the `this._contentLength` property even when the response headers doesn't actually contain any (valid) length information. This could thus result in the `length` parameter, as passed to the network stream, being completely ignored despite having no better information available.
Secondly, in `node_stream.js` the `this._isRangeSupported` property wasn't always updated correctly based on the response headers.
2018-02-09 13:50:48 +01:00
Jonas Jenwald
25293628ff
Merge pull request #9459 from tonyjin/respect-worker-src
Respect workerSrc if set
2018-02-08 13:57:43 +01:00
Jonas Jenwald
a18c65ae9f Use the correct stream position when reading maxSizeOfInstructions from the maxp table (issue 9458)
Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html.

Fixes 9458.
2018-02-07 21:57:43 +01:00
Tony Jin
3c33f32dff Respect workerSrc if set
Respect user-defined workerSrc over internal overrides.
2018-02-07 11:31:18 -08:00
Jonas Jenwald
bf4166e6c9 Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614)
Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49

Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) *before* parsing of the SOS (Start of Scan) data begins.
Hence the best solution I could come up with here, is to re-parse the image in the *hopefully* rare case of JPEG images that include a DNL (Define Number of Lines) marker.

Fixes 8614.
2018-02-05 21:05:32 +01:00
Jonas Jenwald
80441346a3 Fallback to the built-in JPEG decoder if 'JpegStream', in src/display/api.js, fails to load the image
This works by making `PartialEvaluator.buildPaintImageXObject` wait for the success/failure of `loadJpegStream` on the API side *before* parsing continues.

Please note that in practice, it should be quite rare for the browser to fail loading/decoding of a JPEG image. In the general case, it should thus not be completely surprising if even `src/core/jpg.js` will fail to decode the image.
2018-02-05 21:05:31 +01:00
Jonas Jenwald
76afe1018b Fallback to built-in image decoding if the NativeImageDecoder fails
In particular this means that if 'JpegDecode', in `src/display/api.js`, fails we'll fallback to the built-in JPEG decoder.
2018-02-05 17:01:35 +01:00
Jonas Jenwald
2570717e77 Inline the code in loadJpegStream at the only call-site in src/display/api.js.js`
Since `loadJpegStream` is only used at a *single* spot in the code-base, and given that it's very heavily tailored to the calling code (since it relies on the data structure of `PDFObjects`), this patch simply inlines the code in `src/display/api.js` instead.
2018-02-05 17:01:35 +01:00
Jonas Jenwald
7f73fc9ace Re-factor PartialEvaluator.buildPaintImageXObject to make it asynchronous
This is necessary for upcoming changes, which will add fallback code-paths to allow graceful handling of native image decoding failures.
2018-02-05 17:01:35 +01:00
Jonas Jenwald
ec85d5c625 Change the signature of PartialEvaluator.buildPaintImageXObject to take a parameter object
This method currently requires a fair number of parameters, which creates quite	unwieldy call-sites. When invoking `buildPaintImageXObject`, you have to remember not only which arguments to supply, but also the correct order, to prevent run-time errors.
2018-02-05 17:01:35 +01:00
Rob Wu
2f19d9d906 Support spaces and semicolons in filename
Imports the following changes:
5b1afa7c29
7e2e35a38b
2018-02-04 16:19:40 +01:00
Jonas Jenwald
712090eff8 Upstream the changes from: Bug 1339461 - Convert foo.indexOf(...) == -1 to foo.includes() and implement an eslint rule to enforce this
Yet another case where PDF.js code was modified in `mozilla-central` without the changes happening in the GitHub repo first; *sigh*.
If we don't upstream at least the changes in `extensions/firefox/`, any future update of PDF.js in `mozilla-central` will be blocked.

Please see:
 - https://bugzilla.mozilla.org/show_bug.cgi?id=1339461
 - https://hg.mozilla.org/mozilla-central/rev/d5a5ad1dbbf2
2018-02-04 14:59:27 +01:00
Jonas Jenwald
9ac9ef8ef1 Polyfill String.prototype.includes using core-js
See https://github.com/zloirock/core-js#ecmascript-6-string.
2018-02-04 14:31:59 +01:00
Tim van der Meij
73436c0d12
Implement the AESBaseCipher class and let the AES128Cipher and AES256Cipher classes extend it 2018-02-03 20:16:33 +01:00
Tim van der Meij
9a959e4df7
Update the AES128Cipher and AES256Cipher implementations to be more similar
This commit is the first step for extracting a base class for the
`AES128Cipher` and the `AES256Cipher` classes. The objective here is to
make code changes (not altering the logic) to make the implementations
as similar as possible as found by creating a diff of both classes.

In particular, we extract the key size and cycles of repetitions
constants since they are different for AES-128 and AES-256. Moreover, we
rename functions to be similar.

In the `AES256Cipher` class, there was an additional assignment to
`this` in the decryption function. However, this was unnecessary because
the assignment would also be done when the loop was exited.
2018-02-03 20:16:29 +01:00
Jonas Jenwald
f4a95de694 Attempt to find the next valid marker when encountering invalid image data in JpegImage.parse (issue 9425)
In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter.

Fixes 9425.
2018-02-03 16:01:19 +01:00
Jonas Jenwald
39c5e1ed1a Remove the (unnecessary) WorkerMessageHandler variable from the setupFakeWorkerGlobal() function in the src/display/api.js file 2018-01-31 12:52:10 +01:00
Jonas Jenwald
56a8c934dd [api-major] Remove the PDFJS.disableWorker option
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.

Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).

Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.

---

[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
2018-01-31 12:52:10 +01:00
Jonas Jenwald
a5aaf62754 [api-minor] Add a (static) PDFWorker.getWorkerSrc method that returns the current workerSrc
This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).
2018-01-31 12:52:07 +01:00
Jonas Jenwald
c56f3f04dd [api-major] Remove the SINGLE_FILE build target
Please note that this build target, and the resulting `build/pdf.combined.js` file, is equivalent to setting the `PDFJS.disableWorker` option to `true` which is a performance footgun.
2018-01-29 14:44:44 +01:00
Rob Wu
5d1c541702 Enable some polyfills for compat with Chrome 49
Successfully tested with Chrome 49.
2018-01-26 12:31:41 +01:00
Rob Wu
44025a3ec1 Explicitly state intended support in compatibility.js
Add comments with supported browser versions where missing.

Method:
- Use MDN compat tables if available.
- Otherwise test in Chrome (31+) otherwise.
  (the Chrome Web Store does not update older versions of
   Chrome, so probably nobody is interested in even older
   versions, even though there is an existing comment for
   Chrome<29 at `document.currentScript`).
2018-01-26 12:31:41 +01:00
Jonas Jenwald
d33c763dd5 Re-factor resetting of StatTimer instances to fix completely broken benchmarking (PR 9245 follow-up)
It turns out that PR 9245 unfortunately broke benchmarking completely, sorry about that!

The bug is that we were attempting to reset the current instance of `StatTimer`, instead of creating a new one as was previously done. By resetting the current instance, the `StatTimer` data fetched in `test/driver.js` is now wiped out since it points to the *same* underlying object.
This re-use of a `StatTimer` instance was asked for during review, and unfortunately I didn't test this thoroughly enough before submitting the final version of the PR.[1]

---

[1] Note that while I did test the benchmarking scripts with that PR *before* initially submitting it, I did however forget to do that after addressing the review comments which might explain why this problem went unnoticed.
2018-01-25 19:46:03 +01:00
Jani Pehkonen
5593c970e0 Implement Huffman coding in JBIG2 2018-01-23 17:04:07 +02:00
Jonas Jenwald
f0216484bc
Merge pull request #9383 from Rob--W/better-content-disposition-parser
Better content disposition parser
2018-01-21 15:08:14 +01:00
Tim van der Meij
9746646511
Merge pull request #9386 from shikhar-scs/remove-parsejbig2-function
removed parseJbig2 function
2018-01-21 14:51:59 +01:00
Rob Wu
a4e907169e Improve correctness of Content-Disposition parser
Re-uses logic from 9f5fcae11c/extension/content-disposition.js
which is already covered by tests: 6f3bbb8bbf
2018-01-21 13:31:12 +01:00
Jonas Jenwald
fe5102a27f
Merge pull request #9363 from Rob--W/fetch-http/s-only
Limit PDFFetchStream to http(s) in the Chrome extension
2018-01-21 11:45:09 +01:00
Shikhar Agnihotri
43e003cf5c
removed parseJbig2 function 2018-01-20 19:49:06 +05:30
Jonas Jenwald
69a8336cf1 Address the final round of review comments for Content-Disposition filename extraction
This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly.
The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations.
Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.
2018-01-18 17:39:22 +01:00
Juan Salvador Perez Garcia
eb1f6f4c24 Content disposition filename
File name is extracted from headers.
2018-01-18 17:38:44 +01:00
shikhar-scs
32080f1081 changed decodeURI to decodeURIComponent 2018-01-15 19:31:25 +05:30
Tim van der Meij
237bc2ef9d
Merge pull request #9323 from juncaixinchi/master
Get correct path in node_stream on windows platform( issue #9020)
2018-01-14 15:41:56 +01:00
Rob Wu
1c8cacd6b9 Limit PDFFetchStream to http(s) in the Chrome extension
The `fetch` API is only supported for http(s), even in Chrome extensions.
Because of this limitation, we should use the XMLHttpRequest API when the
requested URL is not a http(s) URL.

Fixes #9361
2018-01-14 00:34:46 +01:00
juncaixinchi
8e278d9a45 Get correct path in node_stream on windows platform 2018-01-13 21:13:24 +08:00
Jonas Jenwald
0e1b5589e7 Restore the btoa/atob polyfills for Node.js
These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`.
However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there's warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).
2018-01-13 01:31:05 +01:00
Brendan Dahl
d77fc8882a
Merge pull request #9352 from Snuffleupagus/issue-9285
Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285)
2018-01-12 13:01:22 -08:00
Jonas Jenwald
d0c8992e8a Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285)
Please refer to the PDF specification, in particular http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3801570

> A colour space shall be specified in one of two ways:
>  - Within a content stream, the CS or cs operator establishes the current colour space parameter in the graphics state. The operand shall always be name object, which either identifies one of the colour spaces that need no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, or some cases of Pattern) or shall be used as a key in the ColorSpace subdictionary of the current resource dictionary (see 7.8.3, "Resource Dictionaries"). In the latter case, the value of the dictionary entry in turn shall be a colour space array or name. A colour space array shall never be inline within a content stream.
>
> - Outside a content stream, certain objects, such as image XObjects, shall specify a colour space as an explicit parameter, often associated with the key ColorSpace. In this case, the colour space array or name shall always be defined directly as a PDF object, not by an entry in the ColorSpace resource subdictionary. This convention also applies when colour spaces are defined in terms of other colour spaces.
2018-01-10 20:20:43 +01:00
Jonas Jenwald
5a52ee0a79
Merge pull request #9350 from janpe2/svg-closeEOFillStroke
Implement `closeEOFillStroke` in SVG backend
2018-01-09 21:09:11 +01:00
Brendan Dahl
3925aab010
Merge pull request #9282 from Snuffleupagus/TrueType-Collection
Add support for TrueType Collection fonts (issue 9262)
2018-01-09 11:22:52 -08:00
Jani Pehkonen
d1e1dbfc14 Implement closeEOFillStroke in SVG backend 2018-01-09 19:42:12 +02:00
Jonas Jenwald
915e3f4c5f
Merge pull request #9099 from tiriana/allow-dontFlip-in-PDFPageProxy-getViewport
Allows 'dontFlip' as third arg in PDFPageProxy.getViewport
2018-01-09 18:27:26 +01:00
Radomir Wojtera
3dfc540d04 Allows 'dontFlip' as third argument in PDFPageProxy.getViewport 2018-01-09 13:08:24 +01:00
Jonas Jenwald
d6c028b946 Add support for TrueType Collection fonts (issue 9262)
The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading.

Fixes 9262.
2018-01-08 22:31:08 +01:00
Tim van der Meij
6b2ed504b7
Merge pull request #9336 from Snuffleupagus/jpx-SIZ
Correctly extract component data from "Image and tile size" (SIZ) markers in JPEG 2000 images
2018-01-03 23:34:34 +01:00
Jonas Jenwald
873556865b Correctly extract component data from "Image and tile size" (SIZ) markers in JPEG 2000 images
This is something that I noticed while attempting to debug https://bugzilla.mozilla.org/show_bug.cgi?id=1374945.
Just looking at the code, the `YRsiz` parameter seemed immediately wrong and the fact that every component used the *same* data also looked strange.
Comparing with the specification, see https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.800-200208-S!!PDF-E&type=items#page=37, confirmed that this is indeed incorrect.

Note that I haven't got any example of a PDF file that is fixed by this patch, but that might be more luck than anything else. Manually checking a couple of files with included JPEG 2000 images, the `Csiz`/`XRsiz`/`YRsiz` parameters were `1` which could explain why this hasn't been an issue before.

Obviously we shouldn't generally make changes to `core` code without adding tests, but in this case I'm simply not sure how to obtain/create one. However, since the existing code doesn't make sense this patch could hopefully be deemed acceptable anyway.
2018-01-03 16:26:28 +01:00
Jonas Jenwald
2db75a2a3a Update the ESLint dependencies, and also tweak the no-multiple-empty-lines rules
Since multiple empty lines is virtually unused in the code-base, and the few cases that do exist look like "typos", let's enforce greater consistency here; please see https://eslint.org/docs/rules/no-multiple-empty-lines.
2018-01-03 13:32:57 +01:00
Tim van der Meij
d36c46b2c9
Remove the CustomStyle class
It is only used in a few places to handle prefixing style properties if
necessary. However, we used it only for `transform`, `transformOrigin`
and `borderRadius`, which according to Can I Use are supported natively
(unprefixed) in the browsers that PDF.js 2.0 supports. Therefore, we can
remove this class, which should help performance too since this avoids
extra function calls in parts of the code that are called often.
2017-12-31 14:22:11 +01:00
Jonas Jenwald
c5700211d6 Adjust decodeACSuccessive in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images
I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images.

As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question.
Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).
2017-12-30 15:24:09 +01:00
Jonas Jenwald
d6eed132e5 Correct the indentation in the switch statement in decodeACSuccessive in src/core/jpg.js 2017-12-30 15:22:30 +01:00
Tim van der Meij
18d82d9c54
Merge pull request #9287 from himanish-star/PDFjs-compatible-Librejs
PDFjs now compatible with Librejs
2017-12-30 14:00:25 +01:00
Jonas Jenwald
8c4b7d0439 Avoid truncating JPEG images with DeviceGray ColourSpaces when using the src/core/jpg.js built-in decoder
The bug that this patch fixes is limited to the built-in JPEG decoder, and was unearthed by PR 9260. The underlying issue has existed since PR 6984, where the contents of this patch ought to have been included (if it weren't for the fact that we had no *easy* way to test `src/core/jpg.js` back then).

*Please note:* The slight movement in the reference test is a result of using the `src/core/jpg.js` decoder, rather than the native browser one.
2017-12-29 18:44:07 +01:00
Tim van der Meij
25bbff4692
Merge pull request #9320 from Snuffleupagus/pr-9095-followup
Avoid rendering errors by passing in the `webGLContext` when creating a new `CanvasGraphics` in `getColorN_Pattern` (PR 9095 follow-up)
2017-12-28 23:17:30 +01:00
Jonas Jenwald
ec21bd9626
Merge pull request #9314 from timvandermeij/encodings
Implement unit tests for the encodings and fix missing items
2017-12-27 22:02:38 +01:00
Jonas Jenwald
06605abbc2 Avoid rendering errors by passing in the webGLContext when creating a new CanvasGraphics in getColorN_Pattern (PR 9095 follow-up)
This was an oversight in PR 9095, which unfortunately breaks rendering in some PDF files (e.g. the one from issue 6737).

It thus appears that we don't have any test-coverage for this code-path, and given the relative complexity of the PDF files affected by this bug I wasn't able to easily create a reduced test-case.
*Please note:* The linked test-case included in this patch is currently *not* rendered correctly (that'd be the PR 6606), but it at least gives us some test-coverage here.
2017-12-27 13:50:53 +01:00
Tim van der Meij
c7af2db2ec
Implement unit tests for the encodings and fix missing items
Initially I just implemented the unit tests, but quickly found that they
were failing my expectation of having a size of 256 items. Some of them
did contain 256 items and some did not. I looked up various resources
and figured that they indeed all need to have 256 items. One of the good
resources is https://github.com/davidben/poppler/blob/master/poppler/FontEncodingTables.cc

Aside from some missing `notdef` (empty string) entries at the end of
the arrays, which I assume causes issues since it may cause
out-of-bounds array access which in JavaScript gives `undefined`, there
was a `notdef` entry missing in the `MacExpertEncoding`, causing the
entries after that to be shifted. This fix for this is similar to the
one in #8589.

The unit tests verify that, for known encoding names, the return value
is not only an array, but that it is also of the right length and
contains only strings.
2017-12-24 18:14:40 +01:00
Jonas Jenwald
d4cd44fd16 Add a fallback for non-embedded LucidaSans-Demi fonts (issue 9291)
The PDF file in the issue uses a number of *embedded* versions of Lucida fonts, but for some reason does *not* embed the LucidaSans-Demi font. According to https://en.wikipedia.org/wiki/Lucida#Usages that one should be bold, so we can at least improve rendering here (even though it won't look perfect).

Fixes 9291.
2017-12-24 17:36:58 +01:00
Tim van der Meij
957e2d420d
Implement unit tests for the network utility code
This should provide 100% coverage for the file.
2017-12-23 19:24:11 +01:00
Jonas Jenwald
e58f2f513a [api-major] Remove the unused encrypted property from the pdfInfo object sent from the worker via the GetDoc message
I recall being confused as to the purpose of the `encrypted` property all the way back when working on PR 4750.

Looking at the history, this property was added in PR 1698 when password support was added to the API/viewer. However, its only purpose seem to have been to facilitate the addition of a `isEncrypted` function in the API. That function never, as far as I can tell, saw any use and was unceremoniously removed in PR 4144.

Since we want to avoid sending all non-essential data early during initial document loading (e.g. PR 4750), it seems correct to get rid of the `encrypted` property. Especially since it hasn't even been exposed in the API for over three years, with no complaints that I'm aware of.

Finally note that the `encrypt` property on the `XRef` instance isn't tied to the code that's being removed here. Given that we're calling `PDFDocument.parse` during `createDocumentHandler` in the worker which, via `PDFDocument.setup`, calls `XRef.parse` where the `Encrypt` data (if it exists) is always parsed.
2017-12-21 13:10:23 +01:00
Jonas Jenwald
9ff3c6f99d Remove the document.readyState polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-12-19 15:05:19 +01:00
Jonas Jenwald
6af45052c5 Remove the input.type polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-12-19 15:05:15 +01:00
Jonas Jenwald
cf88b7b212 Remove the ImageData.set polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-12-19 15:05:14 +01:00
Jonas Jenwald
363e517acf Remove the HTMLElement.dataset polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-12-19 14:50:18 +01:00
Jonas Jenwald
4880200cd4 Remove the XMLHttpRequest.response polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-12-19 14:48:43 +01:00
Jonas Jenwald
8266cc18e7 Remove the webkitURL polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-12-19 14:46:04 +01:00
Soumya Himanish Mohapatra
95ad956f68 PDFjs now compatible with Librejs 2017-12-19 15:13:50 +05:30
Jonas Jenwald
1dc54ddb40 Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in XRef.indexObjects (issue 9105)
This patch refactors the searching for 'endobj', to try and find the next occurance of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise.
This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrary long strings.

Fixes 9105.
2017-12-18 13:17:45 +01:00
Tim van der Meij
6bbe91079b
Merge pull request #9272 from nveenjain/fix/8846
Replaced occurence of `throw new Error` with `unreachable`
2017-12-15 22:11:32 +01:00
Jonas Jenwald
6515b91118
Merge pull request #9276 from mozilla/loca-fix
Fix loca table when offsets aren't in ascending order.
2017-12-15 20:59:42 +01:00
Brendan Dahl
9b51cea724 Fix loca table when offsets aren't in ascending order. 2017-12-15 11:20:28 -06:00
Naveen Jain
1135674647 Replaced occurence of throw new Error with unreachable where applicable 2017-12-14 12:58:50 +05:30
Jonas Jenwald
ad5ed37059 Handle broken, Ghostscript generated, Metadata that contains HTML character names (bug 1424938)
Please note that while this could be considered a regression in user-facing behaviour, I'm not convinced that it's really a regression as such since prior to PR 8912 the Metadata would fail to parse (with an XML error) and thus be ignored when setting the viewer title.
With the refactored Metadata parsing we're now able to parse this, which uncovered issues with a subset of broken Ghostscript Metadata that uses HTML character names.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1424938
2017-12-13 14:32:47 +01:00
Tim van der Meij
095c63cc25
Merge pull request #9260 from Snuffleupagus/rm-JpegStream.getBytes
Attempt to remove the special `JpegStream.getBytes` method and utilize the regular `DecodeStream` one instead
2017-12-10 16:50:50 +01:00
Tim van der Meij
c35bbd11b0
Use native Math functions in the custom log2 function
It is quite confusing that the custom function is called `log2` while it
actually returns the ceiling value and handles zero and negative values
differently than the native function.

To resolve this, we add a comment that explains these differences and
make the function use the native `Math` functions internally instead of
using our own custom logic. To verify that the function does what we
expect, we add unit tests.

All browsers except for IE support `Math.log2` for quite a long time
already (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/log2).
For IE, we use the core-js polyfill.

According to the microbenchmark at https://jsperf.com/log2-pdfjs/1,
using the native functions should also be faster, in my testing almost
six times as fast.
2017-12-10 16:35:17 +01:00
Jonas Jenwald
84de1e9a92 Attempt to remove the special JpegStream.getBytes method and utilize the regular DecodeStream one instead
Note that no other image stream implements a special `getBytes` method, which makes `JpegStream` look somewhat odd.

I'm actually not sure what purpose this methods serves, since I successfully ran all tests locally with it commented out. Furhermore, I also ran tests with an added `if (length && length !== this.bufferLength) { throw new Error('length mismatch'); }` check, and didn't get a single test failure in that case either.

Looking at the history, it seems that this code originated back in PR 4528, but as far as I can tell there's no mention in either commit messages nor PR comments of why it was necessary to add a "special" `getBytes` function for the `JpegStream`.
My assumption is that there's a good reason why this method was added, e.g. to address a *specific* regression in one of the reference tests. However, I did check out commit 58f697f977 locally and ran tests with this method commented out, and there didn't seem to be any image-related failures in that case either!?

Hence I'm suggesting that we attempt to simplify this code slightly be removing this special `getBytes` method. However, please note that there's perhaps a *small* risk of regressions in an edge-case where we currently have insufficient test-coverage.
2017-12-10 13:31:08 +01:00
Brendan Dahl
af1d80d45e
Merge pull request #9230 from Snuffleupagus/issue-9195
Add basic support for non-embedded Calibri fonts (issue 9195)
2017-12-08 10:15:43 -08:00
Jonas Jenwald
a5e3261b48
Merge pull request #9062 from mozilla/no_high
Move char codes from high surrogate pair range into private use.
2017-12-08 12:31:22 +01:00
Brendan Dahl
306999c325 Move char codes from high surrogate pair range into private use.
Fixes #2884
2017-12-07 10:35:50 -08:00
Jonas Jenwald
7c5ba9aad5 [api-major] Only create a StatTimer for pages when enableStats == true (issue 5215)
Unless the debugging tools (i.e. `PDFBug`) are enabled, or the `browsertest` is running, the `PDFPageProxy.stats` aren't actually used for anything.
Rather than initializing unnecessary `StatTimer` instances, we can simply re-use *one* dummy class (with static methods) for every page. Note that by using a dummy `StatTimer` in this way, rather than letting `PDFPageProxy.stats` be undefined, we don't need to guard *every* single stats collection callsite.

Since it wouldn't make much sense to attempt to use `PDFPageProxy.stats` when stat collection is disabled, it was instead changed to a "private" property (i.e. `PDFPageProxy._stats`) and a getter was added for accessing `PDFPageProxy.stats`. This getter will now return `null` when stat collection is disabled, making that case easy to handle.

For benchmarking purposes, the test-suite used to re-create the `StatTimer` after loading/rendering each page. However, modifying properties on various API code from the outside in this way seems very error-prone, and is an anti-pattern that we really should avoid at all cost. Hence the `PDFPageProxy.cleanup` method was modified to accept an optional parameter, which will take care of resetting `this.stats` when necessary, and `test/driver.js` was updated accordingly.

Finally, a tiny bit more validation was added on the viewer side, to ensure that all the code we're attempting to access is defined when handling `PDFPageProxy` stats.
2017-12-06 23:12:25 +01:00
Jonas Jenwald
50b72dec6e Convert StatTimer to an ES6 class 2017-12-06 13:59:03 +01:00
Jonas Jenwald
6b1eda3e12 Move StatTimer from src/shared/util.js to src/display/dom_utils.js
Since the `StatTimer` is not used in the worker, duplicating this code on both the main and worker sides seem completely unnecessary.
2017-12-06 13:51:04 +01:00
Jonas Jenwald
08de655177 Add basic support for non-embedded Calibri fonts (issue 9195)
There's a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but a `IdentityH` CMap which won't help provide a proper glyph mapping.

The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correct.
To work around this, I'm thus proposing that we include a (incomplete) glyph map for Calibri, and fallback to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data.

Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data).

Fixes 9195.

---

[1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.
2017-12-03 17:23:33 +01:00
Jonas Jenwald
f3c50fe2f9
Merge pull request #9192 from Snuffleupagus/issue-8229
Build a fallback `ToUnicode` map for simple fonts (issue 8229)
2017-11-30 10:27:32 +01:00
Tim van der Meij
e320243870
Merge pull request #9206 from janpe2/svg-inv-images
Fix inverted 1-bit images in SVG backend
2017-11-28 22:46:43 +01:00
Jani Pehkonen
58b214eab3 Fix inverted 1-bit images in SVG backend 2017-11-28 21:24:27 +02:00
Jani Pehkonen
06d083b04b Fix pattern-filled text 2017-11-28 19:40:22 +02:00
Tim van der Meij
3e34eb31d9
Merge pull request #9191 from timvandermeij/pushbuttons
Button widget annotations: implement support for pushbuttons
2017-11-27 22:31:07 +01:00
Jonas Jenwald
61e19bee43 Build a fallback ToUnicode map for simple fonts (issue 8229)
In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things.

Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case.
Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data.

According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172):

> A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value.
> ...

Reading that paragraph literally, it doesn't seem too unreasonable to use *different* methods for different charcodes.

Fixes 8229.
2017-11-26 14:45:15 +01:00
Tim van der Meij
0fe80df2a7
Button widget annotations: implement support for pushbuttons 2017-11-26 14:09:48 +01:00
Jonas Jenwald
ffbfc3c2a7 Refactor the building of ToUnicode maps for simple fonts a helper method 2017-11-26 13:30:29 +01:00
Jonas Jenwald
ab1f76cc37 Remove the unused capability parameter from the WorkerTransport.getPage method
That parameter, originally named `promise`, has been unused for over five years; ever since commit f0687c4d50 in PR 1531.
2017-11-25 11:49:33 +01:00
Jonas Jenwald
59b5e14301 Split the existing WebGLUtils in two classes, a private WebGLUtils and a public WebGLContext, and utilize the latter in the API to allow various code to access the methods of WebGLUtils
This patch is one (small) step on the way to reduce the general dependency on a global `PDFJS` object, for PDF.js version `2.0`.
2017-11-24 21:54:47 +01:00
Jonas Jenwald
cc47ef56ec Remove the onclick polyfill for old versions of Opera
This was only relevant for no obsolete versions Opera, that use the Presto engine. According to https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Opera_2013, the last version affected was released in 2013.
2017-11-21 11:02:14 +01:00
Jonas Jenwald
d18b2a8e73 Remove the classList polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-11-21 11:01:52 +01:00
Jonas Jenwald
4b15e8566b Remove the Function.prototype.bind polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-11-21 11:00:55 +01:00
Jonas Jenwald
d8cb74d3e3 Remove the btoa/atob polyfills
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-11-21 11:00:55 +01:00
Jonas Jenwald
150ac0788f Remove IE9 specific XMLHttpRequest polyfills that utilize VBArray 2017-11-21 11:00:55 +01:00
Jonas Jenwald
935c5c587f Remove the Object.defineProperty polyfill
This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.
2017-11-21 11:00:55 +01:00
Tim van der Meij
25b07812b9
Sanitize the display value for choice widget annotations 2017-11-18 20:37:27 +01:00
Tim van der Meij
9e8cf448b0
Merge pull request #9140 from Snuffleupagus/rm-console-polyfill
Remove the `console` polyfills
2017-11-18 15:49:19 +01:00
Tim van der Meij
edaf4b3173
Merge pull request #9037 from Snuffleupagus/refactor-streams-params
Re-factor how parameters are passed to the network streams
2017-11-18 15:41:15 +01:00
Tim van der Meij
ae07adf143
Merge pull request #9073 from Snuffleupagus/image-streams-fixes
Fix the interface of `JpegStream`/`JpxStream`/`Jbig2Stream` to agree with the other `DecodeStream`s
2017-11-17 23:26:36 +01:00
Tim van der Meij
1d67d9dccd
Merge pull request #9131 from janpe2/svg-empty-paths
Filling and stroking empty paths in SVG backend
2017-11-16 22:43:24 +01:00
Jonas Jenwald
42099c564f Remove the console polyfills
All browsers that we intend to support with PDF.js version 2.0 already supports `console` natively.
2017-11-16 09:34:51 +01:00
Jonas Jenwald
d5174cd826 Remove the requestAnimationFrame polyfill
According to https://developer.mozilla.org/en-US/docs/Web/API/window/requestAnimationFrame#Browser_compatibility and https://caniuse.com/#feat=requestanimationframe, the browsers we intend to support with PDF.js version 2.0 should all have native `requestAnimationFrame` support.

Note that the reason for indiscriminately polyfilling `requestAnimationFrame` in iOS, see PR 4961, was apparently because of a bug in iOS 6.
However, according to [Wikipedia](https://en.wikipedia.org/wiki/IOS_version_history#iOS_8): "Support for iOS 8 ended in 2017.", hence the lowest version currently supported is iOS 9.
2017-11-15 16:08:48 +01:00
Jani Pehkonen
4e8f7070da Filling and stroking empty paths in SVG backend 2017-11-14 18:35:39 +02:00
Jonas Jenwald
745cb73c65 Remove PDFJS.disableRange/PDFJS.disableStream code for now unsupported browsers in src/shared/compatibility.js
We're currently disabling range requests and streaming for a number of configurations. A couple of those will no longer be supported (with PDF.js version 2.0), hence we ought to be able to clean up the compatibility code slightly.
2017-11-14 15:28:50 +01:00
Jonas Jenwald
eb3a1f24a3 Remove the PDFJS.disableHistory code from src/shared/compatibility.js
This compatibility code is only relevant for browsers that will no longer be supported (with PDF.js version 2.0), hence we ought to be able to remove it.
2017-11-14 15:28:50 +01:00
Tim van der Meij
9686f6652c
Merge pull request #9089 from yurydelendik/rm-chunks
Extracts OperatorList class and prepares for streaming
2017-11-13 23:35:40 +01:00
Jonas Jenwald
23699cef1c Re-factor how parameters are passed to the network streams
*This patch is the result of me starting to look into moving parameters from `PDFJS` into `getDocument` and other API methods.*

When familiarizing myself with the code, the signatures of the various network streams seemed to be unnecessarily cumbersome since `disableRange` is currently handled separately from other parameters.
I'm assuming that the explanation for this is probably "for historical reasons", as is often the case. Hence I'd like to clean this up *before* we start the larger, and more invasive, `PDFJS` parameter re-factoring.
2017-11-11 11:23:29 +01:00
Jonas Jenwald
de5297b9ea Fix the interface of JpegStream/JpxStream/Jbig2Stream to agree with the other DecodeStreams
The interface of all of the "image" streams look kind of weird, and I'm actually a bit surprised that there hasn't been any errors because of it.
For example: None of them actually implement `readBlock` methods, and it seems more luck that anything else that we're not calling `getBytes()` (without providing a length) for those streams, since that would trigger a code-path in `getBytes` that assumes `readBlock` to exist.

To address this long-standing issue, the `ensureBuffer` methods are thus renamed to `readBlock`. Furthermore, the new `ensureBuffer` methods are now no-ops.
Finally, this patch also replaces `var` with `let` in a number of places.
2017-11-11 11:22:16 +01:00
Jonas Jenwald
36593d6bbc Move JpegStream and JpxStream to their own files 2017-11-11 11:22:16 +01:00
Yury Delendik
5fa56f6a9d For backwards compatibility: use addOp amount instead of queue size. 2017-11-09 18:46:48 -06:00
Yury Delendik
877c2d7743 Changing QueueOptimizer to be more iterative. 2017-11-09 18:46:48 -06:00
Max Schaefer
3ae37d1b06 Remove a few useless assignments. 2017-11-03 11:36:48 +00:00
Max Schaefer
bc8f673522 Remove spurious arguments to NullStream constructor. 2017-11-03 10:14:32 +00:00
Max Schaefer
3ab1a9922a Rearrange a few declarations so that they precede their uses. 2017-11-03 10:14:32 +00:00
Jonas Jenwald
2dbd3f2603 [api-major] Remove the TypedArray polyfills 2017-11-01 10:31:28 +01:00
Brendan Dahl
b46443f0c1
Merge pull request #9077 from yurydelendik/v2
Version 2.0 merge
2017-10-31 14:24:20 -07:00
Jonas Jenwald
83e8398ff2 For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084)
In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then *overwrites* it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appear instead.

Fixes 9084.
2017-10-31 13:26:04 +01:00
Jonas Jenwald
92fcfce685
Merge pull request #9082 from brendandahl/issue7562
Overwrite glyphs contour count if it's less than -1.
2017-10-30 20:44:01 +01:00
Yury Delendik
85f544f55a Moves OperatorList and QueueOptimizer into separate file. 2017-10-30 13:29:58 -05:00
Brendan Dahl
17037b5e51 Overwrite glyphs contour count if it's less than -1.
The test pdf has a contour count of -70, but OTS doesn't
like values less than -1.

Fixes issue #7562.
2017-10-30 09:16:51 -07:00
Yury Delendik
b4e25fb2e8 Merge remote-tracking branch 'mozilla/version-2.0' into v2 2017-10-27 14:01:45 -05:00
Jonas Jenwald
5e627810e4 Use stringToBytes in more places
Rather than having (basically) verbatim copies of `stringToBytes` in a few places, we can simply use the helper function directly instead.
2017-10-26 11:01:13 +02:00
Jonas Jenwald
e94a0fd4e7 Extract the actual decoding in CCITTFaxStream into a new CCITTFaxDecoder "class", which the new CCITTFaxStream depends on 2017-10-24 16:03:08 +02:00
Jonas Jenwald
bb35095083 Move CCITTFaxStream and Jbig2Stream, from src/core/stream.js, to separate files 2017-10-24 12:00:40 +02:00
Jonas Jenwald
d71a576b30 Merge pull request #9045 from brendandahl/sani-name
Sanitize name index in compile phase of CFF.
2017-10-24 11:48:03 +02:00
Brendan Dahl
6b12612a52 Sanitize name index in compile phase of CFF.
Fixes #8960
2017-10-23 17:13:49 -07:00
Tim van der Meij
a0ec980e63 Merge pull request #9055 from Snuffleupagus/core-js-Number
Replace `Number` polyfills with the ones from core-js
2017-10-21 16:15:11 +02:00
Jonas Jenwald
dbbb763eaf Replace Number polyfills with the ones from core-js
Since we're already using core-js elsewhere in `compatibility.js`, we can reduce the amount of code we need to maintain ourselves.

https://github.com/zloirock/core-js#ecmascript-6-number
2017-10-21 12:58:53 +02:00
Jonas Jenwald
8684aa0ee5 Replace our Promise polyfill with the one from core-js
Since we're already using core-js elsewhere in `compatibility.js`, we can reduce the amount of code we need to maintain ourselves.

https://github.com/zloirock/core-js#ecmascript-6-promise
2017-10-21 12:51:14 +02:00
Tim van der Meij
10fd590c09 Merge pull request #9032 from Snuffleupagus/nativeImageDecoderSupport-2.0
Simplify the check, and remove the warning, for the `nativeImageDecoderSupport` API parameter
2017-10-20 21:30:47 +02:00
Brendan Dahl
fcc9943d04 Use charstring as plain text when lengthIV is -1.
Fixes #7769
2017-10-18 14:19:59 -07:00
Tim van der Meij
17cc94db4e Merge pull request #9034 from Snuffleupagus/javascript-null
[api-major] Change `getJavaScript` to return `null`, rather than an empty Array, when no JavaScript exists
2017-10-17 21:58:45 +02:00
Tim van der Meij
7d7edd9cc6
[api-major] Remove the PDFJS_NEXT option
Nothing uses this option anymore, so setting it is a no-op now. We can
safely remove it.

Use `SKIP_BABEL` (instead of `PDFJS_NEXT`) now if you want to skip Babel
translation for a build.
2017-10-16 23:16:51 +02:00
Jonas Jenwald
b5794bb26b Simplify the check, and remove the warning, for the nativeImageDecoderSupport API parameter
As discussed in PR 8982.
2017-10-16 09:11:39 +02:00
Jonas Jenwald
bcb29063c1 Polyfill Object.values and Array.prototype.includes using core-js
See https://github.com/zloirock/core-js#stage-4-proposals.
2017-10-16 09:11:39 +02:00
Jonas Jenwald
1cd1582cb9 [api-major] Change getJavaScript to return null, rather than an empty Array, when no JavaScript exists
Other API methods already return `null`, rather than empty Arrays/Objects, hence it makes sense to change `getJavaScript` to be consistent.
2017-10-15 22:17:14 +02:00
Tim van der Meij
815bc53a16 Merge pull request #9029 from Snuffleupagus/eslint--report-unused-disable-directives
Enable the `--report-unused-disable-directives` ESLint command line option
2017-10-15 15:12:20 +02:00
Jonas Jenwald
04b46831c1 Enable the --report-unused-disable-directives ESLint command line option
This option was added in [version `4.8.0` of ESLint](https://github.com/eslint/eslint/releases/tag/v4.8.0), which is already listed as the minimum version in our `package.json` file; please refer to https://eslint.org/docs/user-guide/command-line-interface#--report-unused-disable-directives for additional details.

Despite the caveat listed in the link above, I still think that using this option makes sense since it will help ensure that no longer necessary disable statements are removed.
2017-10-15 13:45:12 +02:00
Jonas Jenwald
0f90d5130c [api-major] Remove all remaining deprecated functions/methods 2017-10-15 13:27:10 +02:00
Jonas Jenwald
2f32131601 Replace our WeakMap polyfill with the one from core-js
Since we're already using core-js elsewhere in `compatibility.js`, we can reduce the amount of code we need to maintain ourselves.

https://github.com/zloirock/core-js#weakmap
2017-10-15 10:22:06 +02:00
Diego Casorran
11b1daa72d Mispelled isEvalSupported property at FontFaceObject() creation. 2017-10-07 20:05:21 +02:00
Tim van der Meij
509d3728f1 Merge pull request #8922 from Snuffleupagus/paintXObject-errors
Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704)
2017-10-07 15:46:26 +02:00
Yury Delendik
fab59e0f91 Revert "Closes all promises/streams when handler is destroyed." 2017-10-06 11:55:28 -05:00
Jonas Jenwald
3d79bd5e87 [api-major] When rendering is cancelled, always reject with RenderingCancelledException 2017-10-04 18:09:28 +02:00
Tim van der Meij
ee9b5a12da
Remove the deprecated parameters for getDocument of the API
This is deprecated since October 2015 with a visible message, so we can
safely remove this now.
2017-10-02 23:05:32 +02:00
Tim van der Meij
0817375d2f
Remove the deprecated disableNativeImageDecoder parameter of the API
This is deprecated since May 2017 with a visible message, so we can
safely remove this now.
2017-10-02 23:01:57 +02:00
Tim van der Meij
b651cfb440
Remove the deprecated UnsupportedManager class of the API
This is deprecated since November 2015 with a visible message, so we
can safely remove this now.
2017-10-02 22:54:22 +02:00
Tim van der Meij
918bd98a2f
Remove the deprecated destroy method of the API
This is deprecated since October 2015 with a visible message, so we can
safely remove this now.
2017-10-02 22:54:15 +02:00
Tim van der Meij
9b353ef407
Remove the deprecated parameter handling in the render method of the API
This is deprecated since January 2015 with a visible message, so we can
safely remove this now.
2017-10-02 22:54:01 +02:00
Jonas Jenwald
5b67c7594c Remove the unused Util.sign function from src/shared/util.js
This has been completely unused since commit 10bb6c9ec0, in PR 2505, more than four and a half years ago.
2017-10-01 15:58:03 +02:00
Jonas Jenwald
65352c730f Remove the unused isWorker property from src/display/global.js
This has been completely unused since commit 1d12aed5ca, in PR 7126, almost one and a half years ago.
2017-10-01 15:49:55 +02:00
Brendan Dahl
d03e127434 Merge pull request #8971 from yurydelendik/close-handler
Closes all promises/streams when handler is destroyed.
2017-09-29 14:28:53 -07:00
Jonas Jenwald
b1472cddbb Allow getOperatorList/getTextContent to skip errors when parsing broken XObjects (issue 8702, issue 8704)
This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recovery as much as possible of a page even when it contains broken XObjects.

Fixes 8702.
Fixes 8704.
2017-09-29 17:14:21 +02:00
Jonas Jenwald
b8ec518a1e Split the existing PDFFunction in two classes, a private PDFFunction and a public PDFFunctionFactory, and utilize the latter in PDFDocument to allow various code to access the methods of PDFFunction`
*Follow-up to PR 8909.*

This requires us to pass around `pdfFunctionFactory` to quite a lot of existing code, however I don't see another way of handling this while still guaranteeing that we can access `PDFFunction` as freely as in the old code.

Please note that the patch passes all tests locally (unit, font, reference), and I *very* much hope that we have sufficient test-coverage for the code in question to catch any typos/mistakes in the re-factoring.
2017-09-29 15:30:53 +02:00
Jonas Jenwald
5c961c76bb Remove the unused inline parameter from various methods/functions in PDFImage, and change a couple of methods to use Objects rather than plain parameters
The `inline` parameter is passed to a number of methods/functions in `PDFImage`, despite not actually being used. Its value is never checked, nor is it ever assigned to the current `PDFImage` instance (i.e. no `this.inline = inline` exists).
Looking briefly at the history of this code, I was also unable to find a point in time where `inline` was being used.

As far as I'm concerned, `inline` does nothing more than add clutter to already very unwieldy method/function signatures, hence why I'm proposing that we just remove it.
To further simplify call-sites using `PDFImage`/`NativeImageDecoder`, a number of methods/functions are changed to take Objects rather than a bunch of (somewhat) randomly ordered parameters.
2017-09-29 15:30:40 +02:00
Yury Delendik
71b0e4e818 Closes all promises/streams when handler is destroyed. 2017-09-28 16:45:04 -05:00
Jonas Jenwald
a159c4f357 Check that this.baseUrl is defined before attempting to fetch any data in DOMCMapReaderFactory/NodeCMapReaderFactory 2017-09-28 12:34:57 +02:00
Jonas Jenwald
7d3efe43a2 Ensure that the same exact version of PDF.js is used in both the API and the Worker
I don't have a good example at hand right know, but I recall seeing custom deployments of PDF.js that bundle a *specific* version of the `build/pdf.js` file and then set `PDFJS.workerSrc` to point to https://mozilla.github.io/pdf.js/build/pdf.worker.js.
That practice seems really bad since, besides (obviously) causing unnecessary server load, it will very quickly result in a version mismatch between the `pdf.js` and `pdf.worker.js` files in those PDF.js deployments.
Such a version mismatch could easily lead to either breaking errors, or even worse slightly inconsistent behaviour for an API call (if the API -> Worker interface changes, which does happen from time to time).

To avoid the problems described above, I'm thus proposing that we enforce that the versions of the `pdf.js` and `pdf.worker.js` files must always match.
2017-09-27 15:41:57 +02:00
Brendan Dahl
18e2321845 Overwrite maxSizeOfInstructions in maxp with computed value.
In issue #7507 the value is less than the actuall max size
of the glyph instructions causing OTS to fail the font.
2017-09-25 17:53:26 -07:00
Jonas Jenwald
10727572a2 Merge pull request #8950 from timvandermeij/polygon-polyline-annotations
Implement support for polyline and polygon annotations
2017-09-24 15:16:14 +02:00
Tim van der Meij
c69a7a83da Merge pull request #8932 from janpe2/jbig2-sym-offset
JBIG2 symbol offsets
2017-09-23 17:11:45 +02:00
Tim van der Meij
8ccad276b2
Implement support for polygon annotations 2017-09-23 16:52:47 +02:00
Tim van der Meij
99b17a494d
Implement support for polyline annotations 2017-09-23 16:37:23 +02:00
Tim van der Meij
40b89e9ba4 Merge pull request #8949 from Snuffleupagus/ColorSpace-rm-instanceof-AlternateCS
Remove the `instanceof AlternateCS` check in `ColorSpace.parse` since it's dead code
2017-09-23 16:09:17 +02:00
Tim van der Meij
2aac994171 Merge pull request #8928 from mukulmishra18/decode-file-path
Fix #8907: Decode URL to get correct path in node_stream.
2017-09-23 14:44:58 +02:00
Jonas Jenwald
8a084aff0f Remove the instanceof AlternateCS check in ColorSpace.parse since it's dead code
Looking at `ColorSpace.parseToIR`, it will do one of the following things when called:
 1. Return a String.
 2. Return an Array.
 3. Throw a `FormatError`.
 4. In one case, return the result of *another* `ColorSpace.parseToIR` call.

However, under no circumstances will it ever return an `AlternateCS` instance.

Since it's often useful to understand why code, which has become unused, existed in the first place, let's grab a hard hat and a shovel and start digging through the history of this code :-)

The current condition was introduced in commit c198ec4323, in PR 794, but it was actually already obsolete by that time.
The preceeding `instanceof SeparationCS` condition predates commit a7278b7fbc, in PR 700.
That condition was originally introduced all the way back in commit 4e3f87b60c, in PR 692. However, it was made obsolete by commit 9dcefe1efc, which is included in the very same PR!

Hence we're left with the conclusion that not only has this code be unused for *almost* six years, it was basically never used at all save for a few refactoring commits that're part of PR 692.
2017-09-23 14:36:10 +02:00
Tim van der Meij
d7b37ae745 Merge pull request #8912 from timvandermeij/xml-parser
[api-minor] Replace `DOMParser` with `SimpleXMLParser`
2017-09-20 23:45:00 +02:00
Jonas Jenwald
abc864fca9 Merge pull request #8938 from brendandahl/bug1392647
Use font's default width even when 0. (bug 1392647)
2017-09-20 22:38:39 +02:00
Brendan Dahl
10ba292b46 Use font's default width even when 0.
Bug 1392647 has a PDF where the default width of the font
is 0. It draws some charcodes that don't have glyphs, but
we were wrongly using the 1000 default width for these
charcodes causing some text to be overlapping.
2017-09-20 11:38:30 -07:00
Tim van der Meij
d4309614f9
Replace DOMParser with SimpleXMLParser
The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.

This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.

The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.
2017-09-19 23:09:07 +02:00
Jani Pehkonen
5d1074c110 Fix JBIG2 symbol offsets in text regions 2017-09-19 23:43:23 +03:00
Tim van der Meij
bc9afdf3c4
Convert src/display/metadata.js to ES6 syntax 2017-09-19 22:13:59 +02:00
Jani Pehkonen
3d99b8d706 CCITTFaxStream problem when EndOfBlock is false 2017-09-19 22:19:40 +03:00
Mukul Mishra
e4c09c7cba Fix #8907: Decode URL to get correct path in node_stream. 2017-09-19 15:10:36 +05:30
Tilman Hausherr
d75a497a6b support tiff predictor for 16bit
(for issue #6289)
This does the same for 16 bit as the existing 8 bit tiff predictor code, an addition of the last word to this word.

The last two "& 0xFF" may or may not be needed, I see this isn't done in the 8 bit code, but I'm not a JS developer.
2017-09-18 22:24:25 +02:00
Tim van der Meij
400e4aae0e
Implement support for stamp annotations 2017-09-16 16:37:50 +02:00
Tim van der Meij
3be941d982 Merge pull request #8909 from Snuffleupagus/PDFFunction-isEvalSupported
Check `isEvalSupported`, and test that `eval` is actually supported, before attempting to use the `PostScriptCompiler` (issue 5573)
2017-09-16 16:11:03 +02:00
Jonas Jenwald
eece66fa3e For /Filter entries containing Names, ignore the /DecodeParms entry if it contains an Array (issue 8895) 2017-09-15 23:02:16 +02:00
Jonas Jenwald
dc926ffc0f Check isEvalSupported, and test that eval is actually supported, before attempting to use the PostScriptCompiler (issue 5573)
Currently `PDFFunction` is implemented (basically) like a class with only `static` methods. Since it's used directly in a number of different `src/core/` files, attempting to pass in `isEvalSupported` would result in code that's *very* messy, not to mention difficult to maintain (since *every* single `PDFFunction` method call would need to include a `isEvalSupported` argument).

Rather than having to wait for a possible re-factoring of `PDFFunction` that would avoid the above problems by design, it probably makes sense to at least set `isEvalSupported` globally for `PDFFunction`.

Please note that there's one caveat with this solution: If `PDFJS.getDocument` is used to open multiple files simultaneously, with *different* `PDFJS.isEvalSupported` values set before each call, then the last one will always win.
However, that seems like enough of an edge-case that we shouldn't have to worry about it. Besides, since we'll also test that `eval` is actually supported, it should be fine.

Fixes 5573.
2017-09-15 12:02:45 +02:00
Mukul Mishra
ef7038fe34 Fix #8888: Change behaviour of fetch to make it compatible with XHR. 2017-09-14 23:53:06 +05:30
Jonas Jenwald
f2618eb2e4 Merge pull request #8808 from janpe2/issue8741
Fix color of image masks inside uncolored patterns
2017-09-12 14:27:56 +02:00
Tim van der Meij
320779e6ed Merge pull request #8691 from timvandermeij/square-circle-annotations
Implement support for square and circle annotations
2017-09-09 22:56:54 +02:00
Tim van der Meij
44c116ac49
Implement support for circle annotations 2017-09-09 21:36:27 +02:00
Tim van der Meij
cace2e9047
Implement support for square annotations 2017-09-09 21:36:27 +02:00
Tim van der Meij
f7fd1db52f
Introduce DOMSVGFactory
This patch provides a new unit tested factory for creating SVG
containers and elements. This code is duplicated twice in the
codebase, but with upcoming changes this would need to be duplicated
even more. Moreover, consolidating this code in one factory allows
us to replace it easily for e.g., supporting Node.js. Therefore, move
this to a central place and update/ES6-ify the related code.

Finally, we replace `setAttributeNS` with `setAttribute` because no
namespace is provided.
2017-09-09 21:36:27 +02:00
Tim van der Meij
437e9cb056 Merge pull request #8865 from Snuffleupagus/hide-unsupported-LinkAnnotation
Hide unsupported `LinkAnnotation`s (issue 3897)
2017-09-09 19:07:43 +02:00
Jonas Jenwald
8686baede5 Replace value === (value | 0) checks with Number.isInteger(value) in the src/ folder
Rather than doing what (at first) may seem like a fairly obscure comparison, using `Number.isInteger` will clearly indicate the intent of the code.
2017-09-09 14:12:52 +02:00
Jonas Jenwald
7115e136e4 Hide unsupported LinkAnnotations (issue 3897)
Rather than displaying links that does *nothing* when clicked, it probably makes more sense to simply not render them instead. Especially since it turns out that, at least at this point in time, this is *very* easy to both implement and test.

Fixes 3897.
2017-09-06 12:52:56 +02:00
Jani Pehkonen
86020396cb Fix color of image masks inside uncolored patterns 2017-09-06 13:41:48 +03:00
Jonas Jenwald
41415ba0a2 Correctly validate the response status for non-HTTP fetch requests (PR 8768 follow-up)
It seems that the status check, for non-HTTP loads, causes the default viewer to *refuse* to open local PDF files.

***STR:***
 1. Make sure that fetch support is enabled in the browser. In Firefox Nightly, set `dom.streams.enabled = true` and `javascript.options.streams = true` in `about:config`.
 2. Open https://mozilla.github.io/pdf.js/web/viewer.html.
 3. Click on the "Open file" button, and open a new PDF file.

***ER:***
 A new PDF file should open in the viewer.

***AR:***
 The PDF file fails to open, with an error message of the following format:
`Message: Unexpected server response (200) while retrieving PDF "blob:https://mozilla.github.io/a4fc455f-bc05-45b5-b6aa-2ecff3cb45ce".`
2017-09-05 17:07:44 +02:00
Jonas Jenwald
cfb4955a92 Replace the isArray helper function with the native Array.isArray function
*Follow-up to PR 8813.*
2017-09-01 20:27:13 +02:00
Jonas Jenwald
11408da340 Replace the isInt helper function with the native Number.isInteger function
*Follow-up to PR 8643.*
2017-09-01 16:52:50 +02:00
Tim van der Meij
d332f62d60 Merge pull request #8857 from Snuffleupagus/fetchUncompressed-type-checks
Avoid some redundant type checks in `XRef.fetchUncompressed`
2017-08-31 23:33:02 +02:00
Tim van der Meij
51be27853f Merge pull request #8847 from Snuffleupagus/AnnotationElement-isRenderable-regression
Correct the default value for `isRenderable` in the `AnnotationElement` constructor, to fix breaking errors when rendering unsupported annotations
2017-08-31 22:08:10 +02:00
Jonas Jenwald
772a5412a4 Avoid some redundant type checks in XRef.fetchUncompressed
When looking briefly at using `Number.isInteger`/`Number.isNan` rather than `isInt`/`isNaN`, I noticed that there's a couple of not entirely straightforward cases to consider.

At first I really couldn't understand why `parseInt` is being used like it is in `XRef.fetchUncompressed`, since the `num` and `gen` properties of an object reference should *always* be integers.
However, doing a bit of code archaeology pointed to PR 4348, and it thus seem that this was a very deliberate change. Since I didn't want to inadvertently introduce any regressions, I've kept the `parseInt` calls intact but moved them to occur *only* when actually necessary.[1]

Secondly, I noticed that there's a redundant `isCmd` check for an edge-case of broken operators. Since we're throwing a `FormatError` if `obj3` isn't a command, we don't need to repeat that check.

In practice, this patch could perhaps be considered as a micro-optimization, but considering that `XRef.fetchUncompressed` can be called *many* thousand times when loading larger PDF documents these changes at least cannot hurt.

---
[1] I even ran all tests locally, with an added `assert(Number.isInteger(obj1) && Number.isInteger(obj2));` check, and everything passed with flying colours.
However, since it appears that this was in fact necessary at one point, one possible explanation is that the failing test-case(s) have now been replaced by reduced ones.
2017-08-31 16:49:04 +02:00