Commit Graph

2090 Commits

Author SHA1 Message Date
Tim van der Meij
09df1ee0ce
Include a reduced, non-linked PDF file for the attachments API unit test 2019-08-25 15:14:57 +02:00
Jonas Jenwald
711040ecc5 Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in src/core/worker.js
These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).
2019-08-24 15:56:41 +02:00
Yury Delendik
66e0dd1b06 Use streams for OperatorList chunking (issue 10023)
*Please note:* The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`.

By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources.
With this patch worker-thread parsing will *only* be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this:

 - The API currently expects the *entire* OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen.
 - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously *not* want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance.
 - This patch is already *somewhat* risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well).

Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above.

Co-Authored-By: Yury Delendik <ydelendik@mozilla.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
2019-08-24 15:56:40 +02:00
Tim van der Meij
fbe8c6127c
Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation
Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
2019-08-09 22:39:01 +02:00
Jonas Jenwald
d637b25e36 Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".

The patch makes the following notable changes:
 - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
 - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
 - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
 - Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
 - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.

---

[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
2019-08-09 10:18:13 +02:00
Jonas Jenwald
0f78fdb229 Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)
Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.
2019-08-08 10:37:41 +02:00
Jonas Jenwald
5ac9c7c384 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)
The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.
2019-08-06 14:33:05 +02:00
Tim van der Meij
be70ee236d
Merge pull request #11013 from timvandermeij/annotations-quadpoints
[api-minor] Implement quadpoints for annotations in the core layer
2019-08-04 16:06:10 +02:00
Jonas Jenwald
0276385e6e [api-minor] Fix completely broken getStats method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up)
With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)

I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.

Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
2019-08-02 14:09:24 +02:00
Jonas Jenwald
a3150166ec Ensure that ReadableStreams are cancelled with actual Errors
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
2019-08-01 16:40:46 +02:00
Tim van der Meij
d909b86b28
Merge pull request #11020 from Snuffleupagus/issue-11016
Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)
2019-07-31 23:33:34 +02:00
wangsongyan
c61205d980 decode filename when match an urlencode filename from contentDispositionFilename 2019-07-31 09:33:56 +08:00
Jonas Jenwald
9ad50521b1 Add a work-around, in glyphlist.js, for bad PDF generators which use a non-standard /f_f string in the Encoding dictionary when referring to the ff ligature (issue 11016)
This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare.

Furthermore, this patch purposely does *not* add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.
2019-07-30 17:06:58 +02:00
Tim van der Meij
9114004d5b
[api-minor] Implement quadpoints for annotations in the core layer 2019-07-28 20:36:21 +02:00
Jonas Jenwald
ff90aa4323 Inline the isCmd check in the Parser.shift method
For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called *a lot* during parsing.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over *four million* `Parser.shift` calls for just the one page), using the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 100,
       "type": "eq"
    }
]
```

This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   100 |         3386 |        3322 | -65 | -1.92 |        faster
Firefox | Page Request |   100 |            1 |           1 |   0 | -8.08 |
Firefox | Rendering    |   100 |         3385 |        3321 | -65 | -1.92 |        faster
```
2019-07-22 12:07:36 +02:00
Tim van der Meij
6e96a158f4
Merge pull request #10820 from vlastimilmaca/annot-irt-rt-states
Annotations - Added parsing of IRT, RT, State and StateModel
2019-07-17 23:34:31 +02:00
vlastimilmaca
fe49f0f766 Annotations - Implement parsing of IRT, RT, State and StateModel 2019-07-16 23:33:07 +02:00
Jonas Jenwald
c7de6dbe41 Update the fingerprint API unit-tests to explicitly check for the expected result
The current tests won't catch inadvertent changes to the logic used to obtain/compute the document `fingerprint`.
2019-07-15 11:19:17 +02:00
Jonas Jenwald
c7fb7116d6 Add an API unit-test for the stopAtErrors option (PRs 8240 and 8922 follow-up)
Also fixes an inconsistency in the 'PageError' handler, for `getOperatorList`, in the API.
2019-07-13 16:06:05 +02:00
Tim van der Meij
e3496041b5
Merge pull request #10950 from monchouchou/master
Fixed testing webserver to handle paths correctly on Windows
2019-07-12 23:05:37 +02:00
Tim van der Meij
ed3954fc7a
Merge pull request #10851 from brendandahl/shading-bbox
Apply bounding box before using shading patterns.
2019-07-12 22:52:07 +02:00
Tim van der Meij
87f36e3520
Merge pull request #10850 from brendandahl/scale-line-width
Scale stroking line width when using a tiling pattern.
2019-07-12 22:50:32 +02:00
Brendan Dahl
6fab0a0dac Apply bounding box before using shading patterns.
Fixes #8092
2019-07-08 14:05:48 -07:00
Brendan Dahl
446efab707 Scale stroking line width when using a tiling pattern. 2019-07-08 13:47:54 -07:00
alephneo
f861d5c0d4 Fixed test/webserver to handle paths correctly on Windows 2019-07-07 02:42:50 +05:30
Tim van der Meij
f1867de492
Merge pull request #10925 from Snuffleupagus/eslint_no-unsanitized
Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML`
2019-06-27 20:32:24 +02:00
Jonas Jenwald
f710eb56e4 Change the signature of the Parser constructor to take a parameter object
A lot of the `new Parser()` call-sites look quite unwieldy/ugly as-is, with a bunch of somewhat randomly ordered arguments, which we can avoid by changing the constructor to accept an object instead. As an added bonus, this provides better documentation without having to add inline argument comments in the code.
2019-06-23 16:01:45 +02:00
Jonas Jenwald
5bb5e7741d Enable the eslint-plugin-no-unsanitized ESLint plugin to disallow unsafe usage of e.g. innerHTML
See https://github.com/mozilla/eslint-plugin-no-unsanitized

Since we've generally never allowed e.g. `innerHTML`, which is enforced during review, there's only one linting failure with this patch. (Which is white-listed, according to the existing comment and the fact that it's test-only code.)
2019-06-23 13:50:30 +02:00
Jonas Jenwald
876c962235 Ignore Annotations with too large border widths, to prevent the annotationLayer from rendering it over the surrounding document (bug 1552113)
The border `width` will instead fallback to the default value of `1`, rather than ignoring it altoghether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113
2019-06-01 15:51:22 +02:00
Jonas Jenwald
2fe9f3ff8f Add caching to reduce the number of Ref objects
This is similar to the existing caching used to reduced the number of `Cmd` and `Name` objects.
With the `tracemonkey.pdf` file, this patch changes the number of `Ref` objects as follows (in the default viewer):

|          | Loading the first page | Loading *all* the pages |
|----------|------------------------|-------------------------|
| `master` | 332                    | 3265                    |
| `patch`  | 163                    | 996                     |
2019-05-26 12:23:37 +02:00
Tim van der Meij
bc1eb49a77
Implement creation date only for markup annotations
The specification states that `CreationDate` is only available for
markup annotations instead of for all annotation types.

Moreover, popup annotations are not markup annotations according to the
specification, so the creation date inheritance from the parent
annotation is also removed there (note that only the modification date
is used in e.g., the viewer).
2019-05-25 15:31:06 +02:00
Tim van der Meij
cf07918ccb
Implement contents for every annotation type
The specification states that `Contents` can be available for every
annotation types instead of only for markup annotations.
2019-05-18 15:52:17 +02:00
Tim van der Meij
c8c937c257
Merge pull request #10794 from janpe2/cidtogidmap-zero
Fix glyph at index zero in CIDFontType2 that has a CIDToGIDMap stream
2019-05-15 00:04:39 +02:00
Jonas Jenwald
173fbef05b Enable the consistent-return ESLint rule
This rule is already enabled in mozilla-central, and helps ensure more consistent functions/methods, see https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#119-120

Please see https://eslint.org/docs/rules/consistent-return for additional information.
2019-05-11 14:27:21 +02:00
Jonas Jenwald
57ad3a5acb Fuzzy match in the should parse PostScript numbers unit-test, to work-around rounding bugs in Chromium browsers 2019-05-08 14:01:10 +02:00
Jani Pehkonen
05c527f035 Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream 2019-05-07 18:44:37 +03:00
Tim van der Meij
be1d6626a7
Implement creation/modification date for annotations
This includes the information in the core and display layers. The
date parsing logic from the document properties is rewritten according
to the specification and now includes unit tests.

Moreover, missing unit tests for the color of a popup annotation have
been added.

Finally the styling of the popup is changed slightly to make the text a
bit smaller (it's currently quite large in comparison to other viewers)
and to make the drop shadow a bit more subtle. The former is done to be
able to easily include the modification date in the popup similar to how
other viewers do this.
2019-05-05 14:51:03 +02:00
Jonas Jenwald
5335285cda Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)
First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to).
Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit *very* small) to the parsing of path operators in PDF documents just for a handful of *corrupt* ones.
2019-04-30 23:35:33 +02:00
Tim van der Meij
762c58e0fc
Merge pull request #10738 from Snuffleupagus/ViewerPreferences-api
[api-minor] Add support for ViewerPreferences in the API (issue 10736)
2019-04-20 18:39:32 +02:00
Jonas Jenwald
34952b732e Add a getDocId method to the idFactory, in Page instances, to avoid passing around PDFManager instances unnecessarily (PR 7941 follow-up)
This way we can avoid manually building a "document id" in multiple places in `evaluator.js`, and it also let's us avoid passing in an otherwise unnecessary `PDFManager` instance when creating a `PartialEvaluator`.
2019-04-20 13:11:17 +02:00
Tim van der Meij
55d9b35d37
Merge pull request #10727 from Snuffleupagus/type3-image-resources
Support (rare) Type3 fonts which contains image resources (issue 10717)
2019-04-18 23:07:26 +02:00
Jonas Jenwald
311bac3ebb [api-minor] Add support for ViewerPreferences in the API (issue 10736)
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences

Furthermore, note that this patch *only* adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736).
The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated?
There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the *entire* viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales.
Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc.

Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)
2019-04-14 14:20:52 +02:00
Tim van der Meij
ae2a4dc3dd
Implement free text annotations 2019-04-13 18:45:22 +02:00
Jonas Jenwald
be604bd195 Support (rare) Type3 fonts which contains image resources (issue 10717)
The Type3 font type is not commonly used in PDF documents, as can be seen from telemetry data such as: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-04-09&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F68&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F57&processType=*&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2019-03-18&table=0&trim=1&use_submission_date=0 (see also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types).

Type3 fonts containing image resources are *very* rare in practice, usually they only contain path rendering operators, but as the issue shows they unfortunately do exist.
Currently these Type3-related image resources are not handled in any special way, and given that fonts are document rather than page specific rendering breaks since the image resources are thus not available to the *entire* document.
Fortunately fixing this isn't too difficult, but it does require adding a couple of Type3-specific code-paths to the `PartialEvaluator`. In order to keep the implementation simple, particularily on the main-thread, these Type3 image resources are completely decoded on the worker-thread to avoid adding too many special cases. This should not cause any issues, only marginally less efficient code, but given how rare this kind of Type3 font is adding premature optimizations didn't seem at all warranted at this point.
2019-04-13 18:27:50 +02:00
Mukul Mishra
02e46d22d2 Add fetch stream spec 2019-04-07 13:14:03 +02:00
Jonas Jenwald
7a999d1d67 [api-minor] Add basic support for PageLayout in the API and the viewer
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2393749, and refer to the inline comments for additional details.
2019-04-05 11:32:01 +02:00
Tim van der Meij
072c5864fb
Merge pull request #10675 from Snuffleupagus/PDFDataTransportStream-disableRange
[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream`
2019-04-04 23:07:45 +02:00
Tim van der Meij
b4c3b94592
Merge pull request #6606 from Rob--W/pattern-scaling
Improve performance and correctness of Tiling Patterns
2019-03-29 00:01:38 +01:00
Tim van der Meij
f9c58115fc
Merge pull request #10683 from janpe2/type0-noncid-cmap
Use CMap in Type0 fonts when CFF is not a CID font
2019-03-28 00:07:08 +01:00
Rob Wu
d3dc8f16b5 TilingPattern: Reverse transform after painting
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.

The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
2019-03-27 17:50:35 +01:00
Rob Wu
a72a8e921f Avoid extreme sizing / scaling in tiling pattern
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
  surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp

Fixes #6496
Fixes #5698
Fixes #1434
Fixes #2825
2019-03-27 17:44:04 +01:00
Jonas Jenwald
9077abc263 Take the FirstChar/LastChar properties into account when computing the hash in PartialEvaluator.preEvaluateFont (issue 10665)
Without this some fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.
2019-03-27 16:27:10 +01:00
Jani Pehkonen
49c6233fbc Use CMap in Type0 fonts when CFF is not a CID font 2019-03-26 19:38:44 +02:00
Jonas Jenwald
bb384dd5ed [Firefox regression] Fix disableRange=true bug in PDFDataTransportStream
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.

Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
 - In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
 - (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*

In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
2019-03-26 16:34:13 +01:00
Jonas Jenwald
234c1d2b2a Remove the Firefox-specific 'read with streaming' unit-test
Support for the non-standard `moz-chunked-arraybuffer` response type is in the process of being removed from Firefox; see e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1411865

For the time being, you probably want to keep support for this in the general PDF.js library given that feature detection is used. However, removing the unit-test immediately seems reasonable, since it will otherwise start failing once the platform support for `moz-chunked-arraybuffer` is gone.

Fixes 8851; please note that if unit-tests for the code in `fetch_stream.js` are wanted, which I'm assuming they are, those should live in their own file rather than being lumped into `network_spec.js` anyway.
2019-03-22 12:43:18 +01:00
Thomas den Hollander
b24a14738a
Update test case description 2019-03-20 12:52:32 +01:00
Tim van der Meij
33bfbef6ba
Merge pull request #10635 from timvandermeij/lexer-parser
Convert `src/core/parser.js` to ES6 syntax and write more unit tests for the lexer and the parser
2019-03-19 23:17:34 +01:00
Tim van der Meij
4a4b197b9d
Write more unit tests for the lexer and the parser
Moreover, group the lexer unit tests per method. This matches what we
do for other classes and makes it more easily visible which methods
we don't or insufficiently unit test.

The parser itself is not unit tested yet, so this patch provides a start
for doing so. The `inlineStreamSkipEI` method is used in other end
marker detection methods, so it's important that its functionality is
correct for proper parsing.
2019-03-17 13:36:23 +01:00
Tim van der Meij
2ee299a62b
Convert test/unit/parser_spec.js to ES6 syntax
Moreover, disable `var` usage for this file.
2019-03-17 13:27:46 +01:00
Tim van der Meij
80135378ca
Merge pull request #10636 from Snuffleupagus/PDFDocumentProxy-destroy
Small clean-up of the `PDFDocumentProxy.destroy` method and related code
2019-03-13 23:46:41 +01:00
Jonas Jenwald
24fc4f83ca Small clean-up of the PDFDocumentProxy.destroy method and related code
Note how `PDFDocumentProxy.destroy` is a nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here.

The main changes in this patch are:
 - Stop providing a `PDFDocumentLoadingTask` instance *separately* when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance.
 - Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused.
 - Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying *once*, rather than twice, for each document.
2019-03-12 13:25:29 +01:00
Jonas Jenwald
88f9e633dd Try to improve text-selection for Type3 fonts that utilize a non-default /FontMatrix (bug 1513120)
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simple extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.

The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
2019-03-12 10:32:08 +01:00
Tim van der Meij
8b149b818e
Merge pull request #10615 from Snuffleupagus/corrupt-inline-ASCII85Decode
Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)
2019-03-08 23:06:01 +01:00
Tim van der Meij
b244622f7e
Improve unit test coverage for src/display/display_utils.js
The `DOMCanvasFactory` class is now fully covered. Moreover, missing
cases for the `getFilenameFromUrl` function have been included.

Finally, `var` usage has been removed.
2019-03-06 23:41:54 +01:00
Jonas Jenwald
3ce8fe7927 Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)
There's a number of things wrong with the PDF document, since its inline images are first all *a lot* larger than the 4 KB limit (as mandated by the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1852045).

Furthermore the actual ASCII85Decode data is interspersed with *a lot* of needless whitespace, in particular also "inside" of the EOD (end-of-data) marker which thus completely breaks the detection.
Note that according to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130, this patch should be safe since it explicitly mentions that *all* whitespace should be ignored.
2019-03-04 23:41:36 +01:00
Brendan Dahl
34022d2fd1
Merge pull request #10591 from brendandahl/fix-charset
Add unique glyph names for CFF fonts.
2019-02-28 17:22:29 -08:00
Tim van der Meij
af5597b7e5
Merge pull request #10573 from Snuffleupagus/type3-avoid-truncation
Avoid truncating/breaking some Type3 glyphs in `compileType3Glyph` (bug 1245391, issue 10568)
2019-02-28 23:25:45 +01:00
Brendan Dahl
8a596ef5d5 Add unique glyph names for CFF fonts.
Printing on MacOS was broken with the previous approach of just mapping
all the glyphs to notdef.
2019-02-27 15:00:29 -08:00
Jonas Jenwald
f664e074c9 Avoid using the Fetch API, in GENERIC builds, for unsupported protocols (issue 10587) 2019-02-27 13:04:20 +01:00
Jonas Jenwald
cbc07f985b Load built-in CMap files using the Fetch API when possible 2019-02-27 13:04:19 +01:00
Jonas Jenwald
c5cf3ab808 Run the custom_spec unit-tests in Node.js/Travis (PR 10537 follow-up) 2019-02-26 22:40:55 +01:00
Jonas Jenwald
db5dc14158 Move worker-thread only functions from src/shared/util.js and into a new src/core/core_utils.js file
The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated.
Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used *only* on the worker-thread.

Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus *cannot* possibly ever be used on the main-thread.
2019-02-24 00:35:39 +01:00
Jonas Jenwald
a1f7517996 Rename the src/display/dom_utils.js file to src/display/display_utils.js
This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.
2019-02-23 16:30:16 +01:00
Jonas Jenwald
fb774a65b0 Avoid truncating/breaking some Type3 glyphs in compileType3Glyph (bug 1245391, issue 10568)
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*

With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
2019-02-21 23:29:43 +01:00
Jonas Jenwald
a0354494bd Re-factor the PDFDataRangeTransport unit-tests and enable them in Node.js/Travis
There doesn't appear to be any particular reason for only running these unit-tests in browsers, since the `PDFDataRangeTransport` functionality itself should be back-end agnostic.
2019-02-17 14:45:17 +01:00
Jonas Jenwald
507e0a4907 Add a new DOMFileReaderFactory helper to the unit-tests, and re-factor NodeFileReaderFactory to be asynchronous
This allows simplification of the 'creates pdf doc from URL and aborts loading after worker initialized' API unit-test.

Note that the `DOMFileReaderFactory` uses the Fetch API, for simplicity, since it should be available in all browsers where we're running tests.
2019-02-17 14:41:14 +01:00
Jonas Jenwald
60f6d49ff7 [api-minor] Expose the existence of a Collection dictionary via the getMetadata API method (issue 10555)
Given the complexity of this functionality, and the fact that it doesn't seem widely used, I highly doubt that it'd ever make sense to support Collections; see also https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.39646.2Heading.824.Collections
2019-02-15 15:40:31 +01:00
Tim van der Meij
1d90c76097
Merge pull request #10537 from timvandermeij/unittest
Improve unit test coverage
2019-02-12 00:12:29 +01:00
Tim van der Meij
7c91e94b19
Implement the NodeCanvasFactory class to execute more unit tests in Node.js 2019-02-10 19:37:34 +01:00
Tim van der Meij
b6eddc40b5
Write unit tests for the string32 and toRomanNumerals utility functions 2019-02-10 18:58:52 +01:00
Tsukasa OI
96ba6afd47 Fix copying on supplementary plane characters
pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero).  This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.

Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
2019-02-10 18:14:53 +09:00
Jonas Jenwald
22468817e1 Add a settled property, tracking the fulfilled/rejected stated of the Promise, to createPromiseCapability
This allows cleaning-up code which is currently manually tracking the state of the Promise of a `createPromiseCapability` instance.
2019-02-02 15:18:56 +01:00
Jonas Jenwald
2b0b6178f7 Clean-up after the gets operatorList with JPEG image (issue 4888) unit-test
This unit-test wasn't destroying the `loadingTask` when complete, as it should have done.
2019-01-29 15:24:08 +01:00
Jonas Jenwald
6f94a05a29 Do the final text scaling correctly in flushTextContentItem (issue 8276)
It's necessary to take into account whether or not the text is vertical, to avoid either the textContent `width` or `height` becoming incorrect.
2019-01-29 15:24:04 +01:00
Tim van der Meij
e2701d5422
Merge pull request #10482 from janpe2/indexed-decode
Implement Decode entry in Indexed images
2019-01-24 23:46:55 +01:00
Jonas Jenwald
41fbc71ef9 Ensure that XRef.indexObjects can handle object numbers with zero-padding (issue 10491)
All objects in the PDF document follow this pattern:
```
0000000001 0 obj
<<
% Some content here...
>>
endobj
0000000002 0 obj
<<
% More content here...
endobj

```
2019-01-24 22:37:18 +01:00
Jani Pehkonen
26121177ab Implement Decode entry in Indexed images 2019-01-22 22:51:04 +02:00
Tim van der Meij
c4fe4087d3
Implement a unit test for metadata parsing to ensure that it's not vulnerable to the billion laughs attack 2019-01-19 19:54:08 +01:00
Jonas Jenwald
24a688d6c6 Convert some usage of indexOf to startsWith/includes where applicable
In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.
2019-01-18 17:57:41 +01:00
Jonas Jenwald
9f45f8dfda When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up)
This will allow the Metadata to be successfully extracted from the PDF file in issue 10395.
Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).
2019-01-16 12:44:27 +01:00
Jonas Jenwald
5d90224409 Add a unit-test for issue 10395 (PR 10398 follow-up) 2019-01-16 11:30:36 +01:00
Tim van der Meij
5cb00b7967
Merge pull request #10443 from Snuffleupagus/getVisibleElements-fixes
Prevent `TypeError: views[index] is undefined` being throw in `getVisibleElements` when the viewer, or all pages, are hidden
2019-01-13 15:41:48 +01:00
Tim van der Meij
6279fc601a
Handle malformed URIs as bad requests in the development webserver
Fixes #10445 (found by Dhiraj Mishra).
2019-01-13 14:57:20 +01:00
Jonas Jenwald
b2235ec9c4 Add a unit-test to check that the sortByVisibility parameter, in getVisibleElements, works correctly 2019-01-13 11:34:38 +01:00
Jonas Jenwald
9743708a24 Prevent TypeError: views[index] is undefined being throw in getVisibleElements when the viewer, or all pages, are hidden
Previously a couple of different attempts at fixing this problem has been rejected, given how *crucial* this code is for the correct function of the viewer, since no one has thus far provided any evidence that the problem actually affects the default viewer[1] nor an example using the viewer components directly (without another library on top).
The fact that none of the prior patches contained even a *simple* unit-test probably contributed to the unwillingness of a reviewer to sign off on the suggested changes.

However, it turns out that it's possible to create a reduced test-case, using the default viewer, that demonstrates the error[2]. Since this utilizes a hidden `<iframe>`, please note that this error will thus affect Firefox as well.
Note that while errors are thrown when the hidden `<iframe>` loads, the default viewer doesn't break completely since rendering does start working once the `<iframe>` becomes visible (although the errors do break the initial Toolbar state).

Before making any changes here, I carefully read through not just the immediately relevant code but also the rendering code in the viewer (given it's dependence on `getVisibleElements`). After concluding that the changes should be safe in general, the default viewer was tested without any issues found. (The above being much easier with significant prior experience of working with the viewer code.)
Finally the patch also adds new unit-tests, one of which explicitly triggers the relevant code-path and will thus fail with the current `master` branch.

This patch also makes `PDFViewerApplication` slightly more robust against errors during document opening, to ensure that viewer/document initialization always completes as expected.
Please keep in mind that even though this patch prevents an error in `getVisibleElements`, it's still not possible to set the initial position/zoom level/sidebar view etc. when the viewer is hidden since rendering and scrolling is completely dependent[3] on being able to actually access the DOM elements.

---
[1] And hence the PDF Viewer that's built-in to Firefox.

[2] Copy the HTML code below and save it as `iframe.html`, and place the file in the `web/` folder. Then start the server, with `gulp server`, and navigate to http://localhost:8888/web/iframe.html

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Iframe test</title>

    <script>
      window.onload = function() {
        const button = document.getElementById('button1');
        const frame = document.getElementById('frame1');

        button.addEventListener('click', function(evt) {
          frame.hidden = !frame.hidden;
        });
      };
    </script>
  </head>

  <body>
    <button id="button1">Toggle iframe</button>
    <br>
    <iframe id="frame1" width="800" height="600" src="http://localhost:8888/web/viewer.html" hidden="true"></iframe>
  </body>
</html>
```

[3] This is an old, pre-exisiting, issue that's not relevant to this patch as such (and it's already being tracked elsewhere).
2019-01-13 11:34:24 +01:00
Tim van der Meij
ed918bad21
Remove left-over console log from the find controller unit tests 2019-01-12 22:27:40 +01:00
Tim van der Meij
a37ea16013
Merge pull request #10425 from timvandermeij/find-controller-unit-tests
Write more unit tests for the find controller
2019-01-12 22:20:53 +01:00
Tim van der Meij
b1cef896f4
Write more unit tests for the find controller
Fixes #7356.
2019-01-12 22:17:46 +01:00
Jonas Jenwald
b531fc4106 Avoid truncating inline images, where the data and the "EI" marker is glued together (issue 10388) (#10436)
Thanks to the *excellent* debugging done by @janpe2, this was easy to fix!
2019-01-12 20:31:23 +01:00
Jonas Jenwald
d4a3858ed5 Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up) 2019-01-10 17:55:28 +01:00
Tim van der Meij
b81984f0cb
Merge pull request #10417 from brendandahl/metric-length
Fix reading number of HTMX metrics.
2019-01-05 13:35:16 +01:00
Jonas Jenwald
e8f4b47d59 Prevent errors, in SimpleXMLParser.onEndElement, when the stack has already been completely parsed (issue 10410)
The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data.
(The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)
2019-01-05 11:15:34 +01:00
Brendan Dahl
32eace043b Fix reading number of HTMX metrics.
The length of the HHEA table can be incorrect, so it is better to
read the number of metrics offset from beginning of table instead.
2019-01-04 15:13:13 -08:00
Tim van der Meij
b39ec7af96
Merge pull request #10408 from Snuffleupagus/issue-10407
Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)
2019-01-04 23:45:26 +01:00
Jonas Jenwald
66fccd860b Adjust how AnnotationBorderStyle.setWidth handles the input being a Name (issue 10385)
In order to be consistent with the behaviour in Adobe Reader, the width will now always be set to zero when the input is a `Name`.
2019-01-04 10:38:10 +01:00
Jonas Jenwald
6cd9ff48f3 Prevent errors, because of incorrect scope, in the XMLParserBase._resolveEntities method (issue 10407) 2019-01-04 10:13:32 +01:00
Brendan Dahl
e2686db49b
Merge pull request #10277 from janpe2/cff-stems
Repair CFF fonts if stem hints are in wrong order
2019-01-03 10:30:43 -08:00
Jonas Jenwald
76a9580aeb Ensure that AnnotationBorderStyle.setWidth is able to handle the input being a Name, to correctly deal with corrupt PDF documents (issue 10385) 2018-12-31 12:21:28 +01:00
Jonas Jenwald
60bcce184e Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)
For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].

Since there's generally a real effort being in made PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.

Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't a very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].

Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:

```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
    console.timeEnd('second');
  });
});
```

The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.

---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.

[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.

[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.
2018-12-29 12:47:25 +01:00
Tim van der Meij
103f4616ac
Merge pull request #10334 from Snuffleupagus/OpenAction-dest
[api-minor] Add support for OpenAction destinations (issue 10332)
2018-12-23 20:49:50 +01:00
Jonas Jenwald
f0719ed565 [api-minor] Change the getViewport method, on PDFPageProxy, to take a parameter object rather than a bunch of (randomly) ordered parameters
If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature *first* to avoid needlessly unwieldy call-sites.

To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning *and* with a working fallback[1] for the old method signature.

---
[1] This is limited to `GENERIC` builds, which should be sufficient.
2018-12-21 11:55:20 +01:00
Jonas Jenwald
b05f053287 [api-minor] Add support for OpenAction destinations (issue 10332)
Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1].
Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2].

---
[1] This information is, currently in PDF.js, being included through the `getJavaScript` API method.

[2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).
2018-12-19 11:45:16 +01:00
Jonas Jenwald
ba2edeae18 [api-minor] Add support, in getMetadata, for custom information dictionary entries (issue 5970, issue 10344) (#10346)
The custom entries, provided that they exist *and* that their types are safe to include, are exposed through a new `Custom` infoDict entry to clearly separate them from the standard ones.

Fixes 5970.
Fixes 10344.
2018-12-18 23:26:02 +01:00
April King
64cb8c6b98
Add protection against directory traversal attacks 2018-12-10 12:59:04 -06:00
Wojciech Maj
616135962a Fix badly formatted .eslintrc 2018-11-23 13:49:58 +01:00
Jani Pehkonen
9e990f6f3e Repair CFF fonts if stem hints are in wrong order 2018-11-20 18:50:37 +02:00
Jonas Jenwald
2c003a82d5 Convert RenderTask, in src/display/api.js, to an ES6 class
Also deprecates the `then` method, in favour of the `promise` getter.
2018-11-18 19:08:00 +01:00
Jonas Jenwald
ef8e5fd77c Convert PDFDocumentLoadingTask, in src/display/api.js, to an ES6 class
Also deprecates the `then` method, in favour of the `promise` getter.
2018-11-18 19:07:57 +01:00
PalmerAL
5f15dc2023 Use span instead of div in the text layer
This improves copy/pasting text content since it reduces the amount of unnecessary newlines.
2018-11-18 15:54:08 +01:00
Tim van der Meij
016c2761da
Resolve deprecation warnings for Jasmine
Jasmine recommends to use the `configure` method on the environment
during boot. This commit makes the code correspond to how it's done in
Jasmine's default boot file. The options dropdown in the HTML reporter
now works again after these changes, because this broke in the upgrade
to Jasmine 3, and the unit tests are executed in a random order by
default, which is important to make sure the unit tests are
self-contained and don't depend on the result of another unit test.
2018-11-17 23:31:22 +01:00
Thomas den Hollander
b157d8b478
Change splice to pop in annotation tests
This line in the annotation tests subtracts an array from a number. This is because `splice(-1, 1)` returns a one-element array, while `pop()` returns only the element itself.
2018-10-24 13:08:08 +02:00
Jonas Jenwald
327f2eb588 Ensure that onProgress is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160)
*Please note:* I'm totally fine with this patch being rejected, and the issue closed as WONTFIX; however these changes should address the issue if that's desired.

From a conceptual point of view, reporting loading progress doesn't really make a lot of sense for PDF files opened by passing raw binary data directly to `getDocument` (since obviously *all* data was loaded).
This is compared to PDF files loaded via e.g. `XMLHttpRequest` or the Fetch API, where the entire PDF file isn't available from the start and knowing the loading progress makes total sense.

However I can certainly see why the current API could be considered inconsistent, which isn't great, since a registered `onProgress` callback will never be called for certain `getDocument` calls.
The simplest solution to this inconsistency thus seem to be to ensure that `onProgress` is always called when handling the `DataLoaded` message, since that will *always* be dispatched[1] from the worker-thread.

---
[1] Note that this isn't guaranteed to happen, since setting `disableAutoFetch = true` often prevents the *entire* file from ever loading. However, this isn't relevant for the issue at hand, and is a well-known consequence of using `disableAutoFetch = true`; note how the default viewer even has a specialized code-path for hiding the loadingBar.
2018-10-16 13:51:12 +02:00
Kevin Lee Drum
4cf10ac79d set returnValues.suggestedLength to Content-Length if integer 2018-10-07 13:26:29 -04:00
Tim van der Meij
ff2df9c5b6
Merge pull request #10117 from leblanc-simon/ink-annotation-support
Add support of Ink annotation
2018-10-04 23:39:41 +02:00
Jonas Jenwald
2ed3591b22 Make PDFFindController less confusing to use, by allowing searching to start when setDocument is called
*This patch is based on something that I noticed while working on PR 10126.*

The recent re-factoring of `PDFFindController` brought many improvements, among those the fact that access to `BaseViewer` is no longer required. However, with these changes there's one thing which now strikes me as not particularly user-friendly[1]: The fact that in order for searching to actually work, `PDFFindController.setDocument` must be called *and* a 'pagesinit' event must be dispatched (from somewhere).

For all other viewer components, calling the `setDocument` method[2] is enough in order for the component to actually be usable.
The `PDFFindController` thus stands out quite a bit, and it also becomes difficult to work with in any sort of custom implementation. For example: Imagine someone trying to use `PDFFindController` separately from the viewer[3], which *should* now be relatively simple given the re-factoring, and thus having to (somehow) figure out that they'll also need to manually dispatch a 'pagesinit' event for searching to work.

Note that the above even affects the unit-tests, where an out-of-place 'pagesinit' event is being used.
To attempt to address these problems, I'm thus suggesting that *only* `setDocument` should be used to indicate that searching may start. For the default viewer and/or the viewer components, `BaseViewer.setDocument` will now call `PDFFindController.setDocument` when the document is ready, thus requiring no outside configuration anymore[4]. For custom implementation, and the unit-tests, it's now as simple as just calling `PDFFindController.setDocument` to allow searching to start.

---
[1] I should have caught this during review of PR 10099, but unfortunately it's sometimes not until you actually work with the code in question that things like these become clear.

[2] Assuming, obviously, that the viewer component in question actually implements such a method :-)

[3] There's even a very recent issue, filed by someone trying to do just that.

[4] Short of providing a `PDFFindController` instance when creating a `BaseViewer` instance, of course.
2018-10-04 10:28:50 +02:00
Simon Leblanc
b5806735d8 Add support of Ink annotation 2018-10-03 00:28:49 +02:00
Tim van der Meij
1b402996cf
Implement a basic unit test for the find controller
This commit shows that we can now unit test the find controller and
that executing regular queries works. Note that this is only a first
step and not a complete suite of unit tests for all possible options
of the find controller.

While writing this unit test, I found two smaller issues that I
addressed directly. The first one is that in the previous find
controller refactoring I forgot to rename some occurrences of a now
private member variable. Fortunately this did not cause any bugs since
we did have a public getter and the fetched value may be changed by
reference, but it's nevertheless good to fix. The second issue is that
some entries in the `test/unit/clitests.json` file were not correct,
resulting in these tests not being executed on e.g., Travis CI.
2018-09-30 18:32:34 +02:00
Jonas Jenwald
1c814e208e Prevent getPDFFileNameFromURL from breaking if the url parameter is not a string 2018-09-30 12:28:59 +02:00
Jonas Jenwald
842e9206c0 Replace String.prototype.substr() occurrences with String.prototype.substring()
As outlined in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr, which refers to the ECMA-262 specification, using the `substr` function is advised against.

Hence this PR, which replaces all remaining `substr` occurrences with `substring` instead. Please refer to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr#Syntax respectively https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substring#Syntax for the differences between the two functions.

Note that in most cases in the code-base there's only one argument passed to `substr`, and those require no other changes except replacing "substr" with "substring". For the other cases, the `substr(start, length)` calls are changed to `substring(start, start + length)` instead.
2018-09-28 11:41:07 +02:00
Jonas Jenwald
39776168a2 Add EventBus unit-tests to ensure that the (optional) argument handling works correctly 2018-09-21 14:31:35 +02:00
Jonas Jenwald
f317a2cb40 Ensure that the DOM event listeners are removed at the end of the relevant EventBus unit-tests, to prevent the tests from interfering with each other 2018-09-20 23:12:01 +02:00
Tim van der Meij
99de25d6cc
Implement unit tests for the isSameOrigin and createValidAbsoluteUrl utility functions
Moreover, mark the `isValidProtocol` function as private since it's only
used in the utilities file and is not (meant to be) exported.
2018-09-11 16:17:45 +02:00
Jonas Jenwald
6d804d657f Add initial support for "Whole words" searching in the viewer
As outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1282759 the internal Firefox name for the feature is `entireWord`, hence that name is used here as well for consistency (with "Whole words" being limited to the UI).

Given existing limitations of the PDF.js search functionality, e.g. the existing problems of searching across "new lines", there's some edge-cases where "Whole words" searching will ignore (valid) results.
However, considering that this is a pre-existing issue related to the way that the find controller joins text-content together, that shouldn't have to block this new feature in my opionion.

*Please note:* In order to enable this feature in the `MOZCENTRAL` version, a small follow-up patch for [PdfjsChromeUtils.jsm](https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/content/PdfjsChromeUtils.jsm) will be required once this has landed in `mozilla-central`.
2018-09-10 11:59:29 +02:00
Tim van der Meij
66422eb83e
Merge pull request #9340 from brendandahl/private-use
Map all glyphs to the private use area and duplicate the first glyph.
2018-09-08 17:51:04 +02:00
Brendan Dahl
b76cf665ec Map all glyphs to the private use area and duplicate the first glyph.
There have been lots of problems with trying to map glyphs to their unicode
values. It's more reliable to just use the private use areas so the browser's
font renderer doesn't mess with the glyphs.

Using the private use area for all glyphs did highlight other issues that this
patch also had to fix:

  * small private use area - Previously, only the BMP private use area was used
    which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be
    used.

  * glyph zero not shown - Browsers will not use the glyph from a font if it is
    glyph id = 0. This issue was less prevalent when we mapped to unicode values
    since the fallback font would be used. However, when using the private use
    area, the glyph would not be drawn at all. This is illustrated in one of the
    current test cases (issue #8234) where there's an "ä" glyph at position
    zero. The PDF looked like it rendered correctly, but it was actually not
    using the glyph from the font. To properly show the first glyph it is always
    duplicated and appended to the glyphs and the maps are adjusted.

  * supplementary characters - The private use area PUP 16 is 4 bytes, so
    String.fromCodePoint must be used where we previously used
    String.fromCharCode. This is actually an issue that should have been fixed
    regardless of this patch.

  * charset - Freetype fails to load fonts when the charset size doesn't match
    number of glyphs in the font. We now write out a fake charset with the
    correct length. This also brought up the issue that glyphs with seac/endchar
    should only ever write a standard charset, but we now write a custom one.
    To get around this the seac analysis is permanently enabled so those glyphs
    are instead always drawn as two glyphs.
2018-09-05 14:04:54 -07:00
Jonas Jenwald
e5a6d892b4
Revert "Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)" 2018-09-05 18:01:33 +02:00
Tim van der Meij
e812c6e7ac
Use shorter code for failing a test in test/unit/api_spec.js 2018-09-02 21:23:09 +02:00
Tim van der Meij
959ed3705b
Implement a permissions API 2018-09-02 21:23:09 +02:00
Tim van der Meij
a096e0c647
Merge pull request #10032 from timvandermeij/test-link
Replace broken link for `pr8808.pdf.link`
2018-09-02 14:52:21 +02:00
Tim van der Meij
b62f14f3f5
Replace broken link for pr8808.pdf.link
The current link had an invalid certificate and was a redirect to this
new link anyway. The MD5 hash is equal.
2018-09-02 14:48:26 +02:00
Tim van der Meij
c94df0fef3
Merge pull request #9986 from Snuffleupagus/issue-9984
Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984)
2018-09-01 21:21:29 +02:00
Tim van der Meij
f2f2e05bb8
Merge pull request #10019 from Snuffleupagus/eventBusDispatchToDOM
Add general support for re-dispatching events, on `EventBus` instances, to the DOM
2018-09-01 19:11:23 +02:00
Jonas Jenwald
0b1f41c5b3 Add general support for re-dispatching events, on EventBus instances, to the DOM
This patch is the first step to be able to eventually get rid of the `attachDOMEventsToEventBus` function, by allowing `EventBus` instances to simply re-dispatch most[1] events to the DOM.
Note that the re-dispatching is purposely implemented to occur *after* all registered `EventBus` listeners have been serviced, to prevent the ordering issues that necessitated the duplicated page/scale-change events.

The DOM events are currently necessary for the `mozilla-central` tests, see https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/test, and perhaps also for custom deployments of the PDF.js default viewer.

Once this have landed, and been successfully uplifted to `mozilla-central`, I intent to submit a patch to update the test-code to utilize the new preference. This will thus, eventually, make it possible to remove the `attachDOMEventsToEventBus` functionality.

*Please note:* I've successfully ran all `mozilla-central` tests locally, with these patches applied.

---
[1] The exception being events that originated on the `window` or `document`, since those are already globally available anyway.
2018-08-30 17:28:12 +02:00
Jonas Jenwald
95e5bad4c4 Attempt to find truncated endstream commands, in the fallback code-path, in Parser.makeStream (issue 10004)
Apparently there's some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated.

*Please note:* The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contains a *mixture* of correct and truncated endstream commands.
However, considering that this particular mode of corruption *fortunately* doesn't seem very common[2], a slightly less complex solution ought to suffice for now.

Fixes 10004.

---
[1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect.

[2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.
2018-08-26 11:51:11 +02:00
Jonas Jenwald
497b765ede Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)
Please note that while this *improves* issue 9984 slightly (and likely others too), it's not a complete solution.
The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.
2018-08-18 13:45:32 +02:00
Tim van der Meij
1268aea2b6
Merge pull request #9975 from Snuffleupagus/getDestination-refactor
Re-factor `destinations`/`getDestination` to reduce unnecessary duplication, and reject non-string inputs
2018-08-12 15:51:58 +02:00
Tim van der Meij
af19ed6ee9
Merge pull request #9822 from timvandermeij/annotations
[api-minor] Refactor the annotation code to be asynchronous
2018-08-11 20:39:50 +02:00
Tim van der Meij
bbc769cf81
Convert test/unit/annotation_spec.js to ES6 syntax 2018-08-11 19:00:29 +02:00
dmitryskey
3741becb9b
[api-minor] Refactor the annotation code to be asynchronous
This commit is the first step towards implementing parsing for the
appearance streams of annotations.

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>
2018-08-11 19:00:29 +02:00
Jonas Jenwald
1179584fd6 Reject getDestination, in the API, for non-string inputs
Note how e.g. the `getPage` method does basic validation of the input.
2018-08-11 16:06:35 +02:00
Jonas Jenwald
f78efd883e Attempt to throw MissingPDFException when applicable in node_stream.js (issue 9791) 2018-08-06 10:00:03 +02:00
Tim van der Meij
f6eaa99cb2
Reword test reporter message
The font tests use Jasmine too, so while they are technically unit
tests, it's a bit confusing to see `Started unit tests` when the font
tests are run on the bots.
2018-08-05 21:21:46 +02:00
Tim van der Meij
4111871ac5
Merge pull request #9958 from brendandahl/always-fallback
Always fallback to system font on font failure.
2018-08-05 19:58:48 +02:00
Tim van der Meij
27e8a2f6fe
Merge pull request #9959 from brendandahl/test-util
Utility script to add a reference test.
2018-08-05 16:53:37 +02:00
Tim van der Meij
b65d0450f5
Merge pull request #9960 from brendandahl/strict-verify
Fail when MD5 of test files fails on bots.
2018-08-05 16:44:12 +02:00
Brendan Dahl
482ea2af32 Fail when MD5 of test files fails on bots. 2018-08-03 17:48:47 -07:00
Brendan Dahl
8b3ed473c1 Utility script to add a reference test. 2018-08-03 17:24:24 -07:00
Brendan Dahl
5f67a6a237 Always fallback to system font on font failure.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
2018-08-03 16:49:22 -07:00
Tim van der Meij
444976bcd5
Merge pull request #9956 from brendandahl/allow-zero-progress
Allow loaded progress of 0 in unit tests.
2018-08-04 00:19:02 +02:00
Tim van der Meij
f19ee127a3
Merge pull request #9874 from boundlesshq/master
[api-minor] Include export value for checkboxes
2018-08-03 23:43:23 +02:00
Brendan Dahl
d762567bcf Allow loaded progress of 0 in unit tests. 2018-08-03 10:31:46 -07:00
Tim van der Meij
8a4be24645
Merge pull request #9948 from Snuffleupagus/url-polyfill-unit-tests
Add (basic) unit-tests for the non-global `URL` constructor (PR 9868 follow-up)
2018-08-02 23:32:07 +02:00
Brian
2a665ebad4 Removed Extraneous Matrix Check in CalRGB Conversion 2018-08-02 10:16:42 -07:00
Jonas Jenwald
f8388710e6 Add (basic) unit-tests for the non-global URL constructor (PR 9868 follow-up)
This should really have been included in PR 9868, since it will help ensure that the `URL` constructor is correctly imported/exported by `src/shared/util.js`.
2018-08-02 10:32:06 +02:00
Tim van der Meij
716acf63d4
Merge pull request #9938 from Snuffleupagus/issue-9915
Ensure that Type0, i.e. composite, OpenType fonts with `CFF ` tables are *not* treated as CFF fonts if their glyph mapping is non-default (issue 9915)
2018-08-02 00:11:18 +02:00
Jonas Jenwald
3ce420131f Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650)
According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45
> When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):
>
>  - Width and Height shall match the corresponding width and height values in the JPEG2000 data.
>
>  - . . .

Hence it seems reasonable to use the Width/Height of the image data *itself*, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.
2018-08-01 16:42:26 +02:00
Jonas Jenwald
690bcc8c8a Add a reduced, eq, test-case for issue 9915 2018-07-29 23:06:15 +02:00
bion
c31ddf7edc [api-minor] Include export value for checkboxes 2018-07-28 00:30:41 -07:00
Jonas Jenwald
928b89382e [api-minor] Add an IsLinearized property to the PDFDocument.documentInfo getter, to allow accessing the linearization status through the API (via PDFDocumentProxy.getMetadata)
There was a (somewhat) recent question on IRC about accessing the linearization status of a PDF document, and this patch contains a simple way to expose that through already existing API methods.
Please note that during setup/parsing in `PDFDocument` the linearization data is already being fetched and parsed, provided of course that it exists. Hence this patch will *not* cause any additional data to be loaded.
2018-07-26 15:54:19 +02:00
Jonas Jenwald
36b683ca55 Provide custom messages for the no-restricted-globals ESLint rule, and refactor the .eslintrc files (PR 9868 follow-up)
Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended.

To reduced duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.
2018-07-23 14:10:13 +02:00
Jonas Jenwald
8ec99b200c Prevent Metadata/XML parsing from breaking PDFDocumentProxy.getMetadata when no XML root document is found (issue 8884)
With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs.

(Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)
2018-07-18 11:37:40 +02:00
Tim van der Meij
61db85ab64
Merge pull request #9886 from Snuffleupagus/bug-1473809
Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
2018-07-15 17:23:52 +02:00
Jonas Jenwald
2b25deb84c Prevent errors in sanitizeTTProgram, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
*I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug.*

The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a *negative* length when parsing the CALL functions.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.
2018-07-10 09:45:55 +02:00
Jonas Jenwald
61186698c3 Replace the remaining occurences of instanceof Array with Array.isArray()
*Follow-up to PRs 8864 and 8813.*

As explained in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/isArray, `instanceof Array` can have inconsistent behavior. To ensure that only `Array.isArray` is used, an ESLint plugin/rule is added to enforce this.
2018-07-09 13:17:41 +02:00
Tim van der Meij
99f8f2c275
Merge pull request #9853 from Snuffleupagus/re-render-after-cancel
Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up)
2018-06-29 23:25:43 +02:00
Tim van der Meij
6fa2c779b5
Merge pull request #9838 from Snuffleupagus/invalid-path-OPS
Error, rather than warn, once a number of invalid path operators are encountered in `EvaluatorPreprocessor.read` (bug 1443140)
2018-06-28 23:15:25 +02:00
Jonas Jenwald
bf0aca86d7 Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up)
Currently if `RenderTask.cancel` is called *immediately* after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled.
However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it *after* rendering was cancelled (it was likely just a simple oversight in PR 8519).

Fixes 9456.
2018-06-28 22:56:37 +02:00
Jonas Jenwald
74e9999044 Add unit-tests for PDFPageProxy.stats (PR 9245 follow-up)
This wasn't included in PR 9245, since all the API options were still global at that time.

Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.
2018-06-25 14:20:49 +02:00
Jonas Jenwald
7f21e38787 Error, rather than warn, once a number of invalid path operators are encountered in EvaluatorPreprocessor.read (bug 1443140)
Incomplete path operators, in particular, can result in fairly chaotic rendering artifacts, as can be observed on page four of the referenced PDF file.

The initial (naive) solution that was attempted, was to simply throw a `FormatError` as soon as any invalid (i.e. too short) operator was found and rely on the existing `ignoreErrors` code-paths. However, doing so would have caused regressions in some files; see the existing `issue2391-1` test-case, which was promoted to an `eq` test to help prevent future bugs.
Hence this patch, which adds special handling for invalid path operators since those may cause quite bad rendering artifacts.

You could, in all fairness, argue that the patch is a handwavy solution and I wouldn't object. However, given that this only concerns *corrupt* PDF files, the way that PDF viewers (PDF.js included) try to gracefully deal with those could probably be described as a best-effort solution anyway.

This patch also adjusts the existing `warn`/`info` messages to print the command name according to the PDF specification, rather than an internal PDF.js enumeration value. The former should be much more useful for debugging purposes.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1443140.
2018-06-24 16:05:08 +02:00
Jonas Jenwald
56e3648b65 Add basic validation of the 'trailer' dictionary candidates in XRef.indexObjects (issue 9418)
This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
6bbcafcd26 Let Lexer.getNumber treat a single decimal point as zero (issue 9252)
This is consistent with the behaviour in Adobe Reader.
2018-06-20 13:41:21 +02:00
Jonas Jenwald
df4799a12a Ensure that line-breaks are *only* skipped after operators in Lexer.getNumber (PR 8359 follow-up)
With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches.

Hence this is code is adjusted to actually do what the original commit message says, and nothing more.
2018-06-20 13:41:15 +02:00
Tim van der Meij
620da6f4df
Merge pull request #9802 from Snuffleupagus/ColorSpace-PDFImage-Uint8ClampedArray
Update `ColorSpace` and `PDFImage` to use `Uint8ClampedArray`s and remove manual clamping/rounding
2018-06-16 17:55:10 +02:00
Tim van der Meij
280f20bf3c
Merge pull request #9809 from Snuffleupagus/getPathGenerator-ignoreErrors
Allow `FontFaceObject.getPathGenerator` to ignore non-embedded fonts during rendering
2018-06-16 16:37:52 +02:00
Jonas Jenwald
bf0db0fb72 Pass the ignoreErrors API option to the FontFaceObject constructor, and utilize it in getPathGenerator to ignore missing glyphs
Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering.

*Please note:* Including the 14 standard fonts in PDF.js probably wouldn't be *that* difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.)
However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)
2018-06-13 11:02:06 +02:00
Jonas Jenwald
731f2e6dfc Remove manual clamping/rounding from ColorSpace and PDFImage, by having their methods use Uint8ClampedArrays
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.

As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.

*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
2018-06-12 11:01:32 +02:00
Jonas Jenwald
32367c5968 Make the getBytes/peekBytes methods of Stream/DecodeStream/ChunkedStream able to return Uint8ClampedArrays
The built-in image decoders are already returning data as `Uint8ClampedArray`, and subsequently the JPEG/JBIG2/JPX streams are as well. However, for general streams we obviously don't want to force the use of `Uint8ClampedArray` unless an "Image" is actually being decoded.
Hence this patch, which adds a parameter that allows the caller of the `getBytes`/`peekBytes` methods to force a `Uint8ClampedArray` (rather than a `Uint8Array`) to be returned.
2018-06-12 11:01:32 +02:00
youngroz
09359efca0 Replace deprecated constructor with 2018-06-11 20:41:56 -07:00
Tim van der Meij
db874b6680
Merge pull request #9660 from brendandahl/headless
Support running the tests headlessly.
2018-06-09 15:14:42 +02:00
Jonas Jenwald
547f119be6 Simplify the error handling slightly in the src/display/node_stream.js file
The various classes have `this._errored` and `this._reason` properties, where the first one is a boolean indicating if an error was encountered and the second one contains the actual `Error` (or `null` initially).

In practice this means that errors are basically tracked *twice*, rather than just once. This kind of double-bookkeeping is generally a bad idea, since it's quite easy for the properties to (accidentally) get into an inconsistent state whenever the relevant code is modified.

Rather than using a separate boolean, we can just as well check the "error" property directly (since `null` is falsy).

---

Somewhat unrelated to this patch, but `src/display/node_stream.js` is currently *not* handling errors in a consistent or even correct way; compared with `src/display/network.js` and `src/display/fetch_stream.js`.
Obviously using the `createResponseStatusError` utility function, from `src/display/network_utils.js`, might not make much sense in a Node.js environment. However at the *very* least it seems that `MissingPDFException`, or `UnknownErrorException` when one cannot tell that the PDF file is "missing", should be manually thrown.

As is, the API (i.e. `getDocument`) is not returning the *expected* errors when loading fails in Node.js environments (as evident from the `pending` API unit-test).
2018-06-06 09:05:45 +02:00
Brendan Dahl
127590b1c3 Support running the tests headlessly. 2018-06-05 11:29:58 -07:00
Jonas Jenwald
89caaf4071 Use LoopbackPort in the "message_handler" unit-tests
There's no good reason, as far as I can tell, to duplicate the functionality of the `LoopbackPort` in the unit-tests. The only difference between the implementations is that `LoopbackPort` mimics the (native) structured cloning, however that shouldn't matter here since the tests are only sending "simple" data (strings respectively arrays with numbers).

Furthermore the patch also changes `LoopbackPort` to default to using "structured cloning" and deferred invocation of the listeners, since native typed array support is now a requirement for using the PDF.js library.
2018-06-04 12:53:08 +02:00
Jonas Jenwald
44d8afd46b Move MessageHandler into a separate src/shared/message_handler.js file
The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.
2018-06-04 12:53:08 +02:00
Tim van der Meij
90750d624b
Use fs.unlinkSync instead of fs.unlink when removing files in the font tests 2018-06-03 22:17:05 +02:00
Jonas Jenwald
11b4613e20 Reduce the amount of console "spam", by ignoring info/warn calls, when running the unit-tests in Node.js/Travis
Compared to running the unit-tests in "regular" browsers, where any console output won't get mixed up with test output, in Node.js/Travis the test output looks quite noisy.
By ignoring `info`/`warn` calls, when running unit-tests in Node.js/Travis, the test output is a lot smaller not to mention that any *actual* failures are more easily spotted.
2018-06-03 00:28:40 +02:00
Jonas Jenwald
620f65488b Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679) 2018-05-30 22:40:11 +02:00
Tim van der Meij
4f8dae683e
Use fs.unlinkSync instead of fs.unlink when removing eq.log
This is necessary because Node.js crashes when `fs.unlink` is called
without a callback function.
2018-05-28 23:48:20 +02:00
Tim van der Meij
2f3b05fd5a
Convert all PDF links from HTTP to HTTPS 2018-05-27 16:02:04 +02:00
Tim van der Meij
4958b59369
Fix broken links to Bugzilla PDF attachments 2018-05-27 15:40:33 +02:00
Ryan Hendrickson
91cbc185da Add scrolling modes to web viewer
In addition to the default scrolling mode (vertical), this commit adds
horizontal and wrapped scrolling, implemented primarily with CSS.
2018-05-14 23:10:32 -04:00
Jani Pehkonen
8ea505545a Use FDSelect and FDArray when converting CFF CID font to paths 2018-04-10 16:44:42 +03:00
Tim van der Meij
69e9fe2494
Provide a prefixed appearance CSS rule for reference testing in Chrome
In `rasterizeAnnotationLayer` we load the source CSS files directly, so
these are not processed by Autoprefixer. Since the prefixed rules have
now been removed from the source CSS files, we must manually provide one
prefixed rule that Chrome needs in the overrides CSS file for checkbox
and radio button rendering to work in the reference tests.
2018-04-08 13:54:16 +02:00
Jonas Jenwald
77d025dc14 Move the isPortraitOrientation helper function from web/base_viewer.js to web/ui_utils.js
A couple of basic unit-tests are added, and a manual `isLandscape` check (in `web/base_viewer.js`) is also converted to use the helper function instead.
2018-03-25 18:48:53 +02:00
Tim van der Meij
5c1a16ba6e
Merge pull request #9586 from Snuffleupagus/pageSize-api-rotate
Ensure that `PDFPageProxy.pageSizeInches` handles non-default /Rotate entries correctly
2018-03-25 18:03:32 +02:00
Tim van der Meij
6cc0efe1cc
Merge pull request #9576 from timvandermeij/versions
Update packages
2018-03-25 17:52:26 +02:00
Tim van der Meij
95de23e6e3
Update packages
Jasmine had a major version bump and required a few minor changes in our
booting code. Most notably, using `pending` in a `describe` block is no
longer supported, so we can only return early there. On the positive
side, the unit tests now run in a random order by default, which
eliminates any dependencies between unit tests.

Note that upgrading to Webpack 4 is out of scope for this patch since
the bots cannot work well with the newly generated bundles (both
browsers on both bots do not react within 120 seconds). Webpack 4 is not
faster for us than Webpack 3, so for now there is no need to upgrade.
2018-03-25 16:59:50 +02:00
Jonas Jenwald
d547936827 Ensure that PDFPageProxy.pageSizeInches handles non-default /Rotate entries correctly
Without this patch, the pageSize will be incorrectly reported for some PDF files.

---

Move pageSizeInches to ui_utils
2018-03-25 16:48:29 +02:00
Brendan Dahl
63c7aee112
Merge pull request #9565 from brendandahl/new-name
Rename the globals to shorter names.
2018-03-20 13:49:04 -07:00
Jonas Jenwald
e0ae157582 [api-minor] Fix various issues related to the pageSize information
The `getPageSizeInches` method was implemented on `PDFDocumentProxy`, which seems conceptually wrong since the size property isn't global to the document but rather specific to each page. Hence the method is moved into `PDFPageProxy`, as `get pageSizeInches` instead to address this.

Despite the fact that new API functionality was implemented, no unit-tests were added. To prevent issues later on, we should *always* ensure that new functionality has at least some test-coverage; something that this patch also takes care of.

The new `PDFDocumentProperties._parsePageSize` method seemed unnecessary convoluted. Furthermore, in the "no data provided"-case it even returned incorrect data (an array, rather than the expected object).
Finally, the fallback strings didn't actually agree with the `en-US` locale. This inconsistency doesn't look too great, and it's thus addressed here as well.
2018-03-18 09:10:19 +01:00
Brendan Dahl
01bff1a81d Rename the globals to shorter names.
pdfjsDistBuildPdf=pdfjsLib
pdfjsDistWebPdfViewer=pdfjsViewer
pdfjsDistBuildPdfWorker=pdfjsWorker
2018-03-16 11:08:56 -07:00
Jonas Jenwald
d431ae069d Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540)
According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.
2018-03-12 14:13:23 +01:00
Tim van der Meij
f308d73d40
Implement a single getInheritableProperty utility function
This function combines the logic of two separate methods into one.
The loop limit is also a good thing to have for the calls in
`src/core/annotation.js`.

Moreover, since this is important functionality, a set of unit tests and
documentation is added.
2018-03-03 19:19:39 +01:00
Jonas Jenwald
b8606abbc1 [api-major] Completely remove the global PDFJS object 2018-03-01 18:13:27 +01:00
Jonas Jenwald
212553840f Move the pdfBug option from the global PDFJS object and into getDocument instead
Also removes the now unused `getDefaultSetting` helper function.
2018-03-01 18:11:17 +01:00
Jonas Jenwald
b69abf1111 Move the disableRange option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
69d7191034 Move the disableAutoFetch option from the global PDFJS object and into getDocument instead
One additional complication with removing this option from the global `PDFJS` object, is that the viewer currently needs to check `disableAutoFetch` in a couple of places. To address this I'm thus proposing adding a getter in `PDFDocumentProxy`, to allow checking the *actually* used values for a particular `getDocument` invocation.
2018-03-01 18:11:16 +01:00
Jonas Jenwald
3c2fbdffe6 Move the cMapUrl and cMapPacked options from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
83d52518da [api-major] Refactor PDFWorker to be initialized with a parameter object, rather than a bunch of regular parameters 2018-02-16 13:22:35 +01:00
Jonas Jenwald
c3c1fc511d Move the workerSrc option from the global PDFJS object and into GlobalWorkerOptions instead 2018-02-16 13:22:35 +01:00
Rob Wu
a89071bdef
Merge pull request #9470 from Snuffleupagus/issue-4888
Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888)
2018-02-16 13:14:21 +01:00
Brendan Dahl
4f5fb78237
Merge pull request #9401 from brendandahl/svg-fail
Make the test framework more resilient to errors.
2018-02-14 17:05:40 -08:00
Brendan Dahl
53d19b619d Make the test framework more resilient to errors. 2018-02-14 17:02:19 -08:00
Tim van der Meij
538dda1096
Merge pull request #9479 from Snuffleupagus/refactor-viewer-options
[api-major] Refactor viewer components initialization to reduce their dependency on the global `PDFJS` object
2018-02-14 22:47:33 +01:00
Jonas Jenwald
11ab3b5c00 Ensure that JpegImage.getData returns the correct data length when forceRGBoutput == true (issue 4888)
With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now.

Fixes 4888.
2018-02-13 20:44:21 +01:00
Jonas Jenwald
e95c11a7f0 Remove the undocumented PDFJS.enableStats option
In order to simplify things, the undocumented `enableStats` option was removed and `pdfBug` is now instead used to enabled general debugging *and* page request/rendering stats.
Considering that in the default viewer the `stats` was only used when debugging was also enabled, this simplification (code wise) definitely seem worthwhile to me.
2018-02-13 16:56:57 +01:00
Jonas Jenwald
3a6f6d23d6 Move the externalLinkTarget and externalLinkRel options to PDFLinkService options
This removes the `PDFJS.externalLinkTarget`/`PDFJS.externalLinkRel` dependency from the viewer components, but please note that as a *temporary* solution the default viewer still uses it.
2018-02-13 14:28:40 +01:00
Jonas Jenwald
c45c394364 Move the imageResourcesPath option to a BaseViewer/PDFPageView/AnnotationLayerBuilder option
This removes the `PDFJS.imageResourcesPath` dependency from the viewer components and the test-suite, but please note that as a *temporary* solution the default viewer still uses it.
2018-02-13 14:28:38 +01:00
Jonas Jenwald
f05e5c5460 Take the dictionary, and not just the image data, into account when caching inline images (issue 9398)
The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces.

There's obviously a couple of different ways that we could compute a hash/checksum of the dictionary.
Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be *way* too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regresses quite badly with this solution.

The solution that is instead implemented in this patch, is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution there's one drawback associated with it:
If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of *not* caching given that too aggressive caching can easily lead to rendering bugs.

One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the *exact* stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1]

With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618.

Fixes 9398; also improves/fixes the `issue8823` reference test.

---

[1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.
2018-02-12 16:43:47 +01:00
Jonas Jenwald
1cf116ab88 Enable the mozilla/use-includes-instead-of-indexOf ESLint rule globally
This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have the necessary `Array`/`String` polyfills and that most cases have already been fixed, see PRs 9032 and 9434.
2018-02-10 23:24:50 +01:00
Jonas Jenwald
2eb29409bc Enable the mozilla/avoid-removeChild ESLint rule globally
This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have a polyfill for `ChildNode.remove()` and that most cases have already been fixed, see PRs 8056 and 8138.
2018-02-10 23:24:50 +01:00
Tim van der Meij
7bb066494f
Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback
Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)
2018-02-09 21:36:08 +01:00
Jonas Jenwald
a18c65ae9f Use the correct stream position when reading maxSizeOfInstructions from the maxp table (issue 9458)
Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html.

Fixes 9458.
2018-02-07 21:57:43 +01:00
Jonas Jenwald
bf4166e6c9 Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614)
Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49

Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) *before* parsing of the SOS (Start of Scan) data begins.
Hence the best solution I could come up with here, is to re-parse the image in the *hopefully* rare case of JPEG images that include a DNL (Define Number of Lines) marker.

Fixes 8614.
2018-02-05 21:05:32 +01:00
Rob Wu
911659cd70 Add tests for file names with spaces and semicolons 2018-02-04 17:58:10 +01:00
Jonas Jenwald
f4a95de694 Attempt to find the next valid marker when encountering invalid image data in JpegImage.parse (issue 9425)
In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter.

Fixes 9425.
2018-02-03 16:01:19 +01:00
Jonas Jenwald
56a8c934dd [api-major] Remove the PDFJS.disableWorker option
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.

Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).

Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.

---

[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
2018-01-31 12:52:10 +01:00
Jonas Jenwald
a5aaf62754 [api-minor] Add a (static) PDFWorker.getWorkerSrc method that returns the current workerSrc
This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).
2018-01-31 12:52:07 +01:00
Jonas Jenwald
42c71cd99f Utilize PDFNodeStream to run more API unit-tests on Node.js/Travis 2018-01-28 17:14:08 +01:00
Jani Pehkonen
5593c970e0 Implement Huffman coding in JBIG2 2018-01-23 17:04:07 +02:00
Jonas Jenwald
f0216484bc
Merge pull request #9383 from Rob--W/better-content-disposition-parser
Better content disposition parser
2018-01-21 15:08:14 +01:00
Rob Wu
a4e907169e Improve correctness of Content-Disposition parser
Re-uses logic from 9f5fcae11c/extension/content-disposition.js
which is already covered by tests: 6f3bbb8bbf
2018-01-21 13:31:12 +01:00
Jonas Jenwald
fe5102a27f
Merge pull request #9363 from Rob--W/fetch-http/s-only
Limit PDFFetchStream to http(s) in the Chrome extension
2018-01-21 11:45:09 +01:00
Rob Wu
0ffe9b9289 Remove useless test from network_utils_spec.js
Remove "returns null when content disposition is form-data".
The name of the test is already misleading: It suggests that
the return value is null if the Content-Disposition starts with
"form-data". This is not the case, anything with the "filename"
parameter is accepted.

So, to correct this, one would have to rephrase the test description to
"returns null when content disposition has no filename".
But this is already tested by the test called
"gets the filename from the response header".

So, remove the test.
2018-01-19 17:28:47 +01:00
Jonas Jenwald
69a8336cf1 Address the final round of review comments for Content-Disposition filename extraction
This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly.
The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations.
Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.
2018-01-18 17:39:22 +01:00
Juan Salvador Perez Garcia
eb1f6f4c24 Content disposition filename
File name is extracted from headers.
2018-01-18 17:38:44 +01:00
Jonas Jenwald
f774abc8d3
Merge pull request #9368 from acchou/9362
end() is the official way to release a Writable stream
2018-01-17 12:45:23 +01:00
Andy Chou
b867c3299b end() is the official way to release a Writable stream 2018-01-16 12:01:33 -08:00
Rob Wu
1c8cacd6b9 Limit PDFFetchStream to http(s) in the Chrome extension
The `fetch` API is only supported for http(s), even in Chrome extensions.
Because of this limitation, we should use the XMLHttpRequest API when the
requested URL is not a http(s) URL.

Fixes #9361
2018-01-14 00:34:46 +01:00
Jonas Jenwald
0e1b5589e7 Restore the btoa/atob polyfills for Node.js
These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`.
However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there's warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).
2018-01-13 01:31:05 +01:00
Jonas Jenwald
d0c8992e8a Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285)
Please refer to the PDF specification, in particular http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3801570

> A colour space shall be specified in one of two ways:
>  - Within a content stream, the CS or cs operator establishes the current colour space parameter in the graphics state. The operand shall always be name object, which either identifies one of the colour spaces that need no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, or some cases of Pattern) or shall be used as a key in the ColorSpace subdictionary of the current resource dictionary (see 7.8.3, "Resource Dictionaries"). In the latter case, the value of the dictionary entry in turn shall be a colour space array or name. A colour space array shall never be inline within a content stream.
>
> - Outside a content stream, certain objects, such as image XObjects, shall specify a colour space as an explicit parameter, often associated with the key ColorSpace. In this case, the colour space array or name shall always be defined directly as a PDF object, not by an entry in the ColorSpace resource subdictionary. This convention also applies when colour spaces are defined in terms of other colour spaces.
2018-01-10 20:20:43 +01:00