Commit Graph

1790 Commits

Author SHA1 Message Date
Jonas Jenwald
f78efd883e Attempt to throw MissingPDFException when applicable in node_stream.js (issue 9791) 2018-08-06 10:00:03 +02:00
Tim van der Meij
f6eaa99cb2
Reword test reporter message
The font tests use Jasmine too, so while they are technically unit
tests, it's a bit confusing to see `Started unit tests` when the font
tests are run on the bots.
2018-08-05 21:21:46 +02:00
Tim van der Meij
4111871ac5
Merge pull request #9958 from brendandahl/always-fallback
Always fallback to system font on font failure.
2018-08-05 19:58:48 +02:00
Tim van der Meij
27e8a2f6fe
Merge pull request #9959 from brendandahl/test-util
Utility script to add a reference test.
2018-08-05 16:53:37 +02:00
Tim van der Meij
b65d0450f5
Merge pull request #9960 from brendandahl/strict-verify
Fail when MD5 of test files fails on bots.
2018-08-05 16:44:12 +02:00
Brendan Dahl
482ea2af32 Fail when MD5 of test files fails on bots. 2018-08-03 17:48:47 -07:00
Brendan Dahl
8b3ed473c1 Utility script to add a reference test. 2018-08-03 17:24:24 -07:00
Brendan Dahl
5f67a6a237 Always fallback to system font on font failure.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
2018-08-03 16:49:22 -07:00
Tim van der Meij
444976bcd5
Merge pull request #9956 from brendandahl/allow-zero-progress
Allow loaded progress of 0 in unit tests.
2018-08-04 00:19:02 +02:00
Tim van der Meij
f19ee127a3
Merge pull request #9874 from boundlesshq/master
[api-minor] Include export value for checkboxes
2018-08-03 23:43:23 +02:00
Brendan Dahl
d762567bcf Allow loaded progress of 0 in unit tests. 2018-08-03 10:31:46 -07:00
Tim van der Meij
8a4be24645
Merge pull request #9948 from Snuffleupagus/url-polyfill-unit-tests
Add (basic) unit-tests for the non-global `URL` constructor (PR 9868 follow-up)
2018-08-02 23:32:07 +02:00
Brian
2a665ebad4 Removed Extraneous Matrix Check in CalRGB Conversion 2018-08-02 10:16:42 -07:00
Jonas Jenwald
f8388710e6 Add (basic) unit-tests for the non-global URL constructor (PR 9868 follow-up)
This should really have been included in PR 9868, since it will help ensure that the `URL` constructor is correctly imported/exported by `src/shared/util.js`.
2018-08-02 10:32:06 +02:00
Tim van der Meij
716acf63d4
Merge pull request #9938 from Snuffleupagus/issue-9915
Ensure that Type0, i.e. composite, OpenType fonts with `CFF ` tables are *not* treated as CFF fonts if their glyph mapping is non-default (issue 9915)
2018-08-02 00:11:18 +02:00
Jonas Jenwald
3ce420131f Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650)
According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45
> When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):
>
>  - Width and Height shall match the corresponding width and height values in the JPEG2000 data.
>
>  - . . .

Hence it seems reasonable to use the Width/Height of the image data *itself*, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.
2018-08-01 16:42:26 +02:00
Jonas Jenwald
690bcc8c8a Add a reduced, eq, test-case for issue 9915 2018-07-29 23:06:15 +02:00
bion
c31ddf7edc [api-minor] Include export value for checkboxes 2018-07-28 00:30:41 -07:00
Jonas Jenwald
928b89382e [api-minor] Add an IsLinearized property to the PDFDocument.documentInfo getter, to allow accessing the linearization status through the API (via PDFDocumentProxy.getMetadata)
There was a (somewhat) recent question on IRC about accessing the linearization status of a PDF document, and this patch contains a simple way to expose that through already existing API methods.
Please note that during setup/parsing in `PDFDocument` the linearization data is already being fetched and parsed, provided of course that it exists. Hence this patch will *not* cause any additional data to be loaded.
2018-07-26 15:54:19 +02:00
Jonas Jenwald
36b683ca55 Provide custom messages for the no-restricted-globals ESLint rule, and refactor the .eslintrc files (PR 9868 follow-up)
Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended.

To reduced duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.
2018-07-23 14:10:13 +02:00
Jonas Jenwald
8ec99b200c Prevent Metadata/XML parsing from breaking PDFDocumentProxy.getMetadata when no XML root document is found (issue 8884)
With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs.

(Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)
2018-07-18 11:37:40 +02:00
Tim van der Meij
61db85ab64
Merge pull request #9886 from Snuffleupagus/bug-1473809
Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
2018-07-15 17:23:52 +02:00
Jonas Jenwald
2b25deb84c Prevent errors in sanitizeTTProgram, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)
*I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug.*

The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a *negative* length when parsing the CALL functions.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.
2018-07-10 09:45:55 +02:00
Jonas Jenwald
61186698c3 Replace the remaining occurences of instanceof Array with Array.isArray()
*Follow-up to PRs 8864 and 8813.*

As explained in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/isArray, `instanceof Array` can have inconsistent behavior. To ensure that only `Array.isArray` is used, an ESLint plugin/rule is added to enforce this.
2018-07-09 13:17:41 +02:00
Tim van der Meij
99f8f2c275
Merge pull request #9853 from Snuffleupagus/re-render-after-cancel
Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up)
2018-06-29 23:25:43 +02:00
Tim van der Meij
6fa2c779b5
Merge pull request #9838 from Snuffleupagus/invalid-path-OPS
Error, rather than warn, once a number of invalid path operators are encountered in `EvaluatorPreprocessor.read` (bug 1443140)
2018-06-28 23:15:25 +02:00
Jonas Jenwald
bf0aca86d7 Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up)
Currently if `RenderTask.cancel` is called *immediately* after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled.
However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it *after* rendering was cancelled (it was likely just a simple oversight in PR 8519).

Fixes 9456.
2018-06-28 22:56:37 +02:00
Jonas Jenwald
74e9999044 Add unit-tests for PDFPageProxy.stats (PR 9245 follow-up)
This wasn't included in PR 9245, since all the API options were still global at that time.

Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.
2018-06-25 14:20:49 +02:00
Jonas Jenwald
7f21e38787 Error, rather than warn, once a number of invalid path operators are encountered in EvaluatorPreprocessor.read (bug 1443140)
Incomplete path operators, in particular, can result in fairly chaotic rendering artifacts, as can be observed on page four of the referenced PDF file.

The initial (naive) solution that was attempted, was to simply throw a `FormatError` as soon as any invalid (i.e. too short) operator was found and rely on the existing `ignoreErrors` code-paths. However, doing so would have caused regressions in some files; see the existing `issue2391-1` test-case, which was promoted to an `eq` test to help prevent future bugs.
Hence this patch, which adds special handling for invalid path operators since those may cause quite bad rendering artifacts.

You could, in all fairness, argue that the patch is a handwavy solution and I wouldn't object. However, given that this only concerns *corrupt* PDF files, the way that PDF viewers (PDF.js included) try to gracefully deal with those could probably be described as a best-effort solution anyway.

This patch also adjusts the existing `warn`/`info` messages to print the command name according to the PDF specification, rather than an internal PDF.js enumeration value. The former should be much more useful for debugging purposes.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1443140.
2018-06-24 16:05:08 +02:00
Jonas Jenwald
56e3648b65 Add basic validation of the 'trailer' dictionary candidates in XRef.indexObjects (issue 9418)
This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
2018-06-20 13:41:22 +02:00
Jonas Jenwald
6bbcafcd26 Let Lexer.getNumber treat a single decimal point as zero (issue 9252)
This is consistent with the behaviour in Adobe Reader.
2018-06-20 13:41:21 +02:00
Jonas Jenwald
df4799a12a Ensure that line-breaks are *only* skipped after operators in Lexer.getNumber (PR 8359 follow-up)
With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches.

Hence this is code is adjusted to actually do what the original commit message says, and nothing more.
2018-06-20 13:41:15 +02:00
Tim van der Meij
620da6f4df
Merge pull request #9802 from Snuffleupagus/ColorSpace-PDFImage-Uint8ClampedArray
Update `ColorSpace` and `PDFImage` to use `Uint8ClampedArray`s and remove manual clamping/rounding
2018-06-16 17:55:10 +02:00
Tim van der Meij
280f20bf3c
Merge pull request #9809 from Snuffleupagus/getPathGenerator-ignoreErrors
Allow `FontFaceObject.getPathGenerator` to ignore non-embedded fonts during rendering
2018-06-16 16:37:52 +02:00
Jonas Jenwald
bf0db0fb72 Pass the ignoreErrors API option to the FontFaceObject constructor, and utilize it in getPathGenerator to ignore missing glyphs
Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering.

*Please note:* Including the 14 standard fonts in PDF.js probably wouldn't be *that* difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.)
However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)
2018-06-13 11:02:06 +02:00
Jonas Jenwald
731f2e6dfc Remove manual clamping/rounding from ColorSpace and PDFImage, by having their methods use Uint8ClampedArrays
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.

As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.

*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
2018-06-12 11:01:32 +02:00
Jonas Jenwald
32367c5968 Make the getBytes/peekBytes methods of Stream/DecodeStream/ChunkedStream able to return Uint8ClampedArrays
The built-in image decoders are already returning data as `Uint8ClampedArray`, and subsequently the JPEG/JBIG2/JPX streams are as well. However, for general streams we obviously don't want to force the use of `Uint8ClampedArray` unless an "Image" is actually being decoded.
Hence this patch, which adds a parameter that allows the caller of the `getBytes`/`peekBytes` methods to force a `Uint8ClampedArray` (rather than a `Uint8Array`) to be returned.
2018-06-12 11:01:32 +02:00
youngroz
09359efca0 Replace deprecated constructor with 2018-06-11 20:41:56 -07:00
Tim van der Meij
db874b6680
Merge pull request #9660 from brendandahl/headless
Support running the tests headlessly.
2018-06-09 15:14:42 +02:00
Jonas Jenwald
547f119be6 Simplify the error handling slightly in the src/display/node_stream.js file
The various classes have `this._errored` and `this._reason` properties, where the first one is a boolean indicating if an error was encountered and the second one contains the actual `Error` (or `null` initially).

In practice this means that errors are basically tracked *twice*, rather than just once. This kind of double-bookkeeping is generally a bad idea, since it's quite easy for the properties to (accidentally) get into an inconsistent state whenever the relevant code is modified.

Rather than using a separate boolean, we can just as well check the "error" property directly (since `null` is falsy).

---

Somewhat unrelated to this patch, but `src/display/node_stream.js` is currently *not* handling errors in a consistent or even correct way; compared with `src/display/network.js` and `src/display/fetch_stream.js`.
Obviously using the `createResponseStatusError` utility function, from `src/display/network_utils.js`, might not make much sense in a Node.js environment. However at the *very* least it seems that `MissingPDFException`, or `UnknownErrorException` when one cannot tell that the PDF file is "missing", should be manually thrown.

As is, the API (i.e. `getDocument`) is not returning the *expected* errors when loading fails in Node.js environments (as evident from the `pending` API unit-test).
2018-06-06 09:05:45 +02:00
Brendan Dahl
127590b1c3 Support running the tests headlessly. 2018-06-05 11:29:58 -07:00
Jonas Jenwald
89caaf4071 Use LoopbackPort in the "message_handler" unit-tests
There's no good reason, as far as I can tell, to duplicate the functionality of the `LoopbackPort` in the unit-tests. The only difference between the implementations is that `LoopbackPort` mimics the (native) structured cloning, however that shouldn't matter here since the tests are only sending "simple" data (strings respectively arrays with numbers).

Furthermore the patch also changes `LoopbackPort` to default to using "structured cloning" and deferred invocation of the listeners, since native typed array support is now a requirement for using the PDF.js library.
2018-06-04 12:53:08 +02:00
Jonas Jenwald
44d8afd46b Move MessageHandler into a separate src/shared/message_handler.js file
The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.
2018-06-04 12:53:08 +02:00
Tim van der Meij
90750d624b
Use fs.unlinkSync instead of fs.unlink when removing files in the font tests 2018-06-03 22:17:05 +02:00
Jonas Jenwald
11b4613e20 Reduce the amount of console "spam", by ignoring info/warn calls, when running the unit-tests in Node.js/Travis
Compared to running the unit-tests in "regular" browsers, where any console output won't get mixed up with test output, in Node.js/Travis the test output looks quite noisy.
By ignoring `info`/`warn` calls, when running unit-tests in Node.js/Travis, the test output is a lot smaller not to mention that any *actual* failures are more easily spotted.
2018-06-03 00:28:40 +02:00
Jonas Jenwald
620f65488b Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679) 2018-05-30 22:40:11 +02:00
Tim van der Meij
4f8dae683e
Use fs.unlinkSync instead of fs.unlink when removing eq.log
This is necessary because Node.js crashes when `fs.unlink` is called
without a callback function.
2018-05-28 23:48:20 +02:00
Tim van der Meij
2f3b05fd5a
Convert all PDF links from HTTP to HTTPS 2018-05-27 16:02:04 +02:00
Tim van der Meij
4958b59369
Fix broken links to Bugzilla PDF attachments 2018-05-27 15:40:33 +02:00
Ryan Hendrickson
91cbc185da Add scrolling modes to web viewer
In addition to the default scrolling mode (vertical), this commit adds
horizontal and wrapped scrolling, implemented primarily with CSS.
2018-05-14 23:10:32 -04:00
Jani Pehkonen
8ea505545a Use FDSelect and FDArray when converting CFF CID font to paths 2018-04-10 16:44:42 +03:00
Tim van der Meij
69e9fe2494
Provide a prefixed appearance CSS rule for reference testing in Chrome
In `rasterizeAnnotationLayer` we load the source CSS files directly, so
these are not processed by Autoprefixer. Since the prefixed rules have
now been removed from the source CSS files, we must manually provide one
prefixed rule that Chrome needs in the overrides CSS file for checkbox
and radio button rendering to work in the reference tests.
2018-04-08 13:54:16 +02:00
Jonas Jenwald
77d025dc14 Move the isPortraitOrientation helper function from web/base_viewer.js to web/ui_utils.js
A couple of basic unit-tests are added, and a manual `isLandscape` check (in `web/base_viewer.js`) is also converted to use the helper function instead.
2018-03-25 18:48:53 +02:00
Tim van der Meij
5c1a16ba6e
Merge pull request #9586 from Snuffleupagus/pageSize-api-rotate
Ensure that `PDFPageProxy.pageSizeInches` handles non-default /Rotate entries correctly
2018-03-25 18:03:32 +02:00
Tim van der Meij
6cc0efe1cc
Merge pull request #9576 from timvandermeij/versions
Update packages
2018-03-25 17:52:26 +02:00
Tim van der Meij
95de23e6e3
Update packages
Jasmine had a major version bump and required a few minor changes in our
booting code. Most notably, using `pending` in a `describe` block is no
longer supported, so we can only return early there. On the positive
side, the unit tests now run in a random order by default, which
eliminates any dependencies between unit tests.

Note that upgrading to Webpack 4 is out of scope for this patch since
the bots cannot work well with the newly generated bundles (both
browsers on both bots do not react within 120 seconds). Webpack 4 is not
faster for us than Webpack 3, so for now there is no need to upgrade.
2018-03-25 16:59:50 +02:00
Jonas Jenwald
d547936827 Ensure that PDFPageProxy.pageSizeInches handles non-default /Rotate entries correctly
Without this patch, the pageSize will be incorrectly reported for some PDF files.

---

Move pageSizeInches to ui_utils
2018-03-25 16:48:29 +02:00
Brendan Dahl
63c7aee112
Merge pull request #9565 from brendandahl/new-name
Rename the globals to shorter names.
2018-03-20 13:49:04 -07:00
Jonas Jenwald
e0ae157582 [api-minor] Fix various issues related to the pageSize information
The `getPageSizeInches` method was implemented on `PDFDocumentProxy`, which seems conceptually wrong since the size property isn't global to the document but rather specific to each page. Hence the method is moved into `PDFPageProxy`, as `get pageSizeInches` instead to address this.

Despite the fact that new API functionality was implemented, no unit-tests were added. To prevent issues later on, we should *always* ensure that new functionality has at least some test-coverage; something that this patch also takes care of.

The new `PDFDocumentProperties._parsePageSize` method seemed unnecessary convoluted. Furthermore, in the "no data provided"-case it even returned incorrect data (an array, rather than the expected object).
Finally, the fallback strings didn't actually agree with the `en-US` locale. This inconsistency doesn't look too great, and it's thus addressed here as well.
2018-03-18 09:10:19 +01:00
Brendan Dahl
01bff1a81d Rename the globals to shorter names.
pdfjsDistBuildPdf=pdfjsLib
pdfjsDistWebPdfViewer=pdfjsViewer
pdfjsDistBuildPdfWorker=pdfjsWorker
2018-03-16 11:08:56 -07:00
Jonas Jenwald
d431ae069d Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540)
According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.
2018-03-12 14:13:23 +01:00
Tim van der Meij
f308d73d40
Implement a single getInheritableProperty utility function
This function combines the logic of two separate methods into one.
The loop limit is also a good thing to have for the calls in
`src/core/annotation.js`.

Moreover, since this is important functionality, a set of unit tests and
documentation is added.
2018-03-03 19:19:39 +01:00
Jonas Jenwald
b8606abbc1 [api-major] Completely remove the global PDFJS object 2018-03-01 18:13:27 +01:00
Jonas Jenwald
212553840f Move the pdfBug option from the global PDFJS object and into getDocument instead
Also removes the now unused `getDefaultSetting` helper function.
2018-03-01 18:11:17 +01:00
Jonas Jenwald
b69abf1111 Move the disableRange option from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
69d7191034 Move the disableAutoFetch option from the global PDFJS object and into getDocument instead
One additional complication with removing this option from the global `PDFJS` object, is that the viewer currently needs to check `disableAutoFetch` in a couple of places. To address this I'm thus proposing adding a getter in `PDFDocumentProxy`, to allow checking the *actually* used values for a particular `getDocument` invocation.
2018-03-01 18:11:16 +01:00
Jonas Jenwald
3c2fbdffe6 Move the cMapUrl and cMapPacked options from the global PDFJS object and into getDocument instead 2018-03-01 18:11:16 +01:00
Jonas Jenwald
83d52518da [api-major] Refactor PDFWorker to be initialized with a parameter object, rather than a bunch of regular parameters 2018-02-16 13:22:35 +01:00
Jonas Jenwald
c3c1fc511d Move the workerSrc option from the global PDFJS object and into GlobalWorkerOptions instead 2018-02-16 13:22:35 +01:00
Rob Wu
a89071bdef
Merge pull request #9470 from Snuffleupagus/issue-4888
Ensure that `JpegImage.getData` returns the correct data length when `forceRGBoutput == true` (issue 4888)
2018-02-16 13:14:21 +01:00
Brendan Dahl
4f5fb78237
Merge pull request #9401 from brendandahl/svg-fail
Make the test framework more resilient to errors.
2018-02-14 17:05:40 -08:00
Brendan Dahl
53d19b619d Make the test framework more resilient to errors. 2018-02-14 17:02:19 -08:00
Tim van der Meij
538dda1096
Merge pull request #9479 from Snuffleupagus/refactor-viewer-options
[api-major] Refactor viewer components initialization to reduce their dependency on the global `PDFJS` object
2018-02-14 22:47:33 +01:00
Jonas Jenwald
11ab3b5c00 Ensure that JpegImage.getData returns the correct data length when forceRGBoutput == true (issue 4888)
With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now.

Fixes 4888.
2018-02-13 20:44:21 +01:00
Jonas Jenwald
e95c11a7f0 Remove the undocumented PDFJS.enableStats option
In order to simplify things, the undocumented `enableStats` option was removed and `pdfBug` is now instead used to enabled general debugging *and* page request/rendering stats.
Considering that in the default viewer the `stats` was only used when debugging was also enabled, this simplification (code wise) definitely seem worthwhile to me.
2018-02-13 16:56:57 +01:00
Jonas Jenwald
3a6f6d23d6 Move the externalLinkTarget and externalLinkRel options to PDFLinkService options
This removes the `PDFJS.externalLinkTarget`/`PDFJS.externalLinkRel` dependency from the viewer components, but please note that as a *temporary* solution the default viewer still uses it.
2018-02-13 14:28:40 +01:00
Jonas Jenwald
c45c394364 Move the imageResourcesPath option to a BaseViewer/PDFPageView/AnnotationLayerBuilder option
This removes the `PDFJS.imageResourcesPath` dependency from the viewer components and the test-suite, but please note that as a *temporary* solution the default viewer still uses it.
2018-02-13 14:28:38 +01:00
Jonas Jenwald
f05e5c5460 Take the dictionary, and not just the image data, into account when caching inline images (issue 9398)
The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces.

There's obviously a couple of different ways that we could compute a hash/checksum of the dictionary.
Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be *way* too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regresses quite badly with this solution.

The solution that is instead implemented in this patch, is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution there's one drawback associated with it:
If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of *not* caching given that too aggressive caching can easily lead to rendering bugs.

One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the *exact* stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1]

With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618.

Fixes 9398; also improves/fixes the `issue8823` reference test.

---

[1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.
2018-02-12 16:43:47 +01:00
Jonas Jenwald
1cf116ab88 Enable the mozilla/use-includes-instead-of-indexOf ESLint rule globally
This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have the necessary `Array`/`String` polyfills and that most cases have already been fixed, see PRs 9032 and 9434.
2018-02-10 23:24:50 +01:00
Jonas Jenwald
2eb29409bc Enable the mozilla/avoid-removeChild ESLint rule globally
This rule is available from https://www.npmjs.com/package/eslint-plugin-mozilla, and is enforced in mozilla-central. Note that we have a polyfill for `ChildNode.remove()` and that most cases have already been fixed, see PRs 8056 and 8138.
2018-02-10 23:24:50 +01:00
Tim van der Meij
7bb066494f
Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback
Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)
2018-02-09 21:36:08 +01:00
Jonas Jenwald
a18c65ae9f Use the correct stream position when reading maxSizeOfInstructions from the maxp table (issue 9458)
Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html.

Fixes 9458.
2018-02-07 21:57:43 +01:00
Jonas Jenwald
bf4166e6c9 Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614)
Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49

Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) *before* parsing of the SOS (Start of Scan) data begins.
Hence the best solution I could come up with here, is to re-parse the image in the *hopefully* rare case of JPEG images that include a DNL (Define Number of Lines) marker.

Fixes 8614.
2018-02-05 21:05:32 +01:00
Rob Wu
911659cd70 Add tests for file names with spaces and semicolons 2018-02-04 17:58:10 +01:00
Jonas Jenwald
f4a95de694 Attempt to find the next valid marker when encountering invalid image data in JpegImage.parse (issue 9425)
In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter.

Fixes 9425.
2018-02-03 16:01:19 +01:00
Jonas Jenwald
56a8c934dd [api-major] Remove the PDFJS.disableWorker option
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.

Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).

Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.

---

[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
2018-01-31 12:52:10 +01:00
Jonas Jenwald
a5aaf62754 [api-minor] Add a (static) PDFWorker.getWorkerSrc method that returns the current workerSrc
This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).
2018-01-31 12:52:07 +01:00
Jonas Jenwald
42c71cd99f Utilize PDFNodeStream to run more API unit-tests on Node.js/Travis 2018-01-28 17:14:08 +01:00
Jani Pehkonen
5593c970e0 Implement Huffman coding in JBIG2 2018-01-23 17:04:07 +02:00
Jonas Jenwald
f0216484bc
Merge pull request #9383 from Rob--W/better-content-disposition-parser
Better content disposition parser
2018-01-21 15:08:14 +01:00
Rob Wu
a4e907169e Improve correctness of Content-Disposition parser
Re-uses logic from 9f5fcae11c/extension/content-disposition.js
which is already covered by tests: 6f3bbb8bbf
2018-01-21 13:31:12 +01:00
Jonas Jenwald
fe5102a27f
Merge pull request #9363 from Rob--W/fetch-http/s-only
Limit PDFFetchStream to http(s) in the Chrome extension
2018-01-21 11:45:09 +01:00
Rob Wu
0ffe9b9289 Remove useless test from network_utils_spec.js
Remove "returns null when content disposition is form-data".
The name of the test is already misleading: It suggests that
the return value is null if the Content-Disposition starts with
"form-data". This is not the case, anything with the "filename"
parameter is accepted.

So, to correct this, one would have to rephrase the test description to
"returns null when content disposition has no filename".
But this is already tested by the test called
"gets the filename from the response header".

So, remove the test.
2018-01-19 17:28:47 +01:00
Jonas Jenwald
69a8336cf1 Address the final round of review comments for Content-Disposition filename extraction
This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly.
The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations.
Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.
2018-01-18 17:39:22 +01:00
Juan Salvador Perez Garcia
eb1f6f4c24 Content disposition filename
File name is extracted from headers.
2018-01-18 17:38:44 +01:00
Jonas Jenwald
f774abc8d3
Merge pull request #9368 from acchou/9362
end() is the official way to release a Writable stream
2018-01-17 12:45:23 +01:00
Andy Chou
b867c3299b end() is the official way to release a Writable stream 2018-01-16 12:01:33 -08:00
Rob Wu
1c8cacd6b9 Limit PDFFetchStream to http(s) in the Chrome extension
The `fetch` API is only supported for http(s), even in Chrome extensions.
Because of this limitation, we should use the XMLHttpRequest API when the
requested URL is not a http(s) URL.

Fixes #9361
2018-01-14 00:34:46 +01:00
Jonas Jenwald
0e1b5589e7 Restore the btoa/atob polyfills for Node.js
These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`.
However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there's warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).
2018-01-13 01:31:05 +01:00
Jonas Jenwald
d0c8992e8a Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285)
Please refer to the PDF specification, in particular http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3801570

> A colour space shall be specified in one of two ways:
>  - Within a content stream, the CS or cs operator establishes the current colour space parameter in the graphics state. The operand shall always be name object, which either identifies one of the colour spaces that need no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, or some cases of Pattern) or shall be used as a key in the ColorSpace subdictionary of the current resource dictionary (see 7.8.3, "Resource Dictionaries"). In the latter case, the value of the dictionary entry in turn shall be a colour space array or name. A colour space array shall never be inline within a content stream.
>
> - Outside a content stream, certain objects, such as image XObjects, shall specify a colour space as an explicit parameter, often associated with the key ColorSpace. In this case, the colour space array or name shall always be defined directly as a PDF object, not by an entry in the ColorSpace resource subdictionary. This convention also applies when colour spaces are defined in terms of other colour spaces.
2018-01-10 20:20:43 +01:00