Commit Graph

1979 Commits

Author SHA1 Message Date
Jonas Jenwald
cdc60402f6 [api-minor] Change PageViewport to throw when the rotation is not a multiple of 90 degrees
As evident from the code, `PageViewport` only supports[1] `rotation` values which are a multiple of 90 degrees. Besides it being somewhat difficult to imagine meaningful use-cases for a non-multiple of 90 degrees `rotation`, the code also becomes both simpler and more efficient by not having to consider arbitrary `rotation` values.

However, any invalid rotation will *silently* fallback to assume zero `rotation` which probably isn't great for e.g. `PDFPageProxy.getViewport` in the API. Hence this patch, which will now enforce that only valid `rotation` values are accepted.

---
[1] As far as I can tell, from looking through the history, nothing else has ever been supported either.
2020-04-22 15:19:13 +02:00
Jonas Jenwald
1cc3dbb694 Enable the dot-notation ESLint rule
*Please note:* These changes were done automatically, using the `gulp lint --fix` command.

This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104

The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the *built* files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a *reduction* of `23 178` bytes (or ~0.7%) for a completely mechanical change.

A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't *think* that the new formatting this patch introduces should undo any of the improvements from PR 6915.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation
2020-04-17 12:24:46 +02:00
Tim van der Meij
96923eb2a6
Merge pull request #11805 from Snuffleupagus/issue-11794
Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)
2020-04-16 00:08:58 +02:00
Tim van der Meij
a7def05aa1
Merge pull request #11810 from Snuffleupagus/fromCodePoint-followup
A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up)
2020-04-16 00:08:16 +02:00
Jonas Jenwald
44b4a74f48 A couple of small String.fromCodePoint improvements (PR 11698 and 11769 follow-up)
- Add a reduced test-case for issue 11768, to prevent future regressions.
   (Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.)

 - Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.
2020-04-15 13:45:08 +02:00
Jonas Jenwald
06f6f8719f Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794) 2020-04-14 23:27:08 +02:00
Jonas Jenwald
746eaf3154 [api-minor] Fix the return value of PDFDocumentProxy.getViewerPreferences when no viewer preferences are present (PR 10738 follow-up)
This patch fixes yet another instalment in the never-ending series of "what the *bleep* was I thinking", by changing the `PDFDocumentProxy.getViewerPreferences` method to return `null` by default.
Not only is this method now consistent with many other API methods, for the data not present case, but it also avoids having to e.g. loop through an object to check if it's actually empty (note the old unit-test).
2020-04-14 23:25:50 +02:00
Jonas Jenwald
426945b480 Update Prettier to version 2.0
Please note that these changes were done automatically, using `gulp lint --fix`.

Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html
In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).
2020-04-14 12:28:14 +02:00
Jonas Jenwald
91efde5246 Add a heuristic to scale even single-char text, when the horizontal/vertical scaling differs significantly (issue 11713)
At this point in time, compared to when the "ignore single-char" code was added, we *should* generally be doing a much better job of combining text into as few chunks as possible.
However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm *not* proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise.

Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.
2020-04-07 00:36:23 +02:00
Jonas Jenwald
938d519192 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.
2020-04-06 11:21:02 +02:00
Jonas Jenwald
710704508c Fail early, in modern GENERIC builds, if certain required browser functionality is missing (issue 11762)
With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.

In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.

To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build.
(Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.)
As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code.

*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).
2020-04-01 19:42:48 +02:00
Jonas Jenwald
664b79abe0 [api-minor] Remove the eventBusDispatchToDOM option/preference, and thus the general ability to dispatch "viewer components" events to the DOM
This functionality was only added to the default viewer for backwards compatibility and to support the various PDF viewer tests in mozilla-central, with the intention to eventually remove it completely.
While the different mozilla-central tests cannot be *easily* converted from DOM events, it's however possible to limit that functionality to only MOZCENTRAL builds *and* when tests are running.

Rather than depending of the re-dispatching of internal events to the DOM, the default viewer can instead be used in e.g. the following way:
```javascript
document.addEventListener("webviewerloaded", function() {
  PDFViewerApplication.initializedPromise.then(function() {
    // The viewer has now been initialized, and its properties can be accessed.

    PDFViewerApplication.eventBus.on("pagerendered", function(event) {
      console.log("Has rendered page number: " + event.pageNumber);
    });
  });
});
```
2020-03-29 12:24:46 +02:00
Jonas Jenwald
fdfcde2b40 Remove a spurious console.log from the ChromiumBrowser function in test/webbrowser.js file
This looks entirely like something which was left-over from debugging, and that line hasn't been touched since PR 4515, especially considering that the corresponding branch in `FirefoxBrowser` doesn't print anything.
2020-03-25 11:57:12 +01:00
Jonas Jenwald
dcb16af968 Whitelist closure related cases to address the remaining no-shadow linting errors
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled.

Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
2020-03-25 11:57:12 +01:00
Jonas Jenwald
1d2f787d6a Enable the ESLint no-shadow rule
This rule is *not* currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239

Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis.
Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow

---
[1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.
2020-03-25 11:56:05 +01:00
Tim van der Meij
475fa1f97f
Merge pull request #11744 from janpe2/cff-glyph-zero
The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
2020-03-24 23:52:21 +01:00
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
2020-03-24 15:56:50 +02:00
Jonas Jenwald
66ee8f5acd Remove variable shadowing from the JavaScript files in the test/unit/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-24 10:44:17 +01:00
Jonas Jenwald
b02be3b268 Update the eslint-plugin-no-unsanitized package to the latest version 2020-03-20 11:25:39 +01:00
Jonas Jenwald
ae2900e510 [api-minor] Change the pageIndex, on PDFPageProxy instances, to a private property
This property has never been documented and/or *intentionally* exposed through the API, instead the `PDFPageProxy.pageNumber` property is the documented/intended API to use here.
Hence pageIndex is changed to a "private" property on `PDFPageProxy` instances, and internal API functionality is also updated to *consistently* use `this._pageIndex` rather than a mix of formats.
2020-03-19 15:47:11 +01:00
Tim van der Meij
1bc5cef2b5
Merge pull request #11698 from Snuffleupagus/issue-11697
Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)
2020-03-15 13:36:09 +01:00
Tim van der Meij
aa3e5a2b8f
Merge pull request #11644 from Snuffleupagus/openAction
[api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
2020-03-15 13:16:37 +01:00
Jonas Jenwald
15e8692eff Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in PartialEvaluator._buildSimpleFontToUnicode (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.

To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
2020-03-13 23:35:47 +01:00
Jonas Jenwald
c5f67300e9 Rename the isSpace helper function to isWhiteSpace
Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`.
Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.
2020-03-12 11:36:59 +01:00
Jani Pehkonen
e0daabd2dd Magnifier positioning in reftest analyzer
When reftest analyzer shows magnified pixels, there is a seemingly random offset between the mouse position and the magnified position. The reason for this is that reftest analyzer assumes all images have 800 * 1000 pixels but actually the test images have varying sizes.
2020-03-10 19:09:15 +02:00
Jonas Jenwald
65e6ea2cb2 Prevent lookup errors in PartialEvaluator.hasBlendModes from breaking all parsing/rendering of a page (issue 11678)
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
2020-03-09 12:00:12 +01:00
Tim van der Meij
1a97c142b3
Merge pull request #11523 from Snuffleupagus/issue-10880
Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)
2020-03-06 23:03:09 +01:00
Jonas Jenwald
160cfc4084 Slightly simplify the lookup of data in Dict.{get, getAsync, has}
Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch utilizes an *inline check* and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`).

For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` is called *a lot* during parsing in the worker.

This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 250,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2838 |        2820 | -18 | -0.65 |        faster
Firefox | Page Request |   250 |            1 |           2 |   0 | 11.92 |        slower
Firefox | Rendering    |   250 |         2837 |        2818 | -19 | -0.65 |        faster
```
2020-03-06 14:12:14 +01:00
Jonas Jenwald
01fb309a2a [api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)
This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.)

By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.
2020-03-06 13:03:00 +01:00
Tim van der Meij
c95b9b1e17
Merge pull request #11653 from Snuffleupagus/ensureStateFont
Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
2020-03-03 23:33:13 +01:00
Jani Pehkonen
71e7686950 Fix Type1 font parsing when .notdef is not at index zero
Fixes #11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.

Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
2020-03-03 21:55:51 +02:00
Jonas Jenwald
65e514e063 Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
2020-03-03 10:05:18 +01:00
Takashi Tamura
d8c9f119b0 Fix the vertical writing mode with horizontal scaling. #11555.
It is not valid to multiply textHScale when the writing mode is vertical.

See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-29 07:48:29 +09:00
Tim van der Meij
e1586016c5
Merge pull request #11577 from Snuffleupagus/Pages-tree-refs
Prevent circular references in the /Pages tree
2020-02-27 23:36:11 +01:00
Tim van der Meij
965ebe63fd
Merge pull request #11540 from tamuratak/charspacing
Fix text spacing with vertical fonts. #7687 and #11526.
2020-02-26 22:26:27 +01:00
Jonas Jenwald
bf09d79eea Use the ESLint no-restricted-syntax rule to prevent direct usage of new Cmd()/new Name()/new Ref()
Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects *a lot* during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base.
Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-02-22 21:15:00 +01:00
Jonas Jenwald
c3c3b8cd81 Add a heuristic, in src/core/jpg.js, to handle JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10880)
*This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else.*

To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data *and* the "actual" `scanLines` value is at least one order of magnitude smaller than expected.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
3c7b7be100 Prevent circular references in the /Pages tree 2020-02-19 01:49:39 +01:00
Takashi Tamura
512dbe3060 Fix text spacing with vertical fonts. #7687 and #11526.
When the writing mode is vertical, we have to reverse
the sign of spacing since we are subtracting it from
current.y. We have to add it to current.y.
See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-11 08:49:23 +09:00
Tim van der Meij
dced0a3821
Merge pull request #11579 from Snuffleupagus/issue-11578
Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)
2020-02-09 17:33:09 +01:00
Tim van der Meij
61056a9238
Merge pull request #11551 from Snuffleupagus/issue-11549
Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
2020-02-09 17:32:35 +01:00
Tim van der Meij
2fb4076e05 Merge pull request #11568 from Snuffleupagus/PDF-header-validation
Ensure that the PDF header contains an actual number (PR 11463 follow-up)
2020-02-09 17:16:25 +01:00
Jonas Jenwald
7937165537 Ignore spaces when normalizing the font name in Font.fallbackToSystemFont (issue 11578) 2020-02-08 19:59:04 +01:00
Jonas Jenwald
7117ee03d6 [api-minor] Change PDFDocumentProxy.cleanup/PDFPageProxy.cleanup to return data
This patch makes the following changes, to improve these API methods:

 - Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
   Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.

 - Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.

 - Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
   Add a note in the JSDoc comment about not calling this method when rendering is ongoing.

 - Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
   Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
   All of the caches that's being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
  On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.

 - Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
2020-02-07 17:00:29 +01:00
Jonas Jenwald
88c35d872f Ensure that the PDF header contains an actual number (PR 11463 follow-up)
While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change.
However, it does seem like a good idea to at least *validate* the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.
2020-02-07 12:25:07 +01:00
Brendan Dahl
09a6e17d22
Merge pull request #11528 from janpe2/type1-nonemb-notdef
Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
2020-02-06 13:30:07 -08:00
Jonas Jenwald
4c54395ff6 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.
2020-01-30 13:19:05 +01:00
Tim van der Meij
474fe1757e
Merge pull request #11508 from Snuffleupagus/jpg-default-marker
Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js`
2020-01-26 21:32:13 +01:00
Jonas Jenwald
62b2b984cc Render Popup annotations last, once all other annotations have been rendered (issue 11362)
In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored).
Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.
2020-01-26 15:49:55 +01:00
Jonas Jenwald
13930e5202 Simplify the handling of unsupported/incorrect markers in src/core/jpg.js
- Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument).

 - Tweak a condition, to make it easier to see that the end of the data has been reached.

 - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.
2020-01-25 22:52:24 +01:00