12690 Commits

Author SHA1 Message Date
Tim van der Meij
9871ccc69f
Merge pull request #11777 from Snuffleupagus/Font-exportData-2
[api-minor] Change `Font.exportData` to, by default, stop exporting properties which are completely unused on the main-thread and/or in the API (PR 11773 follow-up)
2020-04-06 22:54:14 +02:00
Jonas Jenwald
2d46230d23 [api-minor] Change Font.exportData to, by default, stop exporting properties which are completely unused on the main-thread and/or in the API (PR 11773 follow-up)
For years now, the `Font.exportData` method has (because of its previous implementation) been exporting many properties despite them being completely unused on the main-thread and/or in the API.
This is unfortunate, since among those properties there's a number of potentially very large data-structures, containing e.g. Arrays and Objects, which thus have to be first structured cloned and then stored on the main-thread.

With the changes in this patch, we'll thus by default save memory for *every* `Font` instance created (there can be a lot in longer documents). The memory savings obviously depend a lot on the actual font data, but some approximate figures are: For non-embedded fonts it can save a couple of kilobytes, for simple embedded fonts a handful of kilobytes, and for composite fonts the size of this auxiliary data can even be larger than the actual font program itself.

All-in-all, there's no good reason to keep exporting these properties by default when they're unused. However, since we cannot be sure that every property is unused in custom implementations of the PDF.js library, this patch adds a new `getDocument` option (named `fontExtraProperties`) that still allows access to the following properties:

 - "cMap": An internal data structure, only used with composite fonts and never really intended to be exposed on the main-thread and/or in the API.
   Note also that the `CMap`/`IdentityCMap` classes are a lot more complex than simple Objects, but only their "internal" properties survive the structured cloning used to send data to the main-thread. Given that CMaps can often be *very* large, not exporting them can also save a fair bit of memory.

 - "defaultEncoding": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.

 - "differences": An internal property used with simple fonts, and used when building the glyph mapping on the worker-thread. Considering how complex that topic is, and given that not all font types are handled identically, exposing this on the main-thread and/or in the API most likely isn't useful.

 - "isSymbolicFont": An internal property, used during font parsing and building of the glyph mapping on the worker-thread.

 - "seacMap": An internal map, only potentially used with *some* Type1/CFF fonts and never intended to be exposed in the API. The existing `Font.{charToGlyph, charToGlyphs}` functionality already takes this data into account when handling text.

 - "toFontChar": The glyph map, necessary for mapping characters to glyphs in the font, which is built upon the various encoding information contained in the font dictionary and/or font program. This is not directly used on the main-thread and/or in the API.

 - "toUnicode": The unicode map, necessary for text-extraction to work correctly, which is built upon the ToUnicode/CMap information contained in the font dictionary, but not directly used on the main-thread and/or in the API.

 - "vmetrics": An array of width data used with fonts which are composite *and* vertical, but not directly used on the main-thread and/or in the API.

 - "widths": An array of width data used with most fonts, but not directly used on the main-thread and/or in the API.
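
As a rough illustration of how an opt-in flag could gate the properties listed above, consider the following sketch. Only the property names and the idea of an "extra properties" switch come from this patch; the `BASE_PROPERTIES` list, the `exportFontData` helper, and the sample font object are hypothetical and are not the actual `Font.exportData` implementation.

```javascript
// Hypothetical subset of the always-exported properties (illustration only).
const BASE_PROPERTIES = ["loadedName", "fontType", "vertical"];

// The properties that this patch stops exporting by default.
const EXTRA_PROPERTIES = [
  "cMap",
  "defaultEncoding",
  "differences",
  "isSymbolicFont",
  "seacMap",
  "toFontChar",
  "toUnicode",
  "vmetrics",
  "widths",
];

// Hypothetical helper: copy only allowed properties that are actually set.
function exportFontData(font, extraProperties = false) {
  const names = extraProperties
    ? BASE_PROPERTIES.concat(EXTRA_PROPERTIES)
    : BASE_PROPERTIES;
  const data = Object.create(null);
  for (const name of names) {
    if (font[name] !== undefined) {
      data[name] = font[name];
    }
  }
  return data;
}

// Made-up font data, purely for demonstration.
const sampleFont = {
  loadedName: "g_d0_f1",
  fontType: "TrueType",
  vertical: false,
  widths: [500, 600],
};
const slim = exportFontData(sampleFont);
const full = exportFontData(sampleFont, /* extraProperties = */ true);
```

Here `slim` omits `widths` while `full` keeps it, which is the trade-off that the `fontExtraProperties` option is meant to expose.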
2020-04-06 11:47:09 +02:00
Jonas Jenwald
8770ca3014 Make the decryptAscii helper function, in src/core/type1_parser.js, slightly more efficient
By slicing the Uint8Array directly, rather than using the prototype and a `call` invocation, the runtime of `decryptAscii` is decreased slightly (~30% based on quick logging).
The `decryptAscii` function is still less efficient than `decrypt`, however ASCII encoded Type1 font programs are sufficiently rare that it probably doesn't matter much (we've only seen *two* examples, issue 4630 and 11740).
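
For illustration, the two slicing styles can be compared as below. This is a sketch that assumes the "prototype" in question is `Array.prototype`; the byte values and range are arbitrary and not taken from `decryptAscii`.

```javascript
// Arbitrary sample bytes; not real font data.
const data = new Uint8Array([0x64, 0x39, 0x64, 0x36, 0x36, 0x63]);
const start = 2;
const end = 5;

// Generic Array method applied through `call`: yields a plain Array.
const viaPrototype = Array.prototype.slice.call(data, start, end);

// Typed-array method: yields a new Uint8Array holding the same bytes.
const viaTypedArray = data.slice(start, end);
```

Both results contain the same values; only the container type, and apparently the cost of producing it, differs.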
2020-04-06 11:21:02 +02:00
Jonas Jenwald
938d519192 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.
2020-04-06 11:21:02 +02:00
Jonas Jenwald
6a8c591301 Improve detection of binary/ASCII eexec encrypted Type1 font programs in Type1Parser (issue 11740)
The PDF document, in the referenced issue, actually contains ASCII-encoded Type1 data which we currently *incorrectly* identify as binary.

According to the specification, see https://www-cdf.fnal.gov/offline/PostScript/T1_SPEC.PDF#[{%22num%22%3A203%2C%22gen%22%3A0}%2C{%22name%22%3A%22XYZ%22}%2C87%2C452%2Cnull], the current checks are insufficient to decide between binary/ASCII encoded Type1 font programs.
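
A heuristic along these lines could look like the sketch below. It assumes (my reading of the specification, not a quote from it) that binary ciphertext never starts with a white-space character and never has hex digits in all of its first four bytes. The `looksBinary` helper is hypothetical and is not necessarily the check that this patch implements.

```javascript
function isHexDigit(code) {
  return (
    (code >= 0x30 && code <= 0x39) || // 0-9
    (code >= 0x41 && code <= 0x46) || // A-F
    (code >= 0x61 && code <= 0x66) // a-f
  );
}

function isWhiteSpace(code) {
  // Blank, tab, carriage return, line feed.
  return code === 0x20 || code === 0x09 || code === 0x0d || code === 0x0a;
}

// Hypothetical helper: guess whether eexec data is binary rather than ASCII hex.
function looksBinary(bytes) {
  if (isWhiteSpace(bytes[0])) {
    return false; // Assumed rule: binary data never starts with white-space.
  }
  for (let i = 0; i < 4; i++) {
    if (!isHexDigit(bytes[i])) {
      return true; // At least one of the first four bytes isn't a hex digit.
    }
  }
  return false; // Four leading hex digits: treat as ASCII-encoded.
}

const toBytes = str => Uint8Array.from(str, ch => ch.charCodeAt(0));
const binaryGuess = looksBinary(Uint8Array.from([0x80, 0x01, 0x41, 0x42]));
const asciiGuess = looksBinary(toBytes("d9d6"));
const paddedAsciiGuess = looksBinary(toBytes(" d9d"));
```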
2020-04-06 11:21:02 +02:00
Jonas Jenwald
2619272d73 Change the signature of TranslatedFont, and convert it to a proper class
In preparation for the next patch, this changes the signature of `TranslatedFont` to take an object rather than individual parameters. This also, in my opinion, makes the call-sites easier to read since it essentially provides a small bit of documentation of the arguments.

Finally, since it was necessary to touch `TranslatedFont` anyway it seemed like a good idea to also convert it to a proper `class`.
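
To show the general shape of such a signature change, here is a minimal sketch. The class name `TranslatedFontSketch` and its parameter names are illustrative assumptions, not the actual `TranslatedFont` definition.

```javascript
// Hypothetical stand-in for a class that takes one options object.
class TranslatedFontSketch {
  constructor({ loadedName, font, dict }) {
    this.loadedName = loadedName;
    this.font = font;
    this.dict = dict;
  }
}

// The call-site names each argument, which documents what is being passed.
const translated = new TranslatedFontSketch({
  loadedName: "g_d0_f1",
  font: { name: "Helvetica" },
  dict: null,
});
```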
2020-04-05 20:53:48 +02:00
Tim van der Meij
0400109b87
Merge pull request #11773 from Snuffleupagus/Font-exportData-1
[api-minor] Change `Font.exportData` to use an explicit white-list of exportable properties, and stop exporting internal/unused properties
2020-04-05 20:50:33 +02:00
Jonas Jenwald
59f54b946d Ensure that all Font instances have the vertical property set to a boolean
Given that the `vertical` property is always accessed on the main-thread, ensuring that the property is explicitly defined seems like the correct thing to do since it also avoids boolean casting elsewhere in the code-base.
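
A minimal sketch of the idea, assuming a hypothetical `VerticalFontSketch` class and a `properties` argument that may or may not carry a `vertical` entry:

```javascript
// Hypothetical class; only the `vertical` handling is of interest here.
class VerticalFontSketch {
  constructor(properties) {
    // Coerce once at construction, so consumers never see `undefined`.
    this.vertical = !!properties.vertical;
  }
}

const horizontalFont = new VerticalFontSketch({});
const verticalFont = new VerticalFontSketch({ vertical: true });
```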
2020-04-05 16:27:50 +02:00
Jonas Jenwald
c5e1fd3fde Use "standard" shadowing in the Font.spaceWidth method
With `Font.exportData` now only exporting white-listed properties, there should no longer be any reason to not use standard shadowing in the `Font.spaceWidth` method.
Furthermore, considering the amount of other changes to the code-base over the years it's not even clear to me that the special-case was necessary any more (regardless of the preceding patches).
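
For readers unfamiliar with the pattern, "shadowing" here means replacing a getter with a plain data property after the first access. The sketch below assumes a `shadow` helper built on `Object.defineProperty`; the `SpaceWidthFont` class and the constant width are hypothetical placeholders.

```javascript
// Replace a prototype getter with an own data property on first use.
function shadow(obj, prop, value) {
  Object.defineProperty(obj, prop, {
    value,
    enumerable: true,
    configurable: true,
    writable: false,
  });
  return value;
}

let computations = 0;

class SpaceWidthFont {
  get spaceWidth() {
    computations++;
    const width = 250; // Placeholder for the real (more expensive) computation.
    return shadow(this, "spaceWidth", width);
  }
}

const shadowedFont = new SpaceWidthFont();
const firstAccess = shadowedFont.spaceWidth; // Runs the getter.
const secondAccess = shadowedFont.spaceWidth; // Reads the own property.
```

Note that the shadowed value becomes an enumerable own property, which only matters if the export step copies arbitrary properties rather than a white-list.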
2020-04-05 16:27:50 +02:00
Jonas Jenwald
a5e4cccf13 [api-minor] Prevent Font.exportData from exporting internal/unused properties
A number of *internal* font properties, which only make sense on the worker-thread, were previously exported. Some of these properties could also contain potentially large Arrays/Objects, which thus unnecessarily increases memory usage since we're forced to copy these to the main-thread and also store them there.

This patch stops exporting the following font properties:

 - "_shadowWidth": An internal property, which was never intended to be exported.

 - "charsCache": An internal cache, which was never intended to be exported and doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually `undefined` or a mostly empty Object as well.

 - "cidEncoding": An internal property used with (some) composite fonts.
   As can be seen in the `PartialEvaluator.translateFont` method, `cidEncoding` will only be assigned a value when the font dictionary has an "Encoding" entry which is a `Name` (and not in the `Stream` case, since those obviously cannot be cloned).
   All-in-all this property doesn't really make sense on the main-thread and/or in the API, and note also that the resulting `cMap` property is (partially) available already.

 - "fallbackToUnicode": An internal map, part of the heuristics used to improve text-selection in (some) badly generated PDF documents with simple fonts. This was never intended to be exposed on the main-thread and/or in the API.

 - "glyphCache": An internal cache, which was never intended to be exported and which doesn't make any sense on the main-thread. Furthermore, by the time `Font.exportData` is called it's usually a mostly empty Object as well.

 - "isOpenType": An internal property, used only during font parsing on the worker-thread. In the *very* unlikely event that an API consumer actually needs that information, then `fontType` should be a (generally) much better property to use.

Finally, in the (hopefully) unlikely event that any of these properties become necessary on the main-thread, re-adding them to the white-list is easy to do.
2020-04-05 16:27:50 +02:00
Jonas Jenwald
664f7de540 Change Font.exportData to use an explicit white-list of exportable properties
This patch addresses an existing, and very long standing, TODO in the code such that it's no longer possible to send arbitrary/unnecessary font properties to the main-thread.
Furthermore, by having a white-list it's also very easy to see *exactly* which font properties are being exported.

Please note that in its current form, the list of exported properties contains *every* possible enumerable property that may exist in a `Font` instance.
In practice no single font will contain *all* of these properties, and e.g. embedded/non-embedded/Type3 fonts will all differ slightly with respect to what properties are being defined. Hence why only explicitly set properties are included in the exported data, to avoid half of them being `undefined`, which however should not be a problem for any existing consumer (since they'd already need to handle those cases).

Since a fair number of these font properties are completely *internal* functionality, and don't make any sense to expose on the main-thread and/or in the API, follow-up patch(es) will be required to trim down the list. (I purposely included all properties here for brevity and future documentation purposes.)
2020-04-05 16:27:48 +02:00
Tim van der Meij
09cccd8ecc
Merge pull request #11780 from Snuffleupagus/refactor-PDFViewerApplication-load
Move the initialization of "page labels"/"metadata"/"auto print" out of `PDFViewerApplication.load`
2020-04-05 15:46:36 +02:00
Jonas Jenwald
9ef58347ed A couple of small improvements of the PDFViewerApplication.{_initializeMetadata, _initializePdfHistory} methods
 - Use template strings when printing document/viewer information in `_initializeMetadata`, since the old format feels overly verbose.
   Also, get the WebGL state from the `BaseViewer` instance[1] rather than the `AppOptions`. Since the `AppOptions` value could theoretically have been changed (by the user) after the viewer components were initialized, it seems much more useful to print the *actual* value that'll be used during rendering.

 - Change `_initializePdfHistory` to actually do the "is embedded"-check first, in accordance with the comment and given that the "disableHistory" option usually shouldn't be set.

---
[1] Admittedly reaching into the `BaseViewer` instance and just grabbing the value perhaps isn't a great approach overall, but given that the WebGL-backend isn't even on by default this probably doesn't matter too much.
2020-04-05 15:41:00 +02:00
Jonas Jenwald
b9add65099 Move the initialization of "auto print" out of PDFViewerApplication.load
Over time, with more and more API-functionality added, the `PDFViewerApplication.load` method has become quite large and complex. In an attempt to improve the current situation somewhat, this patch moves the fetching and initialization of "auto print" out into its own (private) helper method instead.
2020-04-05 15:41:00 +02:00
Jonas Jenwald
d07be1a89b Move the initialization of "metadata" out of PDFViewerApplication.load
Over time, with more and more API-functionality added, the `PDFViewerApplication.load` method has become quite large and complex. In an attempt to improve the current situation somewhat, this patch moves the fetching and initialization of "metadata" out into its own (private) helper method instead.
2020-04-05 15:41:00 +02:00
Jonas Jenwald
32f1d0de76 Move the initialization of "page labels" out of PDFViewerApplication.load
Over time, with more and more API-functionality added, the `PDFViewerApplication.load` method has become quite large and complex. In an attempt to improve the current situation somewhat, this patch moves the fetching and initialization of "page labels" out into its own (private) helper method instead.
2020-04-05 15:41:00 +02:00
Tim van der Meij
9dedaa5eb9
Merge pull request #11781 from Snuffleupagus/fix-gulp-jsdoc
Update the "gulp jsdoc" task to account for API changes in the `mkdirp` package (PR 11772 follow-up)
2020-04-05 15:34:11 +02:00
Jonas Jenwald
f53e1409f6 Update the "gulp jsdoc" task to account for API changes in the mkdirp package (PR 11772 follow-up)
I completely overlooked the fact that we had *one* occurrence of an asynchronous `mkdirp` call in the gulpfile, which thus breaks since the package now uses Promises rather than a callback function; sorry about that!
2020-04-05 12:20:10 +02:00
Tim van der Meij
702fec534d
Merge pull request #11769 from Snuffleupagus/charToGlyph-fontCharCode-range
Ensure that `Font.charToGlyph` won't fail because `String.fromCodePoint` is given an invalid code point (issue 11768)
2020-04-04 14:36:52 +02:00
Jonas Jenwald
87142a635e Ensure that Font.charToGlyph won't fail because String.fromCodePoint is given an invalid code point (issue 11768)
*Please note:* This patch on its own is *not* sufficient to address the underlying problem in the referenced issue, hence why no test-case is included since the *actual* bug still needs to be fixed.

As can be seen in the specification, https://tc39.es/ecma262/#sec-string.fromcodepoint, `String.fromCodePoint` will throw a RangeError for invalid code points.

In the event that a CMap, in a composite font, contains invalid data and/or we fail to parse it correctly, it's thus possible that the glyph mapping that we build ends up with entries that cause `String.fromCodePoint` to throw and thus `Font.charToGlyph` to break.
If that happens, as is the case in issue 11768, significant portions of a page/document may fail to render which seems very unfortunate.

While this patch doesn't fix the underlying problem, it's hopefully deemed useful not only for the referenced issue but also to prevent similar bugs in the future.
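
The failure mode, and one possible guard against it, can be sketched as follows. The `safeFromCodePoint` helper is hypothetical and only illustrates the range check; it is not necessarily how `Font.charToGlyph` handles the case.

```javascript
// Hypothetical guard: fall back to an empty string for invalid code points.
function safeFromCodePoint(code) {
  if (Number.isInteger(code) && code >= 0 && code <= 0x10ffff) {
    return String.fromCodePoint(code);
  }
  return "";
}

// Without a guard, an out-of-range value throws a RangeError.
let threwRangeError = false;
try {
  String.fromCodePoint(0x110000);
} catch (ex) {
  threwRangeError = ex instanceof RangeError;
}

const validChar = safeFromCodePoint(0x41);
const invalidChar = safeFromCodePoint(0x110000);
```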
2020-04-03 09:49:50 +02:00
Tim van der Meij
79a99737a0
Merge pull request #11772 from Snuffleupagus/update-packages
Update packages and translations
2020-04-02 23:44:38 +02:00
Jonas Jenwald
9a3b52f52b Update l10n files
2020-04-02 12:22:18 +02:00
Jonas Jenwald
7b7fe60210 Update the mkdirp package, since its major version was increased
2020-04-02 12:22:13 +02:00
Jonas Jenwald
412fec1545 Update npm packages
2020-04-02 12:13:14 +02:00
Tim van der Meij
7ed71a0d7c
Merge pull request #11771 from Snuffleupagus/issue-11762
Fail early, in modern `GENERIC` builds, if certain required browser functionality is missing (issue 11762)
2020-04-01 22:05:19 +02:00
Jonas Jenwald
710704508c Fail early, in modern GENERIC builds, if certain required browser functionality is missing (issue 11762)
With two kinds of builds now being produced, with/without transpilation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.

In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.
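
A fail-early check of this kind might be sketched as below. Only `ReadableStream` is named in this message; the `REQUIRED_GLOBALS` list, the `assertModernEnvironment` helper, and the error wording are assumptions made for illustration.

```javascript
// Illustrative list; `ReadableStream` is the only entry taken from the text above.
const REQUIRED_GLOBALS = ["Promise", "ReadableStream"];

// Hypothetical helper: throw immediately if required functionality is missing.
function assertModernEnvironment(globalObject) {
  const missing = REQUIRED_GLOBALS.filter(
    name => typeof globalObject[name] === "undefined"
  );
  if (missing.length > 0) {
    throw new Error(
      `Missing required functionality (${missing.join(", ")}); ` +
        "please use an ES5-compatible build instead."
    );
  }
}

// Simulated environments, so the sketch doesn't depend on the host platform.
const oldEnvironment = { Promise };
const modernEnvironment = { Promise, ReadableStream: function ReadableStream() {} };

let failureMessage = null;
try {
  assertModernEnvironment(oldEnvironment);
} catch (ex) {
  failureMessage = ex.message;
}
assertModernEnvironment(modernEnvironment); // Passes silently.
```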

To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build.
(Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.)
As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code.

*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).
2020-04-01 19:42:48 +02:00
Tim van der Meij
ce1727626c
Merge pull request #11655 from Snuffleupagus/rm-getGlobalEventBus
[api-minor] Remove the `getGlobalEventBus` viewer functionality, and the `eventBusDispatchToDOM` option/preference (PR 11631 follow-up)
2020-03-31 00:17:30 +02:00
Tim van der Meij
35c9f8de38
Merge pull request #11767 from Snuffleupagus/issue-11766
Replace the RTL images with CSS transforms of the standard images (issue 11766)
2020-03-30 23:53:49 +02:00
Jonas Jenwald
63efe61245 Replace the RTL images with CSS transforms of the standard images (issue 11766)
This avoids unnecessary duplication of many images, thus reducing the size of PDF.js image resources slightly.

Note that since the images should only be flipped horizontally, this required specifying the horizontal/vertical scaling separately for the hiDPI-images.
2020-03-30 22:47:49 +02:00
Jonas Jenwald
664b79abe0 [api-minor] Remove the eventBusDispatchToDOM option/preference, and thus the general ability to dispatch "viewer components" events to the DOM
This functionality was only added to the default viewer for backwards compatibility and to support the various PDF viewer tests in mozilla-central, with the intention to eventually remove it completely.
While the different mozilla-central tests cannot be *easily* converted from DOM events, it's however possible to limit that functionality to only MOZCENTRAL builds *and* when tests are running.

Rather than depending on the re-dispatching of internal events to the DOM, the default viewer can instead be used in e.g. the following way:
```javascript
document.addEventListener("webviewerloaded", function() {
  PDFViewerApplication.initializedPromise.then(function() {
    // The viewer has now been initialized, and its properties can be accessed.

    PDFViewerApplication.eventBus.on("pagerendered", function(event) {
      console.log("Has rendered page number: " + event.pageNumber);
    });
  });
});
```
2020-03-29 12:24:46 +02:00
Jonas Jenwald
7fd5f2dd61 [api-minor] Remove the getGlobalEventBus viewer functionality (PR 11631 follow-up)
The correct/intended way of working with the "viewer components" is by providing an `EventBus` instance upon initialization, and the `getGlobalEventBus` was only added for backwards compatibility.
Note, for example, that using `getGlobalEventBus` doesn't really work at all well with a use-case where there's *multiple* `PDFViewer` instances on one page, since it may then be difficult/impossible to tell which viewer a particular event originated from.

All of the "viewer components" examples have been previously updated, such that there's no longer any code/examples which relies on the now removed `getGlobalEventBus` functionality.
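
To illustrate why one event bus per viewer avoids the ambiguity described above, here is a self-contained sketch. The `MiniEventBus` class is a hypothetical stand-in with a similar `on`/`dispatch` shape; it is not the PDF.js `EventBus` itself.

```javascript
// Hypothetical minimal event bus, for illustration only.
class MiniEventBus {
  constructor() {
    this._listeners = new Map();
  }

  on(name, listener) {
    if (!this._listeners.has(name)) {
      this._listeners.set(name, []);
    }
    this._listeners.get(name).push(listener);
  }

  dispatch(name, data) {
    for (const listener of this._listeners.get(name) || []) {
      listener(data);
    }
  }
}

// One bus per (imagined) viewer, so the origin of an event is never in doubt.
const leftBus = new MiniEventBus();
const rightBus = new MiniEventBus();
const seen = [];

leftBus.on("pagerendered", evt => seen.push(`left:${evt.pageNumber}`));
rightBus.on("pagerendered", evt => seen.push(`right:${evt.pageNumber}`));

leftBus.dispatch("pagerendered", { pageNumber: 1 });
rightBus.dispatch("pagerendered", { pageNumber: 7 });
```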
2020-03-29 12:20:23 +02:00
Tim van der Meij
c12ea21c14
Merge pull request #11755 from Snuffleupagus/rm-fonts-sizes-encoding
Remove the unused `sizes` and `encoding` properties on `Font` instances
2020-03-27 21:44:16 +01:00
Jonas Jenwald
14c999e3ee Remove the unused sizes and encoding properties on Font instances
The `sizes` property doesn't appear to have been used ever since the code was first split into main/worker-threads, which is so many years ago that I wasn't able to easily find exactly in which PR/commit it became unused.

The `encoding` property is always assigned the `properties.baseEncoding` value, however the `PartialEvaluator` doesn't actually compute/set that value any more. Again it was difficult to determine when it became unused, but it's been that way for years.
2020-03-27 10:12:01 +01:00
Tim van der Meij
fa4b431091
Merge pull request #11745 from Snuffleupagus/eslint-no-shadow
Enable the ESLint `no-shadow` rule
2020-03-25 22:48:07 +01:00
Tim van der Meij
ff0f9fd018
Merge pull request #11747 from gdh1995/fix-removing-wheel
Add `passive: false` when removing wheel listeners
2020-03-25 22:37:41 +01:00
Tim van der Meij
8745286dc1
Merge pull request #11646 from Snuffleupagus/_setDocumentAllowFetchPages
Ensure that automatic printing still works when the viewer and/or its pages are hidden (bug 1618621, bug 1618955)
2020-03-25 22:27:19 +01:00
gdh1995
a527eb8c92 Add passive: false when removing wheel listeners
The code listening for the `wheel` event uses `{passive: false}`,
but before Firefox 49 this argument is treated as `true`,
according to https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener#Browser_compatibility .

This commit adds it when removing wheel listeners,
so that such listeners can be really removed.
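
One way to keep the add/remove arguments in sync is to share a single options object, as in this sketch. The `listenForWheel` helper and `WHEEL_OPTIONS` constant are hypothetical names, and a bare `EventTarget` stands in for the real DOM element.

```javascript
// Shared options, so adding and removing always use identical arguments.
const WHEEL_OPTIONS = { passive: false };

// Hypothetical helper: returns a function that removes the listener again.
function listenForWheel(target, handler) {
  target.addEventListener("wheel", handler, WHEEL_OPTIONS);
  return function stopListening() {
    target.removeEventListener("wheel", handler, WHEEL_OPTIONS);
  };
}

const wheelTarget = new EventTarget();
let wheelCalls = 0;
const stopListening = listenForWheel(wheelTarget, () => {
  wheelCalls++;
});

wheelTarget.dispatchEvent(new Event("wheel")); // Handled.
stopListening();
wheelTarget.dispatchEvent(new Event("wheel")); // No longer handled.
```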
2020-03-25 22:42:27 +08:00
Jonas Jenwald
fdfcde2b40 Remove a spurious console.log from the ChromiumBrowser function in test/webbrowser.js file
This looks entirely like something left over from debugging, especially considering that the corresponding branch in `FirefoxBrowser` doesn't print anything; the line in question hasn't been touched since PR 4515.
2020-03-25 11:57:12 +01:00
Jonas Jenwald
dcb16af968 Whitelist closure related cases to address the remaining no-shadow linting errors
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule is enabled.

Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
2020-03-25 11:57:12 +01:00
Jonas Jenwald
1d2f787d6a Enable the ESLint no-shadow rule
This rule is *not* currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239

Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis.
Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code.
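
As a small, made-up example of the kind of subtle bug that shadowing can hide (neither function is taken from the PDF.js code-base):

```javascript
// The inner `count` shadows the outer one, so the outer is never updated.
function countWithShadowing(values) {
  let count = 0;
  values.forEach(function () {
    let count = 0; // Would be reported by the `no-shadow` rule.
    count++;
  });
  return count;
}

// Without the inner declaration, the closure updates the outer variable.
function countWithoutShadowing(values) {
  let count = 0;
  values.forEach(function () {
    count++;
  });
  return count;
}

const shadowedResult = countWithShadowing([1, 2, 3]);
const fixedResult = countWithoutShadowing([1, 2, 3]);
```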

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow

---
[1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.
2020-03-25 11:56:05 +01:00
Tim van der Meij
475fa1f97f
Merge pull request #11744 from janpe2/cff-glyph-zero
The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
2020-03-24 23:52:21 +01:00
Tim van der Meij
292b77fe7b
Merge pull request #11707 from Snuffleupagus/issue-11694
Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)
2020-03-24 23:51:31 +01:00
Tim van der Meij
f85105379e
Merge pull request #11738 from Snuffleupagus/no-shadow-src-core
Remove variable shadowing from the JavaScript files in the `src/core/` folder
2020-03-24 23:10:37 +01:00
Tim van der Meij
c54e773637
Merge pull request #11742 from Snuffleupagus/no-shadow-test-unit
Remove variable shadowing from the JavaScript files in the `test/unit/` folder
2020-03-24 22:44:23 +01:00
Jonas Jenwald
a24ad28d75 Rename BaseViewer._setDocumentViewerElement to BaseViewer._viewerElement
It was pointed out that the old name felt confusing, so let's just rename the getter since it's an internal property anyway.
2020-03-24 16:54:37 +01:00
Jonas Jenwald
c5b0b5c754 Ensure that automatic printing still works when the viewer and/or its pages are hidden (bug 1618621, bug 1618955)
Please note that this patch, on its own, won't magically fix all of these printing bugs without [bug 1618553](https://bugzilla.mozilla.org/show_bug.cgi?id=1618553) also being fixed.
(However I don't foresee that being too difficult, famous last words :-), but it will as suggested require a platform API that we can notify when the viewer is ready.)

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1618621
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1618955
Fixes 8208
2020-03-24 16:26:29 +01:00
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Because this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
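
The gist of the change can be sketched as below. The `buildCharset` helper and its `rest` parameter are hypothetical; only the `cid` flag and the `0` versus `".notdef"` distinction come from the description above.

```javascript
// Hypothetical helper: `rest` holds the already-parsed remaining charset entries.
function buildCharset(rest, cid) {
  // In a CIDFont glyph names are integer CIDs, so index zero is named 0.
  const first = cid ? 0 : ".notdef";
  return [first, ...rest];
}

const cidCharset = buildCharset([1, 2], /* cid = */ true);
const nameCharset = buildCharset(["space", "exclam"], /* cid = */ false);
```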
2020-03-24 15:56:50 +02:00
Jonas Jenwald
66ee8f5acd Remove variable shadowing from the JavaScript files in the test/unit/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-24 10:44:17 +01:00
Tim van der Meij
85838fc505
Merge pull request #11736 from Snuffleupagus/more-wheel-passive
Add `passive: false` to the `wheel` event listener used in `PDFPresentationMode` (issue 11735, PR 10765 follow-up)
2020-03-24 00:12:20 +01:00
Tim van der Meij
404d698dd2
Merge pull request #11734 from Snuffleupagus/rm-throw-methods
Remove old API methods which were previously converted to throwing (PR 11219 follow-up)
2020-03-24 00:06:02 +01:00