It shouldn't be possible for the `getBytes`-call to throw a `MissingDataException`, since all resources are loaded *before* e.g. font-parsing ever starts; see f0817015bd/src/core/object_loader.js (L111-L126)
Furthermore, even if we'd *somehow* re-throw a `MissingDataException` here that still won't help considering where the `Type1Font`-instance is created. Note how in the `Font`-constructor we simply catch any errors and fallback to a standard font, which means that a `MissingDataException` would just lead to rendering errors anyway; see f0817015bd/src/core/fonts.js (L648-L691)
All-in-all, it's not possible for a `MissingDataException` to be thrown in `getHeaderBlock` and this code-path can thus be removed.
Obviously the `Font`-class is still *very* large, given particularly how TrueType fonts are handled, however this patch-series at least improves things by moving a number of functions/classes into their own files.
As a follow-up it might make sense to try and re-factor/extract the TrueType parsing into its own file, since all of this code is quite old, however that's probably best left for another time.
For e.g. `gulp mozcentral`, the *built* `pdf.worker.js` files decreases from `1 620 332` to `1 617 466` bytes with this patch-series.
These changes were made automatically, using `gulp lint --fix`.
Given the large size of this patch, the manual fixes are done separately in the next commit.
These changes were made *mostly* automatically, using `gulp lint --fix`, with the following manual changes:
```diff
diff --git a/src/core/fonts_utils.js b/src/core/fonts_utils.js
index f88ce4a8c..c4b3f3808 100644
--- a/src/core/fonts_utils.js
+++ b/src/core/fonts_utils.js
@@ -167,8 +167,8 @@ function type1FontGlyphMapping(properties, builtInEncoding,
glyphNames) {
}
// Lastly, merge in the differences.
- let differences = properties.differences,
- glyphsUnicodeMap;
+ const differences = properties.differences;
+ let glyphsUnicodeMap;
if (differences) {
for (charCode in differences) {
const glyphName = differences[charCode];
```
- `FontFlags`, is used in both `src/core/fonts.js` and `src/core/evaluator.js`.
- `getFontType`, same as the above.
- `MacStandardGlyphOrdering`, is a fairly large data-structure and `src/core/fonts.js` is already a *very* large file.
- `recoverGlyphName`, a dependency of `type1FontGlyphMapping`; please see below.
- `SEAC_ANALYSIS_ENABLED`, is used by both `Type1Font`, `CFFFont`, and unit-tests; please see below.
- `type1FontGlyphMapping`, is used by both `Type1Font` and `CFFFont` which a later patch will move to their own files.
This is done automatically with `gulp lint --fix` and the following
manual changes:
```diff
diff --git a/src/core/function.js b/src/core/function.js
index 878001057..b7e3e6ccf 100644
--- a/src/core/function.js
+++ b/src/core/function.js
@@ -131,7 +131,7 @@ function toNumberArray(arr) {
return arr;
}
-var PDFFunction = (function PDFFunctionClosure() {
+const PDFFunction = (function PDFFunctionClosure() {
const CONSTRUCT_SAMPLED = 0;
const CONSTRUCT_INTERPOLATED = 2;
const CONSTRUCT_STICHED = 3;
@@ -484,7 +484,9 @@ var PDFFunction = (function PDFFunctionClosure() {
// clip to domain
const v = clip(src[srcOffset], domain[0], domain[1]);
// calculate which bound the value is in
- for (var i = 0, ii = bounds.length; i < ii; ++i) {
+ const length = bounds.length;
+ let i;
+ for (i = 0; i < length; ++i) {
if (v < bounds[i]) {
break;
}
@@ -673,23 +675,21 @@ const PostScriptStack = (function PostScriptStackClosure() {
roll(n, p) {
const stack = this.stack;
const l = stack.length - n;
- let r = stack.length - 1,
- c = l + (p - Math.floor(p / n) * n),
- i,
- j,
- t;
- for (i = l, j = r; i < j; i++, j--) {
- t = stack[i];
+ const r = stack.length - 1;
+ const c = l + (p - Math.floor(p / n) * n);
+
+ for (let i = l, j = r; i < j; i++, j--) {
+ const t = stack[i];
stack[i] = stack[j];
stack[j] = t;
}
- for (i = l, j = c - 1; i < j; i++, j--) {
- t = stack[i];
+ for (let i = l, j = c - 1; i < j; i++, j--) {
+ const t = stack[i];
stack[i] = stack[j];
stack[j] = t;
}
- for (i = c, j = r; i < j; i++, j--) {
- t = stack[i];
+ for (let i = c, j = r; i < j; i++, j--) {
+ const t = stack[i];
stack[i] = stack[j];
stack[j] = t;
}
@@ -939,7 +939,7 @@ class PostScriptEvaluator {
// We can compile most of such programs, and at the same moment, we can
// optimize some expressions using basic math properties. Keeping track of
// min/max values will allow us to avoid extra Math.min/Math.max calls.
-var PostScriptCompiler = (function PostScriptCompilerClosure() {
+const PostScriptCompiler = (function PostScriptCompilerClosure() {
class AstNode {
constructor(type) {
this.type = type;
```
The `this.data` property is, when defined, sent from the worker-thread as a `Uint8Array` and there's thus no reason to re-initialize the TypedArray here.
Note also the `FontFaceObject.createNativeFontFace` method just above, where we simply use `this.data` as-is.
The explanation for this code looking like it does is, as is often the case, for historical reasons. Originally we only supported `@font-face`, before the Font Loading API existed, and back then we also polyfilled TypedArrays (using regular Arrays) which should explain this particular line of code.
Given that the `bytesToString(BaseStream.getBytes(...))` pattern is somewhat common throughout the `src/core/` code, it cannot hurt to add a new `BaseStream`-method which handles that case internally.
- Improve chunking in order to fix some bugs where the spaces aren't here:
* track the last position where a glyph has been drawn;
* when a new glyph (first glyph in a chunk) is added then compare its position with the last saved one and add a space or break:
- there are multiple ways to move the glyphs and to avoid to have to deal with all the different possibilities it's a way easier to just compare positions;
- and so there is now one function (i.e. "compareWithLastPosition") where all the job is done.
- Add some breaks in order to get lines;
- Remove the multiple whites spaces:
* some spaces were filled with several whites spaces and so it makes harder to find some sequences of words using the search tool;
* other pdf readers replace spaces by one white space.
Update src/core/evaluator.js
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Given that both the textLayer rendering *and* the structTree parsing is asynchronous, it's possible that we'll attempt to insert the structTree in a removed page. While there's thankfully no outright breakage caused by this, it will nonetheless lead to errors being printed in the console and we should obviously avoid this.
To reproduce this bug (without the patch), open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableStream=true&disableAutoFetch=true and scroll *very quickly* through the document and notice the following error being (intermittently) printed in the console:
```
Uncaught (in promise) TypeError: can't access property "appendChild", this.canvas is undefined
```
Looking at the `ChunkedStream` implementation, it's basically a "regular" `Stream` but with added functionality in order to deal with fetching/loading of missing data.
Hence, by letting `ChunkedStream` extend `Stream`, we can remove some duplicate methods from the `ChunkedStream` class.
For all of the other `DecodeStream`s we're not passing in a `Dict`-instance manually, but instead get it from the `stream`-parameter. Hence there's no particularly good reason, as far as I can tell, to not do the same thing in `Jbig2Stream`/`JpegStream`/`JpxStream` as well.
The way that `getBaseStreams` is currently handled has bothered me from time to time, especially how we're checking if the method exists before calling it.
By adding a dummy `BaseStream.getBaseStreams` method, and having the call-sites simply check the return value, we can improve some of the relevant code.
Note in particular how the `ObjectLoader._walk` method didn't actually check that the data in question is a Stream instance, and instead only checked the `currentNode` (which could be anything) for the existence of a `getBaseStreams` property.
By having an abstract base-class, it becomes a lot clearer exactly which methods/getters are expected to exist on all Stream instances.
Furthermore, since a number of the methods are *identical* for all Stream implementations, this reduces unnecessary code duplication in the `Stream`, `DecodeStream`, and `ChunkedStream` classes.
For e.g. `gulp mozcentral`, the *built* `pdf.worker.js` files decreases from `1 619 329` to `1 616 115` bytes with this patch-series.