The way that `getBaseStreams` is currently handled has bothered me from time to time, especially how we're checking if the method exists before calling it.
By adding a dummy `BaseStream.getBaseStreams` method, and having the call-sites simply check the return value, we can improve some of the relevant code.
Note in particular how the `ObjectLoader._walk` method didn't actually check that the data in question is a Stream instance, and instead only checked the `currentNode` (which could be anything) for the existence of a `getBaseStreams` property.
By having an abstract base-class, it becomes a lot clearer exactly which methods/getters are expected to exist on all Stream instances.
Furthermore, since a number of the methods are *identical* for all Stream implementations, this reduces unnecessary code duplication in the `Stream`, `DecodeStream`, and `ChunkedStream` classes.
For e.g. `gulp mozcentral`, the *built* `pdf.worker.js` files decreases from `1 619 329` to `1 616 115` bytes with this patch-series.
Given that we're using modules, meaning that only explicitly `export`ed things are visible to the outside, it's no longer necessary to wrap all of the code in a closure.
This is done automatically with `gulp lint --fix` and the following
manual changes:
```diff
diff --git a/src/core/image.js b/src/core/image.js
index 35c06b8ab..e718b9937 100644
--- a/src/core/image.js
+++ b/src/core/image.js
@@ -97,7 +97,7 @@ class PDFImage {
if (isName(filter)) {
switch (filter.name) {
case "JPXDecode":
- var jpxImage = new JpxImage();
+ const jpxImage = new JpxImage();
jpxImage.parseImageProperties(image.stream);
image.stream.reset();
```
Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)
This patch first of all moves all checking/validation into the `appendIfJavaScriptDict` function, to avoid duplicating it in multiple places. Secondly, also removes what's now an outdated/incorrect comment since we have implemented scripting support.
Given that we're (almost) always iterating through the result of the `getAll`-calls, using a `Map` seems nicer overall since it's more suited to iteration compared to a regular Object.
Also, add a couple of `Dict`-checks in existing code touched by this patch, since it really cannot hurt to prevent *potential* errors in a corrupt PDF document.
First of all, while it should be very unlikely that the /ID-entry is an *indirect* object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an *indirect* object the existing code in `src/core/writer.js` would already fail.
Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`.
*Drive-by change:* In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).
For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260)
By not waiting for the /Properties to load, before parsing of the operatorList/textContent starts, there's a very real risk that a `MissingDataException` will be thrown when trying to access the data in the `PartialEvaluator.parseMarkedContentProps` method.
If this ever happens it will thus lead to incomplete and/or outright broken rendering, and with e.g. `disableAutoFetch=true` set the likelihood of this occuring would increase quite a bit.
*Please note:* While I've not yet seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
- Different fonts can be used in xfa and some of them are embedded in the pdf.
- Load all the fonts in window.document.
Update src/core/document.js
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Update src/core/worker.js
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>