To save time or resources during development it can be useful to run
tests only in Firefox. Previously this could be done by editing the
browser manifest file, but since that file is no longer used for
Puppeteer, this command line flag replaces it. For example, executing
`gulp unittest --noChrome` will only run the unit tests in Firefox.
This commit replaces our own infrastructure for handling browsers during
tests with Puppeteer. Using our own infrastructure for this had a few
downsides:
- It has proven to not always be reliable, especially when closing the
browser, causing failures on the bots because browsers were still
running even though they should have been stopped. Puppeteer should do
a better job with this because it uses the browser's test built-in
instrumentation tools for this (the devtools protocol) which our code
didn't. This also means that we don't have to pass
parameters/preferences to tweak browser behavior anymore.
- It requires the browsers under test to be installed on the system,
whereas Puppeteer downloads the browsers before the test. This means
that setup is much easier (no more manual installations and browser
manifest files) as well as testing with different browser versions
(since they can be provisioned on demand). Moreover, this ensures that
contributors always run the tests in both Firefox and Chrome,
regardless of which browsers they have installed locally.
- It's all code we have to maintain, so Puppeteer abstracts away how the
browsers start/stop for us so we don't have to keep that code.
By default, Puppeteer only installs one browser during installation,
hence the need for a post-install script to install the second browser.
This requires `cross-env` to make passing the environment variable work
on both Linux and Windows.
The other testing code already uses the name of the browser as the
unique identifier, so I don't see a good reason to not use that for
identifying browsers to quit as well. Doing so simplifies the (already
somewhat complex) testing logic and ensures that we can use existing
functionality (such as the `getSession` function) to retrieve sessions.
As evident from the code, `PageViewport` only supports[1] `rotation` values which are a multiple of 90 degrees. Besides it being somewhat difficult to imagine meaningful use-cases for a non-multiple of 90 degrees `rotation`, the code also becomes both simpler and more efficient by not having to consider arbitrary `rotation` values.
However, any invalid rotation will *silently* fallback to assume zero `rotation` which probably isn't great for e.g. `PDFPageProxy.getViewport` in the API. Hence this patch, which will now enforce that only valid `rotation` values are accepted.
---
[1] As far as I can tell, from looking through the history, nothing else has ever been supported either.
*Please note:* These changes were done automatically, using the `gulp lint --fix` command.
This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104
The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the *built* files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a *reduction* of `23 178` bytes (or ~0.7%) for a completely mechanical change.
A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't *think* that the new formatting this patch introduces should undo any of the improvements from PR 6915.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation
- Add a reduced test-case for issue 11768, to prevent future regressions.
(Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.)
- Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.
This patch fixes yet another instalment in the never-ending series of "what the *bleep* was I thinking", by changing the `PDFDocumentProxy.getViewerPreferences` method to return `null` by default.
Not only is this method now consistent with many other API methods, for the data not present case, but it also avoids having to e.g. loop through an object to check if it's actually empty (note the old unit-test).
Please note that these changes were done automatically, using `gulp lint --fix`.
Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html
In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).
At this point in time, compared to when the "ignore single-char" code was added, we *should* generally be doing a much better job of combining text into as few chunks as possible.
However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm *not* proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise.
Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.
With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.
In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.
To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build.
(Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.)
As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code.
*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).
This functionality was only added to the default viewer for backwards compatibility and to support the various PDF viewer tests in mozilla-central, with the intention to eventually remove it completely.
While the different mozilla-central tests cannot be *easily* converted from DOM events, it's however possible to limit that functionality to only MOZCENTRAL builds *and* when tests are running.
Rather than depending of the re-dispatching of internal events to the DOM, the default viewer can instead be used in e.g. the following way:
```javascript
document.addEventListener("webviewerloaded", function() {
PDFViewerApplication.initializedPromise.then(function() {
// The viewer has now been initialized, and its properties can be accessed.
PDFViewerApplication.eventBus.on("pagerendered", function(event) {
console.log("Has rendered page number: " + event.pageNumber);
});
});
});
```
This looks entirely like something which was left-over from debugging, and that line hasn't been touched since PR 4515, especially considering that the corresponding branch in `FirefoxBrowser` doesn't print anything.
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled.
Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
This rule is *not* currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239
Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis.
Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow
---
[1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.
Fixes#11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*
Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
This property has never been documented and/or *intentionally* exposed through the API, instead the `PDFPageProxy.pageNumber` property is the documented/intended API to use here.
Hence pageIndex is changed to a "private" property on `PDFPageProxy` instances, and internal API functionality is also updated to *consistently* use `this._pageIndex` rather than a mix of formats.
Don't accidentally accept invalid glyphNames which *appear* to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)
The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a *possible* Cdd{d}/cdd{d} glyphName by the existing heuristics.
Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point.
To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds *additional* validation of the charCode found by the heuristics.
Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`.
Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.
When reftest analyzer shows magnified pixels, there is a seemingly random offset between the mouse position and the magnified position. The reason for this is that reftest analyzer assumes all images have 800 * 1000 pixels but actually the test images have varying sizes.
The PDF document in question is *corrupt*, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.
Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups.
In this case, since `Dict.set` is fairly hot, the patch utilizes an *inline check* and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`).
For very large and complex PDF files this will help performance *slightly*, since `Dict.{get, getAsync, has}` is called *a lot* during parsing in the worker.
This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
{ "id": "issue2618",
"file": "../web/pdfs/issue2618.pdf",
"md5": "",
"rounds": 250,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall | 250 | 2838 | 2820 | -18 | -0.65 | faster
Firefox | Page Request | 250 | 1 | 2 | 0 | 11.92 | slower
Firefox | Rendering | 250 | 2837 | 2818 | -19 | -0.65 | faster
```
This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.)
By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.
Fixes#11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.
Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects *a lot* during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base.
Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
*This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else.*
To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data *and* the "actual" `scanLines` value is at least one order of magnitude smaller than expected.
This patch makes the following changes, to improve these API methods:
- Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.
- Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.
- Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
Add a note in the JSDoc comment about not calling this method when rendering is ongoing.
- Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
All of the caches that's being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.
- Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change.
However, it does seem like a good idea to at least *validate* the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.
In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored).
Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.
- Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument).
- Tweak a condition, to make it easier to see that the end of the data has been reached.
- Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const
With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.
Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
Fixes#11403
The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this.
In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`.
The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.nodef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.
Note that this will still allow the FBF tests to run locally, and also on the bots when invoked with `test`/`browsertest` (to not lose all the FBF test-coverage), but will no longer prevent `makeref` from running successfully on the bots.
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
Rather than having a copy of this regular expression in the `test/unit/api_spec.js` file, with a comment about keeping it up-to-date with the code in the viewer (note the incorrect file reference as well), we can just import it instead to simplify all of this.
The original issue did not contain a (reduced) test case that we could
include and linked test cases are not ideal for unit tests, so the
original PR could only be verified manually.
I found this a bit unfortunate considering that the print data is
exposed through the API, so I thought about how we could have an
automated test and managed to create a reduced test case with the
OpenAction dictionary from the file in the original issue.
Therefore, this commit includes a unit test for parsing OpenAction
dictionaries without `Type` entries. I verified that this PDF file
behaves the same as the original one, i.e., no print dialog is shown for
older viewers and the print dialog is shown for the most recent viewer.
This patch makes the follow changes:
- Remove no longer necessary inline `// eslint-disable-...` comments.
- Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
- Concatenate strings which now fit on just one line.
- Fix comments that are now too long.
- Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments.
*Please note:* It may be a good idea to look through these cases individually, and possibly re-write some of the them (especially the `String` ones) to reduce the need for all of these ignore commands.
I recently noticed a couple of intermittent failures on Travis, hence this patch which changes the expectation to be identical to the 'Page Request' check in the preceding test-case.
Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility
By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead.
The only change here which *could* possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded).
However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it *could* just as well have been a globally registered polyfill from an application embedding the PDF.js library.
Given that the error in question is surfaced on the API-side, this patch makes the following changes:
- Updates the wording such that it'll hopefully be slightly easier for users to understand.
- Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling.
- Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).
In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.
For documents with a Linearization dictionary the computed `startXRef` position will be relative to the raw file, rather than the actual PDF document itself (which begins with `%PDF-`).
Hence it's necessary to subtract `stream.start` in this case, since otherwise the `XRef.readXRef` method will increment the position too far resulting in parsing errors.
The bug report seem to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we *actually* do as evident by both the code and a unit-test.
The issue at play here is rather that we previously only supported big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a *little-endian* UTF-16 BOM instead.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902
This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.
As can be seen in `PageViewport` only multiples of 90 degrees are really supported by the code, hence the unit-test doesn't really make sense.
(Possibly this should be enforced in the API, to avoid surprises, but given that this problem has always existed I'm passing on that for now.)
Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.
Originally only `skipPages` existed, but given that `firstPage`/`lastPage` has existed for a long time now using them whenever possible looks simpler overall.
- In the `ibwa-bad` case the sixteenth page contains corrupt/incomplete commands, but given that we're suppressing `Error`s by default now skipping hardly seems warranted any more.
- In the `geothermal.pdf` case the first page contains an unsupported ColourSpace, but again we're suppressing `Error`s by default now and skipping hardly seems warranted any more.
This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1]
Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break.
The only reason that I'm even submitting this patch, is because of the number of open issues that it would address.
Generally speaking though, the best course of action would probably be if `XRef.indexObjects` was re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them.
---
[1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.
As part of attempting to fix a number issues containing PDF documents with corrupt XRef tables, I'd like to improve the reference test-coverage slightly *first*.
Obviously this will increase the runtime of the tests a bit, however I'd rather "waste" resources on the bots instead of developer time fixing regressions which could have been avoided.
*Please note:* I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code.
However, it occured to me that we're essentially trying to add a heuristic *on top* of another heuristic, and that it shouldn't matter how efficient the code is as long as it works.
In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.
Having these methods fallback to returning `null` in only *one* particular case seems outright wrong, since a "falsy" value will thus be handled incorrectly.
The only reason that this hasn't caused issues in practice is that there's only one call-site passing in three keys, and in that case we're trying to read a font file where falling back to `null` isn't a problem.
Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.
Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).
These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).
*Please note:* The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`.
By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources.
With this patch worker-thread parsing will *only* be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this:
- The API currently expects the *entire* OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen.
- Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously *not* want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance.
- This patch is already *somewhat* risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well).
Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above.
Co-Authored-By: Yury Delendik <ydelendik@mozilla.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange.
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.