This was an oversight in PR 9095, which unfortunately breaks rendering in some PDF files (e.g. the one from issue 6737).
It thus appears that we don't have any test-coverage for this code-path, and given the relative complexity of the PDF files affected by this bug I wasn't able to easily create a reduced test-case.
*Please note:* The linked test-case included in this patch is currently *not* rendered correctly (that'd be the PR 6606), but it at least gives us some test-coverage here.
I recall being confused as to the purpose of the `encrypted` property all the way back when working on PR 4750.
Looking at the history, this property was added in PR 1698 when password support was added to the API/viewer. However, its only purpose seem to have been to facilitate the addition of a `isEncrypted` function in the API. That function never, as far as I can tell, saw any use and was unceremoniously removed in PR 4144.
Since we want to avoid sending all non-essential data early during initial document loading (e.g. PR 4750), it seems correct to get rid of the `encrypted` property. Especially since it hasn't even been exposed in the API for over three years, with no complaints that I'm aware of.
Finally note that the `encrypt` property on the `XRef` instance isn't tied to the code that's being removed here. Given that we're calling `PDFDocument.parse` during `createDocumentHandler` in the worker which, via `PDFDocument.setup`, calls `XRef.parse` where the `Encrypt` data (if it exists) is always parsed.
Please note that while this could be considered a regression in user-facing behaviour, I'm not convinced that it's really a regression as such since prior to PR 8912 the Metadata would fail to parse (with an XML error) and thus be ignored when setting the viewer title.
With the refactored Metadata parsing we're now able to parse this, which uncovered issues with a subset of broken Ghostscript Metadata that uses HTML character names.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1424938
Unless the debugging tools (i.e. `PDFBug`) are enabled, or the `browsertest` is running, the `PDFPageProxy.stats` aren't actually used for anything.
Rather than initializing unnecessary `StatTimer` instances, we can simply re-use *one* dummy class (with static methods) for every page. Note that by using a dummy `StatTimer` in this way, rather than letting `PDFPageProxy.stats` be undefined, we don't need to guard *every* single stats collection callsite.
Since it wouldn't make much sense to attempt to use `PDFPageProxy.stats` when stat collection is disabled, it was instead changed to a "private" property (i.e. `PDFPageProxy._stats`) and a getter was added for accessing `PDFPageProxy.stats`. This getter will now return `null` when stat collection is disabled, making that case easy to handle.
For benchmarking purposes, the test-suite used to re-create the `StatTimer` after loading/rendering each page. However, modifying properties on various API code from the outside in this way seems very error-prone, and is an anti-pattern that we really should avoid at all cost. Hence the `PDFPageProxy.cleanup` method was modified to accept an optional parameter, which will take care of resetting `this.stats` when necessary, and `test/driver.js` was updated accordingly.
Finally, a tiny bit more validation was added on the viewer side, to ensure that all the code we're attempting to access is defined when handling `PDFPageProxy` stats.
*This patch is the result of me starting to look into moving parameters from `PDFJS` into `getDocument` and other API methods.*
When familiarizing myself with the code, the signatures of the various network streams seemed to be unnecessarily cumbersome since `disableRange` is currently handled separately from other parameters.
I'm assuming that the explanation for this is probably "for historical reasons", as is often the case. Hence I'd like to clean this up *before* we start the larger, and more invasive, `PDFJS` parameter re-factoring.
Nothing uses this option anymore, so setting it is a no-op now. We can
safely remove it.
Use `SKIP_BABEL` (instead of `PDFJS_NEXT`) now if you want to skip Babel
translation for a build.
I don't have a good example at hand right know, but I recall seeing custom deployments of PDF.js that bundle a *specific* version of the `build/pdf.js` file and then set `PDFJS.workerSrc` to point to https://mozilla.github.io/pdf.js/build/pdf.worker.js.
That practice seems really bad since, besides (obviously) causing unnecessary server load, it will very quickly result in a version mismatch between the `pdf.js` and `pdf.worker.js` files in those PDF.js deployments.
Such a version mismatch could easily lead to either breaking errors, or even worse slightly inconsistent behaviour for an API call (if the API -> Worker interface changes, which does happen from time to time).
To avoid the problems described above, I'm thus proposing that we enforce that the versions of the `pdf.js` and `pdf.worker.js` files must always match.
The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.
This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.
The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.