This patch *attempts* to actually implement what's described for the `Count`-entry in the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2095911, which I mostly ignored back in PR 10890 since it seemed unnecessarily complicated[1].
Besides issue 12704, I've also tested a couple of other documents (e.g. the PDF specification) and these changes don't *seem* to break anything else; additional testing would be helpful though!
---
[1] At the time, all PDF documents that I tested worked even with a very simple approach and I thus hoped that it'd would suffice.
Similar to the previous patch, the GENERIC default viewer is capable of opening more than *one* PDF document and we should ensure that we handle that case correctly.
I was actually quite surprised to find that, despite the various `scripting`-getters implementing `destroySandbox` methods, there were no attempts at actually cleaning-up either the "sandbox" or removing the globally registered event listeners.
This patch also changes the method to skip *all* data fetching when "enableScripting" isn't active. Finally, simplifies some event-data accesses in the "updateFromSandbox" listener.
Another possible option here could be to use the `contentLength`, when it exists, and then using e.g. a custom event to always update the "filesize" in the sandbox "after the fact" with the result of the `getDownloadInfo`-call.
We can easily avoid unnecessary API-calls here, since most of the time the `metadata` will already be available here. In the *rare* case that it's not available, we can simply wait for the existing `getMetadata`-call to resolve.
This will be useful in the following patch, and note that there's also an old issue (see 5765) which asked for such an event. However, given that the use-case wasn't *clearly* specified, and that we didn't have an internal use for it at the time it wasn't implemented.
Also, ensure that all of the metadata-related properties are actually reset when the document is closed.
This simplifies not just this code, but the unit-tests as well, and should be sufficient as far as I can tell.
Note also that currently, in the *built* `pdf.sandbox.js` file, there's even a line reading `testMode = testMode && false;` because of an accidentally flipped pre-processor statement.
Finally, in the `scripting_spec.js` unit-test, defines `sandboxBundleSrc` at the top of the file to make it easier to find and/or change it when necessary.
Compared to the, previously removed, `sandbox`/`watch-sandbox` gulp-tasks, these ones should work even when run against an non-existent/empty `build`-folder.
Also, to ensure that the development viewer actually works out-of-the-box, `gulp server` will now also include `gulp watch-dev-sandbox` to remove the need to *manually* invoke the build-tasks.
Finally, this patch also removes the `web/devcom.js` file since it shouldn't actually be needed, assuming that the "sandbox"-loading code in the `web/genericcom.js` file is actually *correctly* implemented.
The way that the `pdf.sandbox.js` building was implemented feels all kinds of inconsistent/wrong, and it "sticks out" quite a bit when compared to the rest of the `gulpfile.js`. This patch thus attempts to improve the current situation slightly, to hopefully make future maintenance easier.
One thing that strikes you, pretty immediately, when looking at PR 12604 is that the two new `gulp`-tasks added (i.e. `sandbox` and `watch-sandbox`) don't even work!?
The reason for this is that they implicitly dependent upon the result of the `buildnumber`-task, which isn't listed as a dependency. (Try running `gulp clean` *first*, and invoking any of the new `gulp`-tasks will inevitably fail.)
Furthermore, there's another (potentially big) problem with the implementation of e.g. the `gulp sandbox` task, since it doesn't actually wait for all building to complete before the task is considered as "done". This has the potential to cause all sorts of subtle bugs elsewhere, and the fact that things even "work" as-is can probably be attributed mostly to luck.
Unfortunately there's no *perfect* way to improve things here, since the `pdf.sandbox.js` file depends on including the `pdf.scripting.js` file as a string, however I firmly believe that improvements are still possible here.
To that end, this patch updates all relevant build-targets to create a *temporary* `pdf.scripting.js` file as part of the setup in the `gulp`-tasks, and then reads that file during the `pdf.sandbox.js` building.
This at least allows us to bring all of this "sandbox"-build code much more in-line with the existing build-system.
Given the somewhat "specialized" nature of the `pdf.sandbox.js` building, it ought to be possible to re-factor how some of the options are handled.
Note in particular that the `gulp-strip-comments` dependency seems somewhat unncessary, since the *main* source of comments are just the default license header. Hence I seems much more reasonable to simply not include that to begin with, rather than removing it after the fact (the few remaining Webpack-related should be few/small enough to not really matter much in practice).
This way we're able to further reduce the special-casing related to the `pdf.sandbox.js`-building, which will make future changes/maintenance easier by bringing this code more in-line with existing patterns in `gulpfile.js`.
(If we really want to reduce the filesize, we might want to consider always minifying the `GENERIC`-build of the `pdf.sandbox.js` file.)
Each quadrilateral needs to have its own link element, so the first
quadrilateral can use the already created element, but the next
quadrilaterals need to clone that element.
Not only does this reduce boilerplate since the documentation is the
same for all annotation classes, it also wasn't correct for the
annotation types that support quadpoints since they return an array of
section elements instead of a single one.
There's no good reason, as far as I can tell, to use search-and-replace to include the *stringified* `pdf.scripting.js` file in the built `pdf.sandbox.js` file. Instead we could, and even should, utilize the existing `PDFJSDev.eval(...)`-functionality, which is not only simpler but will also be more efficient as well (no need for a regular expression).
The current location feels somewhat strange, and also inconsistent with the existing way that bundling is done.
Finally, add the version/build numbers at the top of the *built* `pdf.sandbox.js` files, since all other built files include that information given that it's often helpful to be able to easily determine the *exact* version.
[Regression] Prevent the *built* `pdf.scripting.js`/`pdf.sandbox.js` files from accidentally including most of the main-thread code (PR 12631 follow-up)
*This is a recent regression, which I stumbled upon while working on cleaning-up the gulpfile related to `pdf.sandbox.js` building.*
By placing the `ColorConverters` functionality in the `src/display/display_utils.js` file, you end up including a *significant* chunk of the `pdf.js` file in the built `pdf.scripting.js`/`pdf.sandbox.js` files.
Given that I cannot imagine that this was actually intended, since it inflates the built files with unnecessary/unused code, this moves `ColorConverters` to a new file instead (thus breaking the dependencies).
To hopefully reduce the risk future bugs, along these lines, a big comment is also placed at the top of the new file.
Finally, the `ColorConverters` is converted to a class with static methods, since this felt slightly cleaner overall.
Rather than having two slightly different ways of setting the pending/notFound appearance on the "findInput", we can simply use "data-status" in both cases since they're obviously mutually exclusive.
This seems like a very minor issue, since in general we can't really help if domains are blocked from certain networks, however in this particular case I suppose that using the Internet Archive should work.
As mentioned in https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support, PDF.js version `2.6.347` is the last release with IE 11/Edge support.
Hence we should now be able to reduce unnecessary duplication in the default viewer image resources, note the files in the `web/images/` folder with a `-dark` suffix, by using only *one* SVG-image for each icon and letting the `background-color` depend on the CSS theme instead.
For the `gulp mozcentral` build-target, the resulting `web/images/` folder is reduced from `43 997` to `28 566` bytes (~35 percent).
*Please note:* I don't really know if this implementation is necessarily the *best* solution, but it seems to work well enough in e.g. Firefox Nightly and Google Chrome Beta as far as my testing goes.
In addition to the existing /Root and /Pages validation, also check that the /Pages-entry actually is a dictionary and that it has a valid /Count-entry.
This way we can avoid picking a trailer candidate which e.g. the `Catalog.numPages` getter will just end up rejecting, thus breaking PDF document loading completely.