For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].
Since there's generally a real effort being in made PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.
Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't a very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].
Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:
```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
console.timeEnd('first');
});
console.time('second');
getDocument(...).promise.then((pdfDocument) => {
pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
console.timeEnd('second');
});
});
```
The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.
---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.
[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.
[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.
If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature *first* to avoid needlessly unwieldy call-sites.
To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning *and* with a working fallback[1] for the old method signature.
---
[1] This is limited to `GENERIC` builds, which should be sufficient.
Given that it's really not clear to me if this is actually desired functionality in the default viewer, and considering that it doesn't fit in *great* with the way that `PDFHistory` is initialized, this feature is currently off by default[1].
---
[1] It's controlled with the `disableOpenActionDestination` Preference/AppOption.
Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1].
Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2].
---
[1] This information is, currently in PDF.js, being included through the `getJavaScript` API method.
[2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).
The custom entries, provided that they exist *and* that their types are safe to include, are exposed through a new `Custom` infoDict entry to clearly separate them from the standard ones.
Fixes 5970.
Fixes 10344.
This required the following changes in the Gulpfile:
- Defining a series of tasks is no longer done with arrays, but with the
`gulp.series` function. The `web` target is refactored to use a
smaller number of tasks to prevent tasks from running multiple times.
- Getting all tasks must now be done through the task registry.
- Tasks that don't return anything must call `done` upon completion.
Moreover, this upgrade allows us to use the latest Node.js on Travis CI
again.
Given that Signature (Widget) annotations are currently not supported, since they cannot be validated, simply ignoring the `fieldValue` seems OK for now considering that attempting to blindly include unparsed/unvalidated data isn't very useful.
Fixes 10347.
This patch re-factors, and extends, the already existing `zoomDisabledTimeout` used during mouse wheel zooming.
Unfortunately I haven't got the required hardware to actually test this patch, but there's a decent chance that it will fix, or at least reduce, the problems reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1503412.
Given that a larger number of pages may now be visible at once, and importantly that their layout may be non-vertical, one of the conditions should be tweaked to not accidentally miss cases where a page is still visible.
Please note: This patch is based on code-inspection, and the only ill effect occurring without it would be a couple of (near) duplicate history entries in some *rare* edge-cases.
This patch does three things:
- Updates the `gulp unittestcli` command, using `gulp lint` as a guide, such that it can be run locally on Windows without any modifications.
- Updates the `gulp lib` command to support disabling of Babel through the `SKIP_BABEL` environment variable. Note that all other build targets support this mode, and there's no good reason for `lib` to be any different here.
- Updates the `npm test` command, used in Node.js/Travis, to test the code as-is test. Since modern Node.js versions seem to have no problems with ES6 compatible code in general, we should just test the source code as-is instead (similar to the tests running on the regular bots).
With the removal of the global `PDFJS` object, in PDF.js version `2.0`, the viewer options are no longer as easily accessible as they previously were (and issues have been filed about this).
In particular, since the viewer files aren't necessarily loaded *immediately*, this means that `PDFViewerApplication`/`PDFViewerApplicationOptions` aren't necessarily available directly. By dispatching an event once all viewer files are loaded but *before* the viewer initialization has run, setting `AppOptions` during load (in custom implementations of the default viewer) should hopefully become a little bit easier[1].
---
[1] In hindsight, this should probably have been implemented when the global `PDFJS` object was removed...
Rather than having a (somewhat) randomly choosen list of Preferences which `AppOptions` are allowed to override, it makes much more sense to simply add an AppOption to allow custom implementations to ignore Preferences altogether (it's also inline with the AppOption that allows the `ViewHistory` to be bypassed on load).
This should have been part of the previous commit that updated the
Wintersmith dependency. Markdown support is no longer included in Pug
itself and should be done by a transformer instead.