When we fail to find a usable PDF document `trailer` *and* there were errors during parsing, try and fallback to a *previous* generation as a last resort during fetching of uncompressed references.
*Please note:* This will not affect "normal" PDF documents, with valid /XRef data, and even most *corrupt* documents should be completely unaffected by these changes.
Part of this is very old code, and back when support for parsing the catalog-version was added things became less clear (in my opinion).
Hence this patch tries to improve things, by e.g. validating the header- and catalog-version separately.
Note how we're currently skipping all main-thread cleanup when document destruction has started, but for some reason we're still dispatching the "Cleanup" message.
This seems like a simple oversight, since destruction will already invoke the `BasePdfManager.cleanup` method (on the worker-thread) to fully clear-out all caches.
Given the sheer number of heuristics added to this method over the years, moving the *valid* unicode found case to the top should improve readability of the code.
- Fix Field::getArray in order to collect only the fields which have a value;
- Fix AFSimple_Calculate:
* allow to have a string with a list of field names as argument;
* since a field can be non-terminal, use Field::getArray to collect
the field under it and then apply the calculation on all the descendants.
This code was added all the way back in PR 6698, almost seven years ago, for backwards compatibility reasons. At this point in time, it seems that we can remove that since:
- We have more fine-grained "UnsupportedFeature" reporting elsewhere in the worker-thread code nowadays.
- The GetOperatorList-handling is now using `ReadableStream`s, which means that errors are being forwarded to the main-thread anyway.
- We're also no longer displaying a notification-bar, in the *built-in* Firefox PDF Viewer, for any of these "UnsupportedFeature" messages.
*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).
When a new PDF document is opened in the GENERIC viewer we (obviously) create a new `AnnotationEditorUIManager`-instance, since those are document-specific, and thus we need to ensure that we actually register the `editorTypes` for each one.
Note how after having found the "%PDF-" prefix we then read both the prefix and the version in the loop, only to then remove the prefix at the end.
It seems better to instead advance the stream position past the "%PDF-" prefix, and then read only the version data.
Finally the loop-condition can also be simplified slightly, to further clean-up some very old code.
*Fixes a regression from PR 15246, sorry about that!*
The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.
Looking at the code on the worker-thread, there doesn't appear to be any particular reason for placing *some* of the properties in a `source`-object when sending them with "GetDocRequest".
As is often the case the explanation for this structure is rather "for historical reasons", since originally we simply sent the `source`-object as-is. Doing that was obviously a bad idea, for a couple of reasons:
- It makes it less clear what is/isn't actually needed on the worker-thread.
- Sending unused properties will unnecessarily increase memory usage.
- The `source`-object may contain unclonable data, which would break the library.
Rather than sending all of these parameters individually and then grouping them together on the worker-thread, we can simply handle that in the API instead.
All of the these constants have been deprecated for a while, and with the upcoming *major* version this seems like a good time to remove them.
For the string-constants we can simply remove them, but the number-constants are left commented out since we don't want to re-number the list to prevent third-party breakage.
I noticed the 256 % 3 (which is equal to 1) so I slighty simplify the code.
The sum of the 16 Uint8 doesn't exceed 2^12, hence we can just take the
sum modulo 3.
This method was originally added in PR 1157 (back in 2012), however its only call-site was then removed in PR 2423 (also in 2012).
Hence this method has been completely unused for nearly a decade, and it should thus be safe to remove it.
This patch first of all makes `isOffscreenCanvasSupported` configurable, defaulting to `true` in browsers and `false` in Node.js environments, with a new `getDocument` parameter. While you normally want to use this, in order to improve performance, it should still be possible for users to control it (similar to e.g. `isEvalSupported`).
The specific problem, as reported in issue 14952, is that the SVG back-end doesn't support the new ImageMask data-format that's introduced in PR 14754. In particular:
- When the SVG back-end is used in Node.js environments, this patch will "just work" without the user needing to make any code changes.
- If the SVG back-end is used in browsers, this patch will require that `isOffscreenCanvasSupported: false` is added to the `getDocument`-call.
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document.
See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909
---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
This is yet another small piece of clean-up of the `FontLoader`-code, since we've not used this `id`-property for anything ever since PR 6571 (which landed almost seven years ago). Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.
*Please note:* This is tagged `[api-minor]` since it's technically observable from the outside, however no user ought to be directly interacting with these CSS font rules.
Currently the compatibility-file is loaded using a standard `import`-statement and while its code is enclosed in a pre-processor block, and thus is excluded in e.g. the MOZCENTRAL build-target, it still results in the *built* `pdf.js`/`pdf.worker.js` files having an effectively empty closure as a result.
By moving the checks from `src/shared/compatibility.js` and into `src/shared/util.js` instead, we can load the file using a build-time `require`-statement and thus avoid that closure.
Note that with these changes the compatibility-file will no longer be loaded in development mode, i.e. when `gulp server` is used. However, this shouldn't be a big issue given that none of its included polyfills could be loaded then anyway (since `require`-statements are being used) and that it's really only intended for the `legacy`-builds of the library.
Note that this PR only adds the "underscore"-variant of *actually existing* ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries).
Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the *second* time this sort of bug has been reported.
Fewer dependencies shouldn't be a bad idea in general, and given that the `node-canvas` package already include a `DOMMatrix` polyfill we can simply use that one instead.
Given that Firefox supports *synchronous* font loading, when the Font Loading API isn't being used, there's really no point including code which if called would just throw in the MOZCENTRAL build. (This is safe, since the `FontLoader.isSyncFontLoadingSupported`-getter always return `true` there.)
After the changes in PR 10539 (which landed over three years ago) the `FontLoader.bind` method can only be called with *a single* font at a time, hence the `_prepareFontLoadEvent` method obviously don't need to support multiple fonts any more.
By having just *one* class, and using pre-processor blocks directly in the relevant methods, we reduce the size of this code in the *built* `pdf.js` file.
Originally, when the `BaseFontLoader` abstraction was added in PR 9982, the idea was probably that additional build-targets would get their own implementations. Given that this hasn't happened in the four years since that landed, it doesn't seem meaningful to keep it around.
The existing `loadingContext` class-property can be simplified slightly, since we've not been using the `id`-property on the requests ever since PR 3477 (which landed nine years ago).
Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.