While e.g. the `simpleviewer` and `singlepageviewer` examples work, since they're based on the `BaseViewer`-class, the standalone `pageviewer` example currently doesn't support either XFA- or StructTree-layers. This seems like an obvious oversight, which can be easily addressed simply by exporting the necessary functionality through `pdf_viewer.component.js`, similar to the existing Text/Annotation-layers.
While working on, and testing, these changes I happened to notice a number of smaller things that's also fixed in this patch:
- Ensure that `XfaLayerBuilder.render` always have a *consistent* return type, to prevent possible run-time failures in `PDFPageView`; PR 13908 follow-up.
- Change the order of the options in the `XfaLayerBuilder`-constructor to agree with the parameter order in the `DefaultXfaLayerFactory.createXfaLayerBuilder`-method.
- Add a missing `textHighlighterFactory`-option, in the JSDocs for the `PDFPageView`-class.
- A couple of small tweaks in the `TextLayerBuilder.render`-method: Re-use an existing Array rather than creating a new one, and replace an `if` with optional chaining instead.
*Please note:* For now XFA-support is currently disabled by default, similar to the regular viewer.
While running the unit-tests with some logging statements added to this code, I noticed that `PasswordException` was missing from the list of potential Errors that could be passed to the `wrapReason` function.
A lot of the code in this getter has existed ever since the initial PresentationMode-implementation was first added all the way back in PR 1938 (which is nine years ago now).
At this point in time however, there's now a simpler way detect if a browser supports the FullScreen API and we should thus be able to simplify this getter; please refer to https://developer.mozilla.org/en-US/docs/Web/API/Document/fullscreenEnabled#browser_compatibility
This command was added all the way back when basic CI-support was first introduced (using Travis at the time), however it's never really intended to be used e.g. for local development.
By having a `npm test`-command listed in the `package.json` file, there's a very real risk that someone unfamiliar with the code-base would only run that one and thus miss all the other (more important) test-suites[1].
Hence this patch which removes the `npm test`-command, and instead simply calls the relevant gulp-task[2] directly in the GitHub Actions configuration.
---
[1] Which consist of the unit-tests (run in browsers), the font-tests (potentially), the reference-tests, and the integration-tests.
[2] Which is also renamed slightly, to better fit its current usage.
Something that I *just* realized is that while PR 13854 fixed an issue as reported, it could still cause bugs in other similarily broken documents since we'll not insert a matching endMarkedContent-operator in the operatorList.
Currently, in the `PartialEvaluator`, we only support Optional Content in Form-/XObjects. Hence this patch adds support for Image-/XObjects as well, which looks like a simple oversight in PR 12095 since the canvas-implementation already contains the necessary code to support this.
The Viewer API definitions do not compile because of missing imports and
anonymous objects are typed as `Object`. These issues were not caught
during CI because the test project was not compiling anything from the
Viewer API.
As an example of the first problem:
```
/**
* @implements MyInterface
*/
export class MyClass {
...
}
```
will generate a broken definition that doesn’t import MyInterface:
```
/**
* @implements MyInterface
*/
export class MyClass implements MyInterface {
...
}
```
This can be fixed by adding a typedef jsdoc to specify the import:
```
/** @typedef {import("./otherFile").MyInterface} MyInterface */
```
See https://github.com/jsdoc/jsdoc/issues/1537 and
https://github.com/microsoft/TypeScript/issues/22160 for more details.
As an example of the second problem:
```
/**
* Gets the size of the specified page, converted from PDF units to inches.
* @param {Object} An Object containing the properties: {Array} `view`,
* {number} `userUnit`, and {number} `rotate`.
*/
function getPageSizeInches({ view, userUnit, rotate }) {
...
}
```
generates the broken definition:
```
function getPageSizeInches({ view, userUnit, rotate }: Object) {
...
}
```
The jsdoc should specify the type of each nested property:
```
/**
* Gets the size of the specified page, converted from PDF units to inches.
* @param {Object} options An object containing the properties: {Array} `view`,
* {number} `userUnit`, and {number} `rotate`.
* @param {number[]} options.view
* @param {number} options.userUnit
* @param {number} options.rotate
*/
```
- Use `Node.TEXT_NODE` rather than a magical constant, in `TextHighlighter._convertMatches`, to improve readability. According to MDN, this has been supported since "forever": https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType#browser_compatibility
- Remove the `pageIdx`-property, on `TextLayerBuilder`-instances, since the re-factoring in PR 13908 meant that it's now unused.
- Remove the `matches`-property, on `TextLayerBuilder`-instances, since the re-factoring in PR 13908 meant that it's now unused.
Generalizing, and documenting, the `PDFHistory`-implementation as part of the web-interfaces doesn't seem entirely necessary and in hindsight I'm not entirely sure why we need it since:
- The `PDFHistory` implementation is/was written specifically for the default viewer use-case, which is why e.g. the `simpleviewer` component example isn't using it. (While it *could* be used there, it'd need to be manually created/initialized correctly.)
- There's only *one* `PDFHistory`-implementation present (and no other viewer-component will fail without it being available), as opposed to the other web-interfaces documented in this file.
- The `PDFHistory` implementation is not even usable with e.g. the `pageviewer` component example, since it (obviously) requires a complete viewer to work. (This is in contrast to e.g. `IPDFTextLayerFactory` and `IPDFAnnotationLayerFactory`.)
*This is done separately from the previous patch, to make it easier to revert these changes once they've been included in a couple of releases.*
Please note that because these two options are mutually exclusive, which is a large part of the reason for the previous patch, it's not guaranteed that the fallback-values will always be correct in every situation (but it's the best that we can do).
*This is a follow-up to PRs 13867 and 13899.*
This patch is tagged `api-minor` for the following reasons:
- It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour.
- For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method.
- It's now also possible to disable *all* annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282.
---
[1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of *all* annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.
Moves the logic out of TextLayerBuilder to handle
highlighting matches into a new separate class `TextHighlighter`
that can be used with regular PDFs and XFA PDFs.
To mimic the current find functionality in XFA, two arrays
from the XFA rendering are created to get the text content
and map those to DOM nodes.
Fixes#13878
Current pdf_viewer definitions result in errors like the following when
trying to use them in a ts project:
[error] TypeScript error
node_modules/.pnpm/pdfjs-dist@2.10.377/node_modules/pdfjs-dist/web/pdf_viewer.d.ts:1:15
- error TS2691: An import path cannot end with a '.d.ts' extension.
Consider importing 'pdfjs-dist/types/web/pdf_viewer.component.js'
instead.
1 export * from "pdfjs-dist/types/web/pdf_viewer.component.d.ts";
Import/export statements in typescript should not include file extensions.
Currently a `TESTING = true` environment variable will *always* take precedence in the various build-tasks, and there's no way to explicitly disable it for a particular build. That's clearly an oversight on my part, however it's easy enough to fix this; sorry about breaking this!
The `loadAndEnablePDFBug` helper function, in `web/app.js`, can be simplified a little bit by making it `async`. Furthermore, given how `PDFBug` is being used, we can also (slightly) re-factor `PDFBug.init` such that the `PDFBug.enable`-call is done internally rather than having to handle that manually at the call-site.
(Finally, utilize `await` more in the `loadFakeWorker` helper function.)
This way there cannot be any *incorrect* cache hits, since Refs are guaranteed to be unique.
Please note that the reason for caching by Ref rather than doing something along the lines of the `localShadingPatternCache` (which uses a `Map` directly), is that TilingPatterns are streams and those cannot be cached on the `XRef`-instance (this way we avoid unnecessary parsing).
*This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867.*
As far as I can tell, this *subtle* bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago).
The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus *wrongly* return a cached operatorList.
In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API.
However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in *subtle* (and difficult to understand) rendering bugs.
In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead.
Given that operatorList caching in the API doesn't have to be perfect[1], but only have to avoid *false* cache-hits, we can simplify things significantly be only keeping track of the last time that the `AnnotationStorage`-data was modified.
*Please note:* While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2]
Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3]
---
[1] Note how we're already not caching operatorLists for pages with *huge* images, in order to save memory, hence there's no guarantee that operatorLists will always be cached.
[2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation.
[3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.
At this point in time, all of the supported browsers (in the PDF.js project) have native `ReadableStream` implementations; see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility
Hence the polyfill is *only* necessary in Node.js environments now, and we shouldn't need to do any detailed feature detection either (since that was only done for the non-Chromium versions of the MS Edge browser).
Finally, we can slightly reduce the size of the Chromium-extension since the polyfill shouldn't be needed there either.