Generalizing, and documenting, the `PDFHistory`-implementation as part of the web-interfaces doesn't seem entirely necessary and in hindsight I'm not entirely sure why we need it since:
- The `PDFHistory` implementation is/was written specifically for the default viewer use-case, which is why e.g. the `simpleviewer` component example isn't using it. (While it *could* be used there, it'd need to be manually created/initialized correctly.)
- There's only *one* `PDFHistory`-implementation present (and no other viewer-component will fail without it being available), as opposed to the other web-interfaces documented in this file.
- The `PDFHistory` implementation is not even usable with e.g. the `pageviewer` component example, since it (obviously) requires a complete viewer to work. (This is in contrast to e.g. `IPDFTextLayerFactory` and `IPDFAnnotationLayerFactory`.)
*This is a follow-up to PRs 13867 and 13899.*
This patch is tagged `api-minor` for the following reasons:
- It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour.
- For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method.
- It's now also possible to disable *all* annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282.
---
[1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of *all* annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.
Moves the logic out of TextLayerBuilder to handle
highlighting matches into a new separate class `TextHighlighter`
that can be used with regular PDFs and XFA PDFs.
To mimic the current find functionality in XFA, two arrays
from the XFA rendering are created to get the text content
and map those to DOM nodes.
Fixes#13878
When a PDF is "marked" we now generate a separate DOM that represents
the structure tree from the PDF. This DOM is inserted into the <canvas>
element and allows screen readers to walk the tree and have more
information about headings, images, links, etc. To link the structure
tree DOM (which is empty) to the text layer aria-owns is used. This
required modifying the text layer creation so that marked items are
now tracked.
- add an option to enable XFA rendering if any;
- for now, let the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground);
- ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one:
- it makes easier to translate template properties to html ones;
- it makes faster the creation of the html element in the main thread.
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports
The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
This patch will help reduce memory usage, especially for longer documents, when the user scrolls around in the thumbnailView (in the sidebar).
Note how the `PDFPageProxy.cleanup` method will, assuming it's safe to do so, release main-thread resources associated with the page. These include things such as e.g. image data (which can be arbitrarily large), and also the operatorList (which can also be quite large).
Hence when pages are evicted from the `PDFPageViewBuffer`, on the `BaseViewer`-instance, the `PDFPageView.destroy` method is invoked which will (among other things) call `PDFPageProxy.cleanup` in the API.
However, looking at the `PDFThumbnailViewer`/`PDFThumbnailView` classes you'll notice that there's no attempt to ever call `PDFPageProxy.cleanup`, which implies that in certain circumstances we'll essentially keep all resources allocated permanently on the `PDFPageProxy`-instances in the API.
In particular, this happens when the users opens the sidebar and starts scrolling around in the thumbnails. Generally speaking you obviously need to keep all thumbnail *images* around, since otherwise the thumbnailView is useless, but there's still room for improvement here.
Please note that the case where a *rendered page* is used to create the thumbnail is (obviously) completely unaffected by the issues described above, and this rather only applies to thumbnails being explicitly rendered by the `PDFThumbnailView.draw` method.
For the latter case, we can fix these issues simply by calling `PDFPageProxy.cleanup` once rendering has finished. To prevent *accidentally* pulling the rug out from under `PDFPageViewBuffer` in the viewer, which expects data to be available, this required adding a couple of new methods[1] to enable checking that it's indeed safe to call `PDFPageProxy.cleanup` from the `PDFThumbnailView.draw` method.
It's really quite fascinating that no one has noticed this issue before, since it's been around since basically "forever".
---
[1] While it should be *very* rare for `PDFThumbnailView.draw` to be called for a pageView that's also in the `PDFPageViewBuffer`, given that pages are rendered before thumbnails and that the *rendered page* is used to create the thumbnail, it can still happen since rendering is asynchronous.
Furthermore, it's also possible for `PDFThumbnailView.setImage` to be disabled, in which case checking the `PDFPageViewBuffer` for active pageViews *really* matters.
* When no actions then set it to null instead of empty object
* Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.
This patch addresses a review comment, which pointed out that we should *also* handle the pageNumber-input, from PR 12493.
Given that a user *manually* changing pages using the pageNumber-input, on the toolbar, could be regarded as a pretty strong indication of user-intent w.r.t. navigation in the document, hence I suppose that updating the browser history in this case as well probably won't hurt.
While the referenced issue could very well be seen as an edge-case, this patch adds support for updating of the browser history when interacting with the thumbnails in the sidebar (assuming we want to do this).
The main reason for adding the history implementation in the first place, was to simplify navigating back to a previous position in the document when named/explicit destinations are used (e.g. when clicking on "links" or when using the outline in the sidebar).
As such, it never really crossed by mind to update the browser history when the thumbnails are used. However, a user clicking on thumbnails could be regarded as a pretty strong indication of user-intent w.r.t. navigation in the document, hence I suppose that updating the browser history in this particular case probably won't hurt.
This modernizes and improves the code, by using `async`/`await` and by extracting the helper function to its own method.
To hopefully avoid confusion, given the next patch, the method is also re-named to `goToDestination` to make is slightly clearer what it actually does.
Given that `renderInteractiveForms` is now enabled by default in "full" viewer, it seems reasonable to enable it by default in the viewer components as well.
Especially considering that it's simple to disable, when creating the affected components, for anyone implementing their own viewer.
*The [api-minor] label probably ought to have been added to the original PR, given the changes to the `createAnnotationLayerBuilder` signature (if nothing else).*
This patch fixes the following things:
- Let the `AnnotationLayer.render` method create an `AnnotationStorage`-instance if none was provided, thus making the parameter *properly* optional. This not only fixes the reference tests, it also prevents issues when the viewer components are used.
- Stop exporting `AnnotationStorage` in the official API, i.e. the `src/pdf.js` file, since it's no longer necessary given the change above. Generally speaking, unless absolutely necessary we probably shouldn't export unused things in the API.
- Fix a number of JSDocs `typedef`s, in `src/display/` and `web/` code, to actually account for the new `annotationStorage` parameter.
- Update `web/interfaces.js` to account for the changes in `createAnnotationLayerBuilder`.
- Initialize the storage, in `AnnotationStorage`, using `Object.create(null)` rather than `{}` (which is the PDF.js default).
To avoid outright breaking third-party usages of the "viewer components" the `getGlobalEventBus` functionality is left intact, but a deprecation message is printed if the function is invoked.
The various examples are updated to *explicitly* initialize an `EventBus` instance, and provide that when initializing the relevant viewer components.
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
This patch addresses a couple of smaller issues with the `PDFHistory` class:
- Most, if not all, other viewer components can be reset in one way or another, and there's no good reason for the `PDFHistory` implementation to be different here.
- Currently it's (technically) possible to keep adding entries to the browser history, via the `PDFHistory` instance, even after the document has been closed. That obviously makes no sense, and is caused by the lack of a `reset` method.
- The internal `this._isPagesLoaded` property was never actually reset, which would lead to it being temporarily wrong when a new document was opened in the default viewer.
Sometimes we also used `@return` or `@returns`, but `@type` is what
the JSDoc documentation recommends. This also improves the documentation
because before this commit the types were not shown and now they are.
This is *really* the best that we can do here, since other proposed solutions would interfere with (and break) the painstakingly implemented browsing history that's present in the default viewer.
I'm still not convinced that this is a good idea in general, but this patch implements it in a way where it is possible to toggle[1] for users that wish to have this feature. In particular, there's a couple of reasons why I'm not finding this feature necessary/great:
- It's already possible to easily obtain the current hash, by simply clicking on the `viewBookmark` button at any time.
- Hash changes requires a bit of special handling[2], i.e. extra code, to prevent issues when the browser history is traversed (see `PDFHistory._popState`). Currently this is only necessary when the user has manually changed the hash, with this patch it will always be the case (assuming the feature is active).
- It's not always possible to change the URL when updating the browser history. For example: In the Firefox built-in viewer, the URL cannot be modified for local files (i.e. those using the `file://` protocol).
This leads to inconsistent behaviour, and may in some cases even result in errors being thrown and the history thus not updating, if the browser prevents changes to the URL during `pushState`/`replaceState` calls.
---
[1] Using the `historyUpdateUrl` viewer preference.
[2] This depends, to a great extent, on browsers always firing `popstate` events *before* `hashchange` events, which may or may not actually be guaranteed.
Given that it's really not clear to me if this is actually desired functionality in the default viewer, and considering that it doesn't fit in *great* with the way that `PDFHistory` is initialized, this feature is currently off by default[1].
---
[1] It's controlled with the `disableOpenActionDestination` Preference/AppOption.
Currently we'll only attempt to start from the current page when a new search is done, however for 'findagain' operations we'll always continue from the last match position.
This could easily lead to confusing behaviour if the user has scrolled to a completely different part of the document. In an attempt to improve this somewhat, for repeated 'findagain' operations, we'll instead reset the position to the current page when it's *absolutely* certain that the user has scrolled.
Note that this required adding a new `BaseViewer` method, and exposing that through `PDFLinkService`, in order to check if a given page is visible.
In an attempt to avoid issues, in custom implementations of `PDFFindController`, the code checks for the existence of the `PDFLinkService.isPageVisible` method *before* using it.
This attempts to reduced the level of indirection, and the amount of code, when dispatching `fileattachmentannotation` events, by removing the `PDFLinkService.onFileAttachmentAnnotation` method and just accessing `PDFLinkService.eventBus` directly in the `FileAttachmentAnnotationElement` constructor.
Given that other properties, such as `externalLinkTarget`/`externalLinkRel`, are already being accessed directly this pattern seems fine here as well.
This removes the `PDFJS.imageResourcesPath` dependency from the viewer components and the test-suite, but please note that as a *temporary* solution the default viewer still uses it.
This patch completely re-implements `PDFHistory` to get rid of various bugs currently present, and to hopefully make maintenance slightly easier. Most of the interface is similar to the existing one, but it should be somewhat simplified.
The new implementation should be more robust against failure, compared to the old one. Previously, it was too easy to end up in a state which basically caused the browser history to lock-up, preventing the user from navigating back/forward. (In the new implementation, the browser history should not be updated rather than breaking if things go wrong.)
Given that the code has to deal with various edge-cases, it's still not as simple as I would have liked, but it should now be somewhat easier to deal with.
The main source of complication in the code is actually that we allow the user to change the hash of a already loaded document (we'll no longer try to navigate back-and-forth in this case, since the next commit contains a workaround).
In the new code, there's also *a lot* more comments (perhaps too many?) to attempt to explain the logic. This is something that the old implementation was serverly lacking, which is a one of the reasons why it was so difficult to maintain.
One particular thing to note is that the new code uses the `pagehide` event rather than `beforeunload`, since the latter seems to be a bad idea based on https://bugzilla.mozilla.org/show_bug.cgi?id=1336763.
The current implementation of `PDFHistory` contains a number of smaller bugs, which are *very* difficult to address without breaking other parts of its code.
Possibly the main issue with the current implementation, is that I wrote it quite some time ago, and at the time my understanding of the various edge-cases the code has to deal with was quite limited.
Currently `PDFHistory` may, despite most of those cases being fixed, in certain edge-cases lock-up the browser history, essentially preventing the user from navigating back/forward.
Hence rather than trying to iterate on `PDFHistory` to make it better, the only viable approach is unfortunately rip it out in its entirety and re-write it from scratch.
Please see http://eslint.org/docs/rules/object-shorthand.
For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into to the patch.
1. Expanding divs to improve text selection. (Yury)
2. Adding enhanceTextSelection as an option.
3. Moving feature functionality from text_layer_builder.js to text_layer.js.
4. Added expandTextDivs method to only load expanded divs on first click, and only show on subsequent clicks
This patch makes the naming consistent with the `TextLayerBuilder`, and also the new `AnnotationLayer`, and should thus help reduce possible confusion when working with the code.
Please note that the files were renamed using `git mv`, in order to preserve blame.