This is relevant for e.g. `PDFSinglePageViewer`, and `PDFViewer` with Presentation Mode active.
By moving this code to a helper method in `BaseViewer`, it's thus possible to reduce the amount of duplicate code that currently needed in `PDFViewer` and `PDFSinglePageViewer`.
For a short document, such as e.g. the `tracemonkey` file, this repeated normalization won't matter much, but for documents with a couple of thousand pages it seems completely unnecessary (and wasteful) to keep repeating the normalization whenever for every single page.
Currently the text-content is normalized every time that a new search operation is started, which seems completely useless considering that the "raw" text-content is never used for anything.
For a short document, such as e.g. the `tracemonkey` file, this repeated normalization won't matter much, but for documents with a couple of thousand pages it seems completely unnecessary (and wasteful) to keep repeating the normalization whenever e.g. a new search operation starts.
In the event that multiple instances of `PDFFindController` ever exists simultaneously, they will all be able to share just one `normalize` function in this way. Furthermore, the regular expression is now created lazily rather than at class construction time.
Despite all highlighted matches being removed in response to the 'findbarclose' event, there's a risk that a match could still be scrolled into view *after* the findbar has been closed[1].
Hence we need to ensure that long running searches, particularily those happening in large and/or slow loading documents[2], are ignored as well.
---
[1] The match is hidden, as expected, but the document could still scroll unexpectedly.
[2] Large documents loaded with `disableAutoFetch = true` and `disableStream = true` set are particularily susceptible to this issue.
Given that dispatching the 'updatetextlayermatches' event with `pageIndex = -1` set is now used to target the textLayers of *all* pages, there's no need to send individual events to every single page during `_nextMatch`. Since there can be an arbitrary number of pages in a document, this small/simple optimization seems too easy to ignore.
This patch does four things:
- Change the search related methods in `TextLayerBuilder` to be "private", since there're only called from within the class itself now.
- Use `const` for local variables not intended to change in the search related methods in `TextLayerBuilder`.
- Finally, removes most `this.findController` checks since they are redundant. Note how both `this._convertMatches` and `this._renderMatches` are *only* ever called, from `this._updateMatches`, when `this.findController` is actually defined. Hence there's really no need to repeat those checks all over the place, especially with all the relevant methods now being marked as "private".
- Always initialize the `this._pageMatchesLength` property with an empty array, to simplify the code in `TextLayerBuilder`.
This should, hopefully, cover all the possible ways[1] in which "fake workers" are loaded. Given the different code-paths, adding unit-tests might not be that simple.
Note that in order to make this work, the various `fakeWorkerFilesLoader` functions were converted to return `Promises`.
---
[1] Unfortunately there's lots of them, for various build targets and configurations.
Rather than having every invocation of `Toolbar._updateUIState` compute a valid `pageScaleValue`, it seems easier to simply ensure that it happens when the value is actually updated.
Looking at the history of this code, this parameter has never been used.
I'm guessing that most likely the code in `web/toolbar.js` began life as a copy of `web/secondary_toolbar.js`, which would probably explain why that parameter exists.
*This patch is based on something that I noticed while working on PR 10126.*
The recent re-factoring of `PDFFindController` brought many improvements, among those the fact that access to `BaseViewer` is no longer required. However, with these changes there's one thing which now strikes me as not particularly user-friendly[1]: The fact that in order for searching to actually work, `PDFFindController.setDocument` must be called *and* a 'pagesinit' event must be dispatched (from somewhere).
For all other viewer components, calling the `setDocument` method[2] is enough in order for the component to actually be usable.
The `PDFFindController` thus stands out quite a bit, and it also becomes difficult to work with in any sort of custom implementation. For example: Imagine someone trying to use `PDFFindController` separately from the viewer[3], which *should* now be relatively simple given the re-factoring, and thus having to (somehow) figure out that they'll also need to manually dispatch a 'pagesinit' event for searching to work.
Note that the above even affects the unit-tests, where an out-of-place 'pagesinit' event is being used.
To attempt to address these problems, I'm thus suggesting that *only* `setDocument` should be used to indicate that searching may start. For the default viewer and/or the viewer components, `BaseViewer.setDocument` will now call `PDFFindController.setDocument` when the document is ready, thus requiring no outside configuration anymore[4]. For custom implementation, and the unit-tests, it's now as simple as just calling `PDFFindController.setDocument` to allow searching to start.
---
[1] I should have caught this during review of PR 10099, but unfortunately it's sometimes not until you actually work with the code in question that things like these become clear.
[2] Assuming, obviously, that the viewer component in question actually implements such a method :-)
[3] There's even a very recent issue, filed by someone trying to do just that.
[4] Short of providing a `PDFFindController` instance when creating a `BaseViewer` instance, of course.
Since searching itself is an asynchronous operation, removal of highlights needs to be asynchronous too since otherwise there's a risk that the events happen in the wrong order and find highlights thus remain visible.
Also, this patch will now ensure that only 'findbarclose' events for the *current* document is handled since other ones doesn't really matter. Note in particular that when no document is loaded text-layers are, obviously, not present and subsequently it's unnecessary to attempt to hide non-existent find highlights.
For many years it's been possible to enter a search term into the findbar(s) before the document has finised loading, such that searching starts immediately once it has loaded.
PR 10099 accidentally broke that, which I unfortunately missed during reviewing.
Since searching is asynchronous you cannot directly check in `executeCommand` if the document is loaded/current, but need to wait until searching is actually enabled first.
Furthermore this patch also ensures that the `_findTimeout` is always correctly cleared given that it adds further asynchronous behaviour to searching, since you obviously only want to deal with searches relevant to the current document.
This is similar to the format used by a number of other viewer components, and should simplify the `PDFSidebar` initialization slightly.
Furthermore, by using the `eventBus` it's no longer necessary for `PDFSidebar` to have a direct dependency on `PDFOutlineViewer`.
There's still room for improvement here, but this patch is at least a start (since it's not clear to me how best to handle the viewers).
This attempts to reduced the level of indirection, and the amount of code, when dispatching `fileattachmentannotation` events, by removing the `PDFLinkService.onFileAttachmentAnnotation` method and just accessing `PDFLinkService.eventBus` directly in the `FileAttachmentAnnotationElement` constructor.
Given that other properties, such as `externalLinkTarget`/`externalLinkRel`, are already being accessed directly this pattern seems fine here as well.
This commit shows that we can now unit test the find controller and
that executing regular queries works. Note that this is only a first
step and not a complete suite of unit tests for all possible options
of the find controller.
While writing this unit test, I found two smaller issues that I
addressed directly. The first one is that in the previous find
controller refactoring I forgot to rename some occurrences of a now
private member variable. Fortunately this did not cause any bugs since
we did have a public getter and the fetched value may be changed by
reference, but it's nevertheless good to fix. The second issue is that
some entries in the `test/unit/clitests.json` file were not correct,
resulting in these tests not being executed on e.g., Travis CI.
With `PDFFindController` instances no longer (directly) depending on
`BaseViewer` instances, we can pass a single `findController` when
initializing a viewer, similar to other components.
This removes the dependency on a `PDFViewer` instance from the find
controller, which makes it more similar to other components and makes it
easier to unit test with a mock link service.
Finally, we remove the search capabilities from the SVG example since it
doesn't work there because there is no separate text layer.
Now it follows the same pattern as e.g., the document properties
component, which allows us to have one instance of the find controller
and set a new document to search upon switching documents.
Moreover, this allows us to get rid of the dependency on `pdfViewer` in
order to fetch the text content for a page. This is working towards
getting rid of the `pdfViewer` dependency upon initializing the
component entirely in future commits.
Finally, we make the `reset` method private since it's not supposed to
be used from the outside anymore now that `setDocument` takes care of
this, similar to other components.
This way the resetting of `PDFLinkService`/`PDFDocumentProperties` instances, as is done in `PDFViewerApplication.close`, only requires passing in *one* `null` argument instead of two.
The built-in PDF Viewer (in Firefox) cannot use the browser findbar when PDF files are embedded in e.g. iframe/object tags, and the PDF.js findbar (i.e. `PDFFindBar`) will thus be used instead in those cases.
This is slightly problematic, since the `MOZCENTRAL` version of the viewer uses a special, slimmed down, version of the `l10n.js` file that doesn't (currently) support plural forms. To prevent the matchesCounter from breaking completely in this edge-case, temporarily hard-code the plural form to use the default `[other]` version of the locale strings.
This prevents the findbar from intermittently displaying `0 of {number} matches`, which *could* theoretically happen for large and/or slow loading documents.
The find controller should only coordinate finding a string in the
document and should not be responsible for presenting the matches to the
user. The text layer builder already contains the logic to render the
matches in the viewer, so it should also take care of scrolling the
selected match into view.
The find controller already has quite a lot of state to maintain. We can
avoid keeping track of this member variable because when the find
controller is reset, so is the extract text promises array. Therefore,
we can just check if that array contains items or not to determine if
text extraction already started.
Moreover, there is no need to reset the `pageContents` array since the
`reset` method already takes care of that.
This is an unfortunate oversight on my part, which I stumbled upon when (locally) testing the `mozilla-central` follow-up patch necessary to enable the matches counter in the built-in PDF viewer.
The property is intended to contain a reference to a DOM element, which not only is nowhere to be found *now* but appears to never have existed in the first place.
As outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1282759 the internal Firefox name for the feature is `entireWord`, hence that name is used here as well for consistency (with "Whole words" being limited to the UI).
Given existing limitations of the PDF.js search functionality, e.g. the existing problems of searching across "new lines", there's some edge-cases where "Whole words" searching will ignore (valid) results.
However, considering that this is a pre-existing issue related to the way that the find controller joins text-content together, that shouldn't have to block this new feature in my opionion.
*Please note:* In order to enable this feature in the `MOZCENTRAL` version, a small follow-up patch for [PdfjsChromeUtils.jsm](https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/content/PdfjsChromeUtils.jsm) will be required once this has landed in `mozilla-central`.
For the `PDFFindBar` implementation, similar to the native Firefox findbar, the matches count displayed is now limited to a (hopefully) reasonable value.
*Please note:* In order to enable this feature in the `MOZCENTRAL` version, a follow-up patch will be required once this has landed in `mozilla-central`.
This patch is the first step to be able to eventually get rid of the `attachDOMEventsToEventBus` function, by allowing `EventBus` instances to simply re-dispatch most[1] events to the DOM.
Note that the re-dispatching is purposely implemented to occur *after* all registered `EventBus` listeners have been serviced, to prevent the ordering issues that necessitated the duplicated page/scale-change events.
The DOM events are currently necessary for the `mozilla-central` tests, see https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/test, and perhaps also for custom deployments of the PDF.js default viewer.
Once this have landed, and been successfully uplifted to `mozilla-central`, I intent to submit a patch to update the test-code to utilize the new preference. This will thus, eventually, make it possible to remove the `attachDOMEventsToEventBus` functionality.
*Please note:* I've successfully ran all `mozilla-central` tests locally, with these patches applied.
---
[1] The exception being events that originated on the `window` or `document`, since those are already globally available anyway.
The new events follow the same naming pattern as the 'pagesinit'/'pagesloaded' events dispatched on `BaseViewer` instances, and the intention is to allow the eventual removal of 'documentload'.
[api-minor] Add an `IsLinearized` property to the `PDFDocument.documentInfo` getter, to allow accessing the linearization status through the API (via `PDFDocumentProxy.getMetadata`)
When updating Preferences using the `set` method, the input is carefully validated. However, no validation is (currently) done when a `BasePreferences` instance is created, which probably isn't that great. Hence this patch that simply ignores, to not unnecessarily break loading of the viewer itself, any invalid Preferences.
Given that the various Preferences are currently, and have been for quite some time, only used when initializing `PDFViewerApplication` re-loading them when a new PDF file is opened in the viewer is essentially a no-op.
Furthermore, with the only usage of `BasePreferences.reload` now gone, the value of that method seems questionable at best. In the event that the functionality is actually needed again, similar to the `ViewHistory`, it'd probably make more sense to simply replace `PDFViewerApplication.preferences` with a new `BasePreferences` instance instead (using e.g. `DefaultExternalServices.createPreferences`).
Given that *all* Preferences are already fetched in `PDFViewerApplication._readPreferences`, the amount of boilerplate/duplication can be considerably reduced with the addition of a `BasePreferences.getAll` method.
The only reason that this check ever existed in the first place, is that originally there was a global `PDFJS.openExternalLinkInNewWindow` option which was then subsumed by the (more generic) `PDFJS.externalLinkTarget` option. (The `externalLinkTarget` has since been moved into a `PDFLinkService` option, as part of PDF.js version `2.0`.)
Hence, during the period where both `PDFJS.openExternalLinkInNewWindow` and `PDFJS.externalLinkTarget` existed side-by-side, there was a need to allow the former one to override the latter one (for backward compatibility purposes). However, that's no longer the case, and this extra `externalLinkTarget` check can now be removed.
*This was a stupid error on my part; sorry about breaking this!*
With the current code, the value of the `externalLinkTarget` option is now (potentially) updated *after* the viewer components have been initialized. For the "viewer in iframe/object tag" case, the result is that the value of the `externalLinkTarget` option isn't adjusted as intended any more.
Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended.
To reduced duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.
*Another small piece of clean-up of code I've previously written; follow-up to PR 8775.*
Importing `createPromiseCapability`, and then using it in just *one* spot, seems unnecessary since the `waitOnEventOrTimeout` function may just as well return a regular `Promise` directly.
Given that the non-default Spread modes (currently) doesn't affect the page layout when horizontal scrolling is enabled, having the Spread buttons appear active when clicking them appears to do *nothing* is probably confusing rather than helpful to users.
If the current viewer is a `PDFSinglePageViewer` instance the Scroll/Spread modes are no-ops, hence displaying buttons that do *nothing* when clicked will probably do very little besides confuse users.
The names 'resetscrollmode'/'resetspreadmode' were probably *not* great choices, given that the only thing being reset are toolbar buttons and not the actual Scroll/Spread modes. Furthermore, there's really no need for two separate events here.
The patch also adds a comment that ought to have been included in PR 9040, to prevent future refactoring/removing of what may appear to be an unnecessary `Promise.resolve` call.
This moves/exposes the `URL` polyfill similarily to the existing `ReadableStream` polyfill, rather than exposing it globally, to avoid interfering with any "outside" code.
Both the `URL` and `ReadableStream` polyfills are now exposed on the `pdfjsLib` object, such that they are accessible to the viewer components.
Furthermore, the `no-restricted-globals` ESLint rule is also enabled to prevent accidental usage of the native `URL`/`ReadableStream` implementations directly in the `src/` and `web/` folders; see also https://eslint.org/docs/rules/no-restricted-globals
Addresses the remaining TODO in https://github.com/mozilla/pdf.js/projects/6
Rather than having to manually call a method on `PDFFindController` instances from `BaseViewer.setDocument`, thus essentially having to resolve the private `_firstPagePromise` from the "outside", this can be done easily with the 'pagesinit' event dispatched on the `eventBus` instead.
Please note this particular `PDFFindController` code pre-dates the `eventBus` by almost three years, which should explain why the code looks the way it does.
Since the Scroll/Spread modes are now document specific, as all other properties such as page/scale/rotation, ensure that the toolbar is always correctly reset.
Since other viewer state, such as the current page/scale/rotation[1], are not available as `BaseViewer` constructor options, this makes the Scroll/Spread modes stand out quite a bit. Hence it probably makes sense to remove/deprecate this, to avoid inconsistent and possibly confusing state in this code.
---
[1] These properties are *purposely* not available in the constructor, since attempting to set them before a document is loaded has number of issues; please refer to https://github.com/mozilla/pdf.js/pull/8539#issuecomment-309706629 for additional details.
Since all the other viewer methods use the getter/setter pattern, e.g. for setting page/scale/rotation, the way that the Scroll/Spread modes are set thus stands out. For consistency, this really ought to use the same pattern as the rest of the `BaseViewer`. (To avoid breaking third-party implementations, the old methods are kept around as aliases.)
Note how the other "...OnLoad" preferences will allow a *non-default* value to always override a history entry. To improve overall consistency for the viewer options, and to reduce possibly confusing behaviour, this patch changes the `scrollModeOnLoad`/`spreadModeOnLoad` preferences to behave as all the other ones.
For some reason, these weren't added to `AppOptions` despite actually being set and read from `web/app.js`. Not adding them creates inconsistencies, since all other options *are* present in `web/app_options.js`.
This property isn't accessed anywhere in the `web/app.js` file, and is also not being reset in `PDFViewerApplication.close`. Hence it seems that it can simply be removed, especially since the fingerprint is already synchronously available through `PDFViewerApplication.pdfDocument.fingerprint` (provided that a document is loaded).
With the current code, the location in the viewer could change *well* after the user has started to interact with the viewer (for very large and/or slow loading documents). Attempt to reduce the likelyhood of that happening, by adding an upper bound to the time spent waiting before attempting to re-apply the initial position.
Rather than using a "special" property to check if a `ViewHistory` database entry existed on load, it seems that you could just as well check for the existence of one of the actually needed properties instead (here 'page' is used).
This way we can avoid storing what, more of less, amounts to useless state, which will help reduce the size of the `ViewHistory` database. Given that we don't directly, nor need to, validate the `ViewHistory` data but rely on other parts of the code-base to do so, the existance of an 'exists' property doesn't in and of itself really add much utility.
Finally, to simplify the implementation the 'exists' property won't be actively removed from the `ViewHistory` data. Instead we'll simply stop adding 'exists' when writing `ViewHistory` data, and rely on the existing pruning of old entries to eventually remove any remnants of 'exists' from storage.
Note how, in the current code, only *one* old history entry would ever be removed. That would mean that if e.g. the `cacheSize` is reduced, it would potentially require loading of multiple files before the database would be correctly pruned.
Furthermore, in the case where the database was empty on load there's no need to attempt to shrink it, since trying to reduce the size of an *empty* array won't do much :-)