This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.)
By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.
It occured to me that similar to the `getGlobalEventBus` function, it's probably a good idea to *also* notify users of the fact that `eventBusDispatchToDOM` is now deprecated.
Rather than depending of the re-dispatching of internal events to the DOM, the default viewer can instead be used in e.g. the following way:
```javascript
document.addEventListener("webviewerloaded", function() {
PDFViewerApplication.initializedPromise.then(function() {
// The viewer has now been initialized, and its properties can be accessed.
PDFViewerApplication.eventBus.on("pagerendered", function(event) {
console.log("Has rendered page number: " + event.pageNumber);
});
});
});
```
I overlooked this in PR 11631; sorry about that!
Also, ensure that `EventBus` instances *always* track "external" events using a boolean regardless of the actual option value.
This changes the dropdown icon from being set using the `background` CSS property, to being set with `::after` which is *similar* to all the other toolbar button icons (which use `::before`).
Also tweaks the dropdown `background-color` on `:hover` slightly, since the other changes made it too light.
Since the goal has always been, essentially since the `EventBus` abstraction was added, to remove all dispatching of DOM events[1] from the viewer components this patch tries to address one thing that came up when updating the examples:
The DOM events are always dispatched last, and it's thus guaranteed that all internal event listeners have been invoked first.
However, there's no such guarantees with the general `EventBus` functionality and the order in which event listeners are invoked is *not* specified. With the promotion of the `EventBus` in the examples, over DOM events, it seems like a good idea to at least *try* to keep this ordering invariant[2] intact.
Obviously this won't prevent anyone from manually calling the new *internal* viewer component methods on the `EventBus`, but hopefully that won't be too common since any existing third-party code would obviously use the `on`/`off` methods and that all of the examples shows the *correct* usage (which should be similarily documented on the "Third party viewer usage" Wiki-page).
---
[1] Looking at the various Firefox-tests, I'm not sure that it'll be possible to (easily) re-write all of them to not rely on DOM events (since getting access to `PDFViewerApplication` might be generally difficult/messy depending on scopes).
In any case, even if technically feasible, it would most likely add *a lot* of complication that may not be desireable in the various Firefox-tests. All-in-all, I'd be fine with keeping the DOM events only for the `MOZCENTRAL` target and gated on `Cu.isInAutomation` (or similar) rather than a preference.
[2] I wouldn't expect any *real* bugs in a custom implementation, simply based on event ordering, but it nonetheless seem like a good idea if any "external" events are still handled last.
To avoid outright breaking third-party usages of the "viewer components" the `getGlobalEventBus` functionality is left intact, but a deprecation message is printed if the function is invoked.
The various examples are updated to *explicitly* initialize an `EventBus` instance, and provide that when initializing the relevant viewer components.
This complements the existing `PDFViewerApplication.initialized` boolean property, and may be helpful for custom implementations of the default viewer. This will thus provide users of the default viewer an alternative to setting the preference to dispatch events to the DOM (and listen for the "localized" event), since they can instead use:
```javascript
document.addEventListener("webviewerloaded", function() {
PDFViewerApplication.initializedPromise.then(function() {
// The viewer has now been initialized.
})
});
```
Note that in order to avoid manually tracking the initialization state *twice*, this implementation purposely uses the `PromiseCapability` functionality to handle both `PDFViewerApplication.initialized` and `PDFViewerApplication.initializedPromise` internally.
This patch contains some *much* needed clean-up of, and improvements to, this old code thus addressing a number of issues:
- Set more reasonable *default* values for the widths, in `web/viewer.css`, since the current ones are actually too small even for the (default) `en-US` locale.
This obviously result in a slightly larger zoom dropdown width for many locales, but the more consistent UI does look good to me.
- Stop setting the `min-width`/`max-width` and just use `width` instead.
- Set an explicit `height` of the zoom dropdown, in an attempt to get Google Chrome to display it with the same height as the toolbar buttons.
- Remove additional `Element.setAttribute("style", ...)` usage.
- Actually check *all* of the predefined l10n strings, since the old implementation (implicitly) assumed that the currently selected one was the longest (note e.g. the `ja-JP` locale where one string is considerably longer than the rest).
- Stop invalidating the DOM multiple times when doing the measurements. This was achieved by using a temporary in-memory `canvas`, and we now only need to query the DOM once in order to get the current font properties of the zoom dropdown.
This should hopefully be useful in environments where restrictive CSPs are in effect.
In most cases the replacement is entirely straighforward, and there's only a couple of special cases:
- For the `src/display/font_loader.js` and `web/pdf_outline_viewer.js `cases, since the elements aren't appended to the document yet, it shouldn't matter if the style properties are set one-by-one rather than all at once.
- For the `web/debugger.js` case, there's really no need to set the `padding` inline at all and the definition was simply moved to `web/viewer.css` instead.
*Please note:* There's still *a single* case left, in `web/toolbar.js` for setting the width of the zoom dropdown, which is left intact for now.
The reasons are that this particular case shouldn't matter for users of the general PDF.js library, and that it'd make a lot more sense to just try and re-factor that very old code anyway (thus fixing the `setAttribute` usage in the process).
Having thought *briefly* about using `css-vars-ponyfill`, I'm no longer convinced that it'd be a good idea. The reason is that if we actually want to properly support CSS variables, then that functionality should be available in *all* of our CSS files.
Note in particular the `pdf_viewer.css` file that's built as part of the `COMPONENTS` target, in which case I really cannot see how a rewrite-at-the-client solution would ever be guaranteed to always work correctly and without accidentally touching other CSS in the surrounding application.
All-in-all, simply re-writing the CSS variables at build-time seems much easier and is thus the approach taken in this patch; courtesy of https://github.com/MadLittleMods/postcss-css-variables
By using its `preserve` option, the built files will thus include *both* a fallback and a modern `var(...)` format[1]. As a proof-of-concept this patch removes a couple of manually added fallback values, and converts an additional sidebar related property to use a CSS variable.
---
[1] Comparing the `master` branch with this patch, when using `gulp generic`, produces the following diff for the built `web/viewer.css` file:
```diff
@@ -408,6 +408,7 @@
:root {
--sidebar-width: 200px;
+ --sidebar-transition-duration: 200ms;
}
* {
@@ -550,27 +551,28 @@
position: absolute;
top: 32px;
bottom: 0;
- width: 200px; /* Here, and elsewhere below, keep the constant value for compatibility
- with older browsers that lack support for CSS variables. */
+ width: 200px;
width: var(--sidebar-width);
visibility: hidden;
z-index: 100;
border-top: 1px solid rgba(51, 51, 51, 1);
-webkit-transition-duration: 200ms;
transition-duration: 200ms;
+ -webkit-transition-duration: var(--sidebar-transition-duration);
+ transition-duration: var(--sidebar-transition-duration);
-webkit-transition-timing-function: ease;
transition-timing-function: ease;
}
html[dir='ltr'] #sidebarContainer {
-webkit-transition-property: left;
transition-property: left;
- left: -200px;
+ left: calc(-1 * 200px);
left: calc(-1 * var(--sidebar-width));
}
html[dir='rtl'] #sidebarContainer {
-webkit-transition-property: right;
transition-property: right;
- right: -200px;
+ right: calc(-1 * 200px);
right: calc(-1 * var(--sidebar-width));
}
@@ -640,6 +642,8 @@
#viewerContainer:not(.pdfPresentationMode) {
-webkit-transition-duration: 200ms;
transition-duration: 200ms;
+ -webkit-transition-duration: var(--sidebar-transition-duration);
+ transition-duration: var(--sidebar-transition-duration);
-webkit-transition-timing-function: ease;
transition-timing-function: ease;
}
```
Since this rule is now enforced in the entire `/web` folder, there's no good reason for it to remain disabled in this file. Most likely, it's added to reduce the number of linting errors when we started enforcing block-scoped variables.
There's no particular reason for using the PDF.js helper function `createObjectURL`[1] in the debugger, instead of the native `URL.createObjectURL` directly, for a couple of reasons:
- The relevant code-path only applies to fonts loaded with the Font Loading API, and this isn't supported in IE anyway.
- The debugger can, since quite some time, not even be loaded in IE any more.
- General support for IE is now limited, and there's no guaratee that everything actually works.
---
[1] It provides a fallback for browsers with broken `Blob` support, which as usual means Internet Explorer :-P
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const
With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.
Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
The debugging hash parameters[1] are intended to facilitate access to various tools/settings in PRODUCTION builds, protected by the `pdfBugEnabled` preference.
At this point, the remaining debugging hash parameters are mainly intended to allow access to the `PDFBug` tools and/or to quickly toggle certain larger features.
The "useOnlyCssZoom" functionality doesn't really seem to fit in with the rest of these hash parameters, since:
- This is, comparatively speaking, a minor viewer-specific feature.
- The zooming implementation will (almost) always fallback to CSS-only zooming, for any document, once the canvases becomes large enough. Hence, the majority of the CSS zooming feature can still be tested *directly* in any build of the viewer.
- After the initial implementation, years ago, the CSS-only zooming code in question hasn't changed much (or even at all), i.e. it doesn't seem like an active development target.[2]
- If the "useOnlyCssZoom" functionality was added today, it's unlikely that a hash parameter would've been added.
- Last, but not least, there's also a `useOnlyCssZoom` preference hence toggling this functionality shouldn't be too difficult (e.g. if someone needs to hack on it).
All in all, I'm thus suggesting that we remove the "useOnlyCssZoom" hash parameter.
---
[1] Originally these hash parameters could be used directly in any build, which was bad since it would allow any link to potentially disable functionality and/or reduce performance.
[2] If it had seen active development over the years, I'd be *much* more inclined to keep the hash parameter.
For people running e.g. Firefox with the `pdfBugEnabled` preference set, to allow quick access to debugging tools, this method will obviously run for every opened PDF file. However, in most cases the URL hash is empty and we can thus skip most of the parsing and simply return early instead.
This removes a couple of, thanks to preceeding code, unnecessary `typeof PDFJSDev` checks, and also fixes a couple of incorrectly implemented (my fault) checks intended for `TESTING` builds.
After PR 9566, which removed all of the old Firefox extension code, the `FIREFOX` build flag is no longer used for anything.
It thus seems to me that it should be removed, for a couple of reasons:
- It's simply dead code now, which only serves to add confusion when looking at the `PDFJSDev` calls.
- It used to be that `MOZCENTRAL` and `FIREFOX` was *almost* always used together. However, ever since PR 9566 there's obviously been no effort put into keeping the `FIREFOX` build flags up to date.
- In the event that a new, Webextension based, Firefox addon is created in the future you'd still need to audit all `MOZCENTRAL` (and possibly `CHROME`) build flags to see what'd make sense for the addon.
While only the `MOZCENTRAL` builds will actually do anything meaningful with the telemetry data, none of the code in question actually runs *at all* in e.g. development mode.[1]
This seems bad since it essentially means that this code is completely untested, despite being quite important for the built-in Firefox PDF viewer, and this thus ought to be fixed.
In this case, the explanation for the current state of the code should be "for historical reasons". Before the viewer was split into the current components and before the pre-processor was improved, back when all code resided in the `web/viewer.js` file, the telemetry reporting was done with *direct* `FirefoxCom` calls. However, with the dummy `DefaultExternalServices.reportTelemetry` method there's nothing actually preventing attempting to report telemetry in any type of build.
NOTE: By running this code in GENERIC builds as well, in addition to just locally, the *viewer* part of telemetry reporting becomes tested e.g. in preview builds too which should help with reviewing.
---
[1] When fixing bug 1606566, I had to edit the relevant `PDFJSDev` checks to be able to actually test the changes locally.
With https://bugzilla.mozilla.org/show_bug.cgi?id=844349 now being fixed in Firefox, the textLayer will now actually stay hidden as intended regardless of the browser settings.
Hence it should no longer be necessary to display the fallback bar, nor print a warning in the console, for documents which contains a textLayer.
Besides removing the `supportsDocumentColors` methods in the default viewer, we can also remove a now unused l10n string.
This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210
With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed.
This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories:
- Cases where a value is only clamped to a certain range (the easiest ones to update).
- Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the *default* value and only update it when necessary by using `if`/`else if` statements.
- Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary
As described in the issue, having a DOM element with `id=page2` (or any other number) will automatically cause that element to become linkable through the URL hash. That's currently leading to some confusing and outright wrong behaviour, since it obviously only works for pages that have been loaded and rendered.
For PDF documents the only officially supported way to reference a particular page through the URL hash is using the `#page=2` format, which also works for all pages regardless if they're loaded or not.
As far as I can tell there's nothing in the PDF.js default viewer that actually depends on the page/thumbnail `id` at this point in time, hence why I believe that this removal ought to be safe.
Just as a pre-caution this patch adds an `aria-label` to the page canvas, similar to the thumbnail canvas/image, to at least keep this information in the DOM.
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
When working on PR 11463 I couldn't help thinking that the `Array.prototype.some` callback function, used when determining the generator, was somewhat difficult to read with its partly unused and strangely named parameters.
This is not only useful to have one format for consistency, but also to
be able to quickly search for colors for e.g., finding duplicates or
when tweaking the CSS for custom deployments.
This covers the handful of cases that the `--fix` command couldn't deal with, and the changes aren't just fixing the linting errors but attempt to slightly improve the relevant code.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const
Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
Rather than having a copy of this regular expression in the `test/unit/api_spec.js` file, with a comment about keeping it up-to-date with the code in the viewer (note the incorrect file reference as well), we can just import it instead to simplify all of this.
This patch makes the follow changes:
- Remove no longer necessary inline `// eslint-disable-...` comments.
- Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
- Concatenate strings which now fit on just one line.
- Fix comments that are now too long.
- Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
Apparently Ghostscript can, in some cases, generate/include `Metadata` with incorrectly encoded characters.[1] This results in the viewer title looking wrong, which we thus attempt to avoid by falling back to the `Info` entry instead.
*Please note:* Obviously it would be better if this was fixed in the `Metadata` parser instead, rather than using a viewer work-around, but I'm just not sure how or even *if* that could actually be done given that the `Metadata` stream contains no trace of the *original* character.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1605526
---
[1] The problematic characters are from the Specials Unicode block, see https://en.wikipedia.org/wiki/Specials_(Unicode_block)
With these changes we'll always set the `pdfTitle` to the `Info` dictionary entry *first* (assuming it exists and isn't empty), before attempting to override it with the `Metadata` stream entry (assuming it exists, is non-empty *and* valid).
There should be no functional changes with this patch, but it will simplify the following patch somewhat.
This patch reduces some duplication, by moving *all* fake worker loader code into the `setupFakeWorkerGlobal` function. Furthermore, the functions are simplified further by using `async`/`await` where appropriate.
Looking at the `parseCurrentHash` function again it's now difficult for me to understand *what* I was thinking, since having a helper function that needs to be manually passed a `linkService` reference just looks weird.
This patch addresses a couple of smaller issues with the `PDFHistory` class:
- Most, if not all, other viewer components can be reset in one way or another, and there's no good reason for the `PDFHistory` implementation to be different here.
- Currently it's (technically) possible to keep adding entries to the browser history, via the `PDFHistory` instance, even after the document has been closed. That obviously makes no sense, and is caused by the lack of a `reset` method.
- The internal `this._isPagesLoaded` property was never actually reset, which would lead to it being temporarily wrong when a new document was opened in the default viewer.
The `viewer` option was *only* used for checking that a document is loaded in `PDFPresentationMode.request`, however that's just as easy to do by simply utilizing `BaseViewer.pagesCount` instead and this way we can also avoid the DOM lookup.
This was a blatant oversight in PR 10217, since there's obviously no `this.pageNumber` property anywhere in the `BaseViewer`. Luckily this shouldn't have caused any bugs, since the only call-site is also validating the `pageNumber` (but correctly that time).
*This patch is simple enough that I almost feel like I'm overlooking some trivial reason why this would be a bad idea.*
Note how in `{BaseViewer, PDFThumbnailViewer}.setDocument` we're always getting the *first* `pdfPage` in order to initialize all pages/thumbnails.
However, once that's done the first `pdfPage` is simply ignored and rendering of the first page thus requires calling `PDFDocumentProxy.getPage` yet again. (And in the `BaseViewer` case, it's even done once more after `onePageRenderedCapability` is resolved.)
All in all, I cannot see why we cannot just immediately set the first `pdfPage` and thus avoid an early round-trip to the API in the `_ensurePdfPageLoaded` method before rendering can begin.
This code was originally added to support IE10 (and below), however with those browsers *explicitly* unsupported since PDF.js version `2.0` this code is now dead.
Obviously the `_pagesRequests` functionality is *mainly* used when `disableAutoFetch` is set, but it will also be used during ranged/streamed loading of documents.
However, the `_pagesRequests` property is currently an Array which seems a bit strange:
- Arrays are zero-indexed, but the first element will never actually be set in the code.
- The `_pagesRequests` Array is never cleared, unless a new document is loaded, and once the `PDFDocumentProxy.getPage` call has resolved/rejected the element is just replaced by `null`.
- Unless the document is browsed *in order* the resulting `_pagesRequests` Array can also be arbitrarily sparse.
All in all, I don't believe that an Array is an appropriate data structure to use for this purpose.
This patch simply restores the behaviour that existed prior to PR 7697, since I cannot imagine that that was changed other than by pure accident.
As mentioned by a comment in `BaseViewer.setDocument`: "Printing is semi-broken with auto fetch disabled.", and note that since triggering of printing is a synchronous operation there's generally no easy way to load the missing data.
https://github.com/mozilla/pdf.js/pull/7697/files#diff-529d1853ee1bba753a0fcb40ea778723L1114-L1118
Considering just how small/simple this code is, it doesn't seem necessary to have a separate method for it (even more so when there's only one call-site).
I've always disliked the solution in PR 10461, since it required changes to the `PDFHistory` code itself to deal with a bug in IE11.
Now that IE11 support is limited, it seems reasonable to remove these `pushState`/`replaceState` hacks from the main code-base and simply use polyfills instead.
The code in question is *only* relevant in non-`PRODUCTION` mode, i.e. the *development* version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added.
I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.
For *very* long/large documents fetching all pages on load may cause quite bad performance, both memory and CPU wise. In order to at least slightly alleviate this, we can let the viewer treat these kind of documents[1] as if `disableAutoFetch` were set.
---
[1] One example of a really bad case is https://bugzilla.mozilla.org/show_bug.cgi?id=1588435, which this patch should at least help somewhat. In general, for these cases, we'd probably need to implement switching between `PDFViewer`/`PDFSinglePageViewer` (as already tracked on GitHub) and use the latter for these kind of long documents.
The issue has been open for years now, and has even been marked with `5-good-beginner-bug` for *months*, without any movement.
Considering just how simple the suggested solution is, I'm submitting this patch just to close out a long-standing issue.
Obviously userAgent checks aren't that great, since it's very easy to spoof, but it probably doesn't hurt to attempt to extend this (since it's limited to `GENERIC` builds).
Besides, anyone using the default viewer can always set the `maxCanvasPixels` option to a value of their choosing.
Sometimes we also used `@return`, but `@returns` is what the JSDoc
documentation recommends. Even though `@return` works as an alias, it's
good to use the recommended syntax and to be consistent within the
project.
Sometimes we also used `@return` or `@returns`, but `@type` is what
the JSDoc documentation recommends. This also improves the documentation
because before this commit the types were not shown and now they are.
I've absolutely no idea why I wrote the code that way originally, since the nested `if`s are not really helping readability one bit.
Hence this patch which changes things to use early `return`s instead, to help readability.
Rather than manually clamping the `width` here, we can just `export` an already existing helper function from `ui_utils.js` instead.
(Hopefully it will eventually be possible to replace this helper function with a native `Math.clamp` function, given that there exists a "Stage 1 Proposal" for adding such a thing to the ECMAScript specification.)
Besides avoiding errors during loading, this also ensures that the document will be correctly scrolled/zoomed into view once the viewer becomes visible.
This "new" behaviour was always intended, see PR 2613, however various re-factoring over the years seem to have broken this (and I'm probably at least somewhat responsible for that).
By transfering, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having *one* copy of the same data.
Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory.
Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality.
Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side.
---
[1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance.
[2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility
This was added in PR 4470, but doesn't appear to have been used since.
While it's certainly easy to understand how this was helpful during development of that PR, actually providing this hash parameter isn't going to work anymore given that the original CMap files were also removed from the repository.
I suppose that the hash parameter *could* be useful if you'd attempt to update the BCMap files, however that hasn't been attempted even once in over *five* years time. Furthermore, at this point using the `AppOptions` directly in that situation should also work fine.
All in all, this seems like a piece of old and unused code which we can simply remove now.
By using the same internal formatting here as in the `Ref.toString` method, in `src/core/primitives.js`, all cache-keys will become at least two bytes shorter (and most three bytes shorter).
Obviously this won't have a huge effect on memory since there's only one cache entry per page, but it nonetheless seems wasteful to use longer keys than strictly required.
Firefox telemetry supports using string labels now. Convert our integers
that we used for categories to just use strings.
The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
Currently the indicator may remain visible even after the document has been closed, which seems weird given that no page is either visible nor rendering :-)
Ensure that setting the `zoomDisabledTimeout` isn't skipped, regardless of the supported zoom keys, when handling mouse wheel events (PR 7097 follow-up)
*Possible follow-up:* It probably wouldn't hurt to try and shorten the `supportedMouseWheelZoomModifierKeys` name a bit, but I'm not attempting that here since it'd also require updating `PdfStreamConverter.jsm` in mozilla-central in order to be consistent.
Since calling `getDocument` with a `PDFDataRangeTransport` argument will always unconditionally override a manually provided `length` argument, see a1a667809f/src/display/api.js (L390-L394), this patch should thus be safe.
This functionality is very old, and pre-dates e.g. the introduction of the `EventBus` by a number of years. Rather than attaching two callback functions to every single `PDFPageView` instance, it's thus now possible to utilize the `EventBus` such that you only need a grand total of two listeners to achieve the same result.
For the `onAfterDraw` callback the replacement is particularly simple, given that a 'pagerendered' event is already being dispatched in the appropriate spot. An added benefit here is the ability to remove the event listener, since we only ever care about *one* (arbitrary) page being rendered for the `BaseViewer.onePageRendered` promise.
For the `onBeforeDraw` callback, a new 'pagerender' event was thus added to replace the callback.
Given that this special-case only matters for the Firefox PDF viewer, it's probably better to just move it into `firefoxcom.js` instead to reduce unnecessary confusion.
Similar to the `zoomReset` method we need to ensure that this code won't run for zoom events originating within the browser UI itself, since checks in e.g. the `keydown` event handler won't help in that case.
When searching occurs for the first time in a document, the `textContent` of all pages will be fetched from the API. If there's a pending search operation when the document loads that will thus lead to a lot of `getTextContent` calls very early on, which may unnecessarily delay rendering of the first page. Generally, in the viewer, a number of non-essential API calls[1] will be deferred until the first page has been rendered, and there's no good reason as far as I can tell to handle searching differently.
---
[1] Such as e.g. `getOutline` and `getAttachments`.
Unless the `PDFLinkService` instance contains all of the expected methods, a lot of things will break in various places in the default viewer. Hence there's not much value in having this check, and outright falling seems more appropriate.
Finally, this also makes the return value explicit in this case, since that's consistent with the rest of the `PDFFindController._shouldDirtyMatch` method.
As have already been stated multiple times, simply increasing the printing resolution may have undesirable effects on both memory usage *and* general performance. Hence why PR 10854 did *not* add a preference, and only exposed AppOption by default in `GENERIC` builds for now.
However, considering how differently printing works in the built-in Firefox version (with `mozPrintCallback`) compared to the general default viewer, any testing done in the latter case might not be completely relevant to the first (and most important) case of the Firefox PDF Viewer.
Note that considering the implementation of `AppOptions.get`, this patch will be safe and should allow experimenting with `printResolution` in all builds of the default viewer[1]. By not, however, having `printResolution` appear in AppOptions for either the `MOZCENTRAL` or `CHROMIUM` build targets, there should be no indication of official support for now.
Furthermore, it shouldn't be a preference at this point in time (or even at all), since that makes it too easy for users to change it permanently[2] and possible "break" printing.
---
[1] By running `PDFViewerApplicationOptions.get('printResolution', /* value here */);` in the console after the viewer loads.
[2] I've seen Firefox bugs, filed in Bugzilla, where users modified e.g. preferences manually in `about:config` and then some time later (maybe months) wondered why something was suddenly broken. In those cases, trying to work out that a preference change was the culprit can take time/effort.
This includes the information in the core and display layers. The
date parsing logic from the document properties is rewritten according
to the specification and now includes unit tests.
Moreover, missing unit tests for the color of a popup annotation have
been added.
Finally the styling of the popup is changed slightly to make the text a
bit smaller (it's currently quite large in comparison to other viewers)
and to make the drop shadow a bit more subtle. The former is done to be
able to easily include the modification date in the popup similar to how
other viewers do this.
The file `test/pdfs/annotation-caret-ink.pdf` is already available in
the repository as a reference test for this since I supplied it for
another patch that implemented ink annotations.
With PR 10675 having fixed the completely broken `disableRange=true` setting in the Firefox version of PDF.js, I couldn't help but noticing that loading progress is never reported properly in that case.
Currently loading progress is only reported for the `rangeProgress` chrome-event, which obviously isn't dispatched with `disableRange=true` set. However, the `progressiveRead` chrome-event includes loading progress as well, but this information isn't being used in any way.
Furthermore, the `PDFDataRangeTransport.onDataProgress` method wasn't able to handle "complete" loading information, and neither was `PDFDataTransportStream._onProgress` since that method would only ever attempt to report it through a RangeReader (which won't exist when `disableRange=true` is set).
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.
Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
- In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
- (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*
In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
This lays the necessary foundation for handling zoom events originating within the browser itself, rather than in the viewer. Please note that this will also require a follow-up patch to `mozilla-central`, such that the viewer is actually notified when zooming occurs.
The generated `default_preferences.json` file is necessary when initializing the Firefox preferences, which only supports certain types, hence this patch adds additional validation to help prevent run-time errors in Firefox.
Given that these changes add a code-path to `AppOptions.getAll` which could throw, the `OptionKind.PREFERENCE` branch is also modified to require *exact* matching to prevent (future) errors in the viewer.
Finally the conditionally defined `defaultOptions` will no longer (potentially) be considered during the `gulp default_preferences` task, to make it more difficult for them to be accidentally included.
Currently these methods are only used from the respective `reset` methods, and from `{BaseViewer, PDFThumbnailViewer}._cancelRendering` which only runs when the active document is closed.
This patch changes `{PDFPageView, PDFThumbnailView}.cancelRendering` to *only* cancel any pending rendering operations, and doesn't attempt to reset e.g. the `renderingState`, since that causes visual glitches (duplicated canvases in the viewer) when called directly.
Furthermore, unless you "know" what you're doing, the `{PDFPageView, PDFThumbnailView}.reset` methods are what *should* normally be used instead.
This implements the nice suggestion from https://github.com/mozilla/pdf.js/pull/10099#discussion_r219698000, which I at the time didn't think would work.
I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine.
This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
Currently any editing of the preferences require updates in *three* separate files, which isn't a great developer experience to say the least.
This has annoyed me sufficiently to write this patch, which moves the definition of all preferences into `AppOptions` and adds a new `gulp` task to generate the `default_preferences.json` file for the builds where it's needed.
There's a bunch of code, in the viewer, which for historical reasons use `switch` statements to add and remove CSS classes.
This code can be simplified, and unnecessary duplication avoided, by using `classList.toggle` instead.
This avoids having the initialization code "spread out", and will become even simpler once the `TODO` is addressed (which I'm planning on fixing as soon as possible).
This patch ignores the recently added `disableOpenActionDestination` preference, since the latest PDF.js version found on the "Chrome Web Store" doesn't include it.
The intention with preferences such as `sidebarViewOnLoad`/`scrollModeOnLoad`/`spreadModeOnLoad` were always that they should be able to *unconditionally* override their view history counterparts.
Due to the way that these preferences were initially implemented[1], trying to e.g. force the sidebar to remain hidden on load cannot be guaranteed[2]. The reason for this is the use of "enumeration values" containing zero, which in hindsight was an unfortunate choice on my part.
At this point it's also not as simple as just re-numbering the affected structures, since that would wreak havoc on existing (modified) preferences. The only reasonable solution that I was able to come up with was to change the *default* values of the preferences themselves, but not their actual values or the meaning thereof.
As part of the refactoring, the `disablePageMode` preference was combined with the *adjusted* `sidebarViewOnLoad` one, to hopefully reduce confusion by not tracking related state separately.
Additionally, the `showPreviousViewOnLoad` and `disableOpenActionDestination` preferences were combined into a *new* `viewOnLoad` enumeration preference, to further avoid tracking related state separately.
The new "private" method will return a boolean, indicating if the `sidebarviewchanged` event was dispatched, thus allowing some simplification of the `PDFSidebar.setInitialView` method.
When `firstVisibleElementInd === 0`, regardless of the number of views, there's no reason to attempt to backtrack at all since it's never possible to find an element before the *first* one anyway.
This piggybacks of the existing `cancel` functionality, to ensure that any pending operations are closed *and* that any temporary canvases are actually being removed.
Also simplifies `finishPaintTask` in `PDFPageView.draw` slightly, by converting it to an async function.
This attempts to provide more "default" methods in the base class, in order to reduce unnecessary duplication and to improve self-documentation of the `BaseViewer` class slightly.
The following changes are made (in no particular order):
- Have `BaseViewer` implement the `_scrollIntoView` method, and *extend* it as necessary in `PDFViewer`/`PDFSinglePageViewer`.
- Simply inline the `BaseViewer._resizeBuffer` method, in `BaseViewer.update`, since there's only one call-site at this point.
- Provide a default implementation of `_isScrollModeHorizontal` in `BaseViewer`, and have `PDFSinglePageViewer` override it.
- Provide a default implementation of `_getVisiblePages`, and have `PDFViewer` extend it and `PDFSinglePageViewer` override it.
Based on the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1521413, this patch simply removes the `ReadableStream` polyfill completely from MOZCENTRAL builds.
With this patch, the size of the `gulp mozcentral` build target is thus further reduced (building on PR 10470):
| | `build/mozcentral`
|-------|-------------------
|master | 3 339 666
|patch | 3 209 572
The `FontInspector` was completely broken by PR 10197, and trying to select a particular font (using the checkboxes) or clicking on a piece of text currently does nothing.
With https://bugzilla.mozilla.org/show_bug.cgi?id=1505122 landing in Firefox 65, the native `ReadableStream` implementation is now enabled by default in Firefox.
Obviously it would be nice to simply stop bundling the polyfill in MOZCENTRAL builds altogether, however given that it's still possible to disable[1] `ReadableStream` this is probably not a good idea just yet.
Nonetheless, now that native support is available, it seems unnecessary (and wasteful) to keep bundling the polyfill twice[2] in MOZCENTRAL builds. Hence this patch, which contains a suggest approach for packing the polyfill in a *separate* file which is then *only* loaded if/when needed.
With this patch, the size of the `gulp mozcentral` build target is thus reduced accordingly:
| | `build/mozcentral`
|-------|-------------------
|master | 3 461 089
|patch | 3 340 268
Besides the PDF.js files taking up less space in Firefox this way, the additional benefit is that there's (by default) less code that needs to be loaded and parsed when the PDF Viewer is used which also cannot hurt.
---
[1] In `about:config`, by toggling the `javascript.options.streams` preference.
[2] Once in the `build/pdf.js` file, and once in the `build/pdf.worker.js` file.
Since most of the important rendering code is already (almost) identical between `PDFViewer.update` and `PDFSinglePageViewer.update`, it's possible to further reduce duplication by moving the code into `BaseViewer.update` instead.
Apparently in IE 11 when `history.{pushState, replaceState}` is called, there's actually a difference between not providing the *third* argument vs providing it set implicitly to `undefined`. It appears that in IE 11 it's actually being stringified, rather than ignored, which seems completely wrong (obviously other browsers aren't affected, so no surprises there).
This is yet another reason why I think the feature itself was a really bad idea, since it now requires extra/duplicated code just to prevent weird/incorrect URL behaviour in crappy browsers.
Previously a couple of different attempts at fixing this problem has been rejected, given how *crucial* this code is for the correct function of the viewer, since no one has thus far provided any evidence that the problem actually affects the default viewer[1] nor an example using the viewer components directly (without another library on top).
The fact that none of the prior patches contained even a *simple* unit-test probably contributed to the unwillingness of a reviewer to sign off on the suggested changes.
However, it turns out that it's possible to create a reduced test-case, using the default viewer, that demonstrates the error[2]. Since this utilizes a hidden `<iframe>`, please note that this error will thus affect Firefox as well.
Note that while errors are thrown when the hidden `<iframe>` loads, the default viewer doesn't break completely since rendering does start working once the `<iframe>` becomes visible (although the errors do break the initial Toolbar state).
Before making any changes here, I carefully read through not just the immediately relevant code but also the rendering code in the viewer (given it's dependence on `getVisibleElements`). After concluding that the changes should be safe in general, the default viewer was tested without any issues found. (The above being much easier with significant prior experience of working with the viewer code.)
Finally the patch also adds new unit-tests, one of which explicitly triggers the relevant code-path and will thus fail with the current `master` branch.
This patch also makes `PDFViewerApplication` slightly more robust against errors during document opening, to ensure that viewer/document initialization always completes as expected.
Please keep in mind that even though this patch prevents an error in `getVisibleElements`, it's still not possible to set the initial position/zoom level/sidebar view etc. when the viewer is hidden since rendering and scrolling is completely dependent[3] on being able to actually access the DOM elements.
---
[1] And hence the PDF Viewer that's built-in to Firefox.
[2] Copy the HTML code below and save it as `iframe.html`, and place the file in the `web/` folder. Then start the server, with `gulp server`, and navigate to http://localhost:8888/web/iframe.html
```html
<!DOCTYPE html>
<html>
<head>
<title>Iframe test</title>
<script>
window.onload = function() {
const button = document.getElementById('button1');
const frame = document.getElementById('frame1');
button.addEventListener('click', function(evt) {
frame.hidden = !frame.hidden;
});
};
</script>
</head>
<body>
<button id="button1">Toggle iframe</button>
<br>
<iframe id="frame1" width="800" height="600" src="http://localhost:8888/web/viewer.html" hidden="true"></iframe>
</body>
</html>
```
[3] This is an old, pre-exisiting, issue that's not relevant to this patch as such (and it's already being tracked elsewhere).
With modern JavaScript supporting block-scoped variables, it's no longer necessary to have large numbers of top-level variable definitions (especially when all variables are, implicitly, set to `undefined`).
Besides moving variable definintions, a number of them are also converted to `const` to help ensure that they cannot be *accidentally* modified in the code.
Finally, the `views.length` calculation is now cached *once* rather than being re-computed multiple times.
This is *really* the best that we can do here, since other proposed solutions would interfere with (and break) the painstakingly implemented browsing history that's present in the default viewer.
I'm still not convinced that this is a good idea in general, but this patch implements it in a way where it is possible to toggle[1] for users that wish to have this feature. In particular, there's a couple of reasons why I'm not finding this feature necessary/great:
- It's already possible to easily obtain the current hash, by simply clicking on the `viewBookmark` button at any time.
- Hash changes requires a bit of special handling[2], i.e. extra code, to prevent issues when the browser history is traversed (see `PDFHistory._popState`). Currently this is only necessary when the user has manually changed the hash, with this patch it will always be the case (assuming the feature is active).
- It's not always possible to change the URL when updating the browser history. For example: In the Firefox built-in viewer, the URL cannot be modified for local files (i.e. those using the `file://` protocol).
This leads to inconsistent behaviour, and may in some cases even result in errors being thrown and the history thus not updating, if the browser prevents changes to the URL during `pushState`/`replaceState` calls.
---
[1] Using the `historyUpdateUrl` viewer preference.
[2] This depends, to a great extent, on browsers always firing `popstate` events *before* `hashchange` events, which may or may not actually be guaranteed.
This should hopefully be sufficient to address issue 6847, and given the limited impact of the code changes I'm not completely sure if this would need to be controlled by a preference!?
Initially my intention was to try and provide some (slightly more detailed) implementation suggestions in the issue, but having looked briefly at doing that it would essentially have amounted to actually writing the code anyway. (Especially considering that the recent questions seemed to more-or-less ignore the information already provided in the first post.)
Finally, note that since `performance.navigation.type` is marked as deprecated, a slightly different approach was choosen instead.
If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature *first* to avoid needlessly unwieldy call-sites.
To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning *and* with a working fallback[1] for the old method signature.
---
[1] This is limited to `GENERIC` builds, which should be sufficient.
Given that it's really not clear to me if this is actually desired functionality in the default viewer, and considering that it doesn't fit in *great* with the way that `PDFHistory` is initialized, this feature is currently off by default[1].
---
[1] It's controlled with the `disableOpenActionDestination` Preference/AppOption.
This patch re-factors, and extends, the already existing `zoomDisabledTimeout` used during mouse wheel zooming.
Unfortunately I haven't got the required hardware to actually test this patch, but there's a decent chance that it will fix, or at least reduce, the problems reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1503412.
Given that a larger number of pages may now be visible at once, and importantly that their layout may be non-vertical, one of the conditions should be tweaked to not accidentally miss cases where a page is still visible.
Please note: This patch is based on code-inspection, and the only ill effect occurring without it would be a couple of (near) duplicate history entries in some *rare* edge-cases.
With the removal of the global `PDFJS` object, in PDF.js version `2.0`, the viewer options are no longer as easily accessible as they previously were (and issues have been filed about this).
In particular, since the viewer files aren't necessarily loaded *immediately*, this means that `PDFViewerApplication`/`PDFViewerApplicationOptions` aren't necessarily available directly. By dispatching an event once all viewer files are loaded but *before* the viewer initialization has run, setting `AppOptions` during load (in custom implementations of the default viewer) should hopefully become a little bit easier[1].
---
[1] In hindsight, this should probably have been implemented when the global `PDFJS` object was removed...
Rather than having a (somewhat) randomly choosen list of Preferences which `AppOptions` are allowed to override, it makes much more sense to simply add an AppOption to allow custom implementations to ignore Preferences altogether (it's also inline with the AppOption that allows the `ViewHistory` to be bypassed on load).
Rather than closing [bug 1505824](https://bugzilla.mozilla.org/show_bug.cgi?id=1505824) as WONTFIX (which is my preferred solution), given how *minor* this "problem" is, it's still possible to adjust the error messages a bit.
The main point here, which is relevant even if the changes in `BaseViewer` are ultimately rejected during review, is that we'll no longer attempt to call `BaseViewer.currentPageLabel` with an empty string from `webViewerPageNumberChanged` in `app.js`.
The other changes are:
- Stop printing an error in `BaseViewer._setCurrentPageNumber`, and have it return a boolean indicating if the page is within bounds.
- Have the `BaseViewer.{currentPageNumber, currentPageLabel}` setters print their own errors for invalid pages.
- Have the `BaseViewer.currentPageLabel` setter no longer depend, indirectly, on the `BaseViewer.currentPageNumber` setter.
- Improve a couple of other error messages.
When `highlightAll` is *not* set, there's only going to be a single match per page and unconditionally calling `PDFFindController.scrollMatchIntoView` doesn't really matter.
However, when `highlightAll` is set the current code may result in a large number of unnecessary `PDFFindController.scrollMatchIntoView` calls. Since `TextLayerBuilder._renderMatches` already checks if a particular match is the selected one, for highlighting purposes, it's simple enough to also skip scrolling completely for non-selected matches.
Only scroll search results into view as a result of an actual find operation, and not when the user scrolls/zooms/rotates the document (bug 1237076, issue 6746)
Most of the code in `PDFFindController` assumes that a valid `state` always exits, hence it cannot hurt to add a simple check to avoid errors being thrown.
Currently searching, and particularily highlighting of search results, may interfere with subsequent user-interactions such as scrolling/zooming/rotating which can result in a somewhat jarring UX where the document suddenly "jumps" to a previous position.
This is especially annoying in cases where the highlighted search result isn't even visible when a user initiated scrolling/zooming/rotating happens, and there exists a couple of bugs/issues about this behaviour.
It seems reasonable, as far as I'm concerned, to treat searching as one operation and any subsequent non-search user interactions with the viewer as separate and thus not scroll the current search result into view *unless* the user is actually doing another search.
This also seems consistent with general searching in e.g. Firefox and Adobe Reader:
- Compare with "regular" searching of e.g. HTML files in Firefox, where the user scrolling and/or zooming the document will not force a currently highlighted search result to become re-scrolled into view.
- Compare also with Adobe Reader, where the user scrolling, zooming, and/or rotating the document will not force the currently highlighted search result to become re-scrolled into view.
The question is then why search highlighting was implemented this way in PDF.js to begin with. It might be that this wasn't really intended behaviour, but more a consequence of the asynchronous nature of the API. Considering that most operations, such as fetching the page, rendering it and extracting its text-content are all asynchronous; searching and highlighting of matches thus becomes asynchronous too.
However, it should be possible to track when search results have been scrolled into view and highlighted, and thus prevent these wierd "jumps" when the user interacts with the document.
*Please note:* Unfortunately this required moving the scrolling of matches back into `PDFFindController`, since I simply couldn't see any other (reasonable) way of implementing the functionality without tracking the `_shouldScroll` property in only *one* spot.
However, given that the new `PDFFindController.scrollMatchIntoView` method follows a similar pattern as `BaseViewer.scrollPageIntoView` and `PDFThumbnailViewer.scrollThumbnailIntoView`, this is hopefully deemed OK.
Given that the `_updateScrollMode`/`_updateSpreadMode` methods are "private", there's no particular reason to not just directly call `_setCurrentPageNumber`.
Note that when e.g. presentation mode is active, we fail[1] to ensure that the `pageNumber` parameter is actually an integer before calling `_setCurrentPageNumber` (that method expects the argument be an integer).
Also changes the method signature, of `scrollPageIntoView`, to use object destructuring instead.
---
[1] Most likely, this is actually *my* oversight :-)
Currently we'll only attempt to start from the current page when a new search is done, however for 'findagain' operations we'll always continue from the last match position.
This could easily lead to confusing behaviour if the user has scrolled to a completely different part of the document. In an attempt to improve this somewhat, for repeated 'findagain' operations, we'll instead reset the position to the current page when it's *absolutely* certain that the user has scrolled.
Note that this required adding a new `BaseViewer` method, and exposing that through `PDFLinkService`, in order to check if a given page is visible.
In an attempt to avoid issues, in custom implementations of `PDFFindController`, the code checks for the existence of the `PDFLinkService.isPageVisible` method *before* using it.
Unfortunately the `PDFFindController.executeCommand` method has now become a bit more complicated than one would like, but hopefully this small change will improve the structure somewhat (especially for subsequent patches).
With only *one* event now being dispatched when the scale changes, in combination with there only being two call-sites, it doesn't seem necessary to keep the helper method for dispatching the 'scalechanging' event.
Currently `PDFFindController._calculateMatch` is (indirectly) dispatching an `updatetextlayermatches` event for every *single* page of the document. For short documents, such as the `tracemonkey` file, this probably doesn't matter too much, but for documents with a couple of thousand pages it seems unfortunate.
It shouldn't be necessary, in general, to dispatch `updatetextlayermatches` events here, since that's already being taken care of in `PDFFindController._updateMatch` which is always called when a match has been found.
However, when `highlightAll` is set we still need to ensure that pages which finished rendered *before* searching begun are updated correctly.
**STR:**
1. Open the default viewer, with the `tracemonkey` file.
2. Open the findbar, and search for "trace".
3. Enable the "Highlight All" option.
4. Close the findbar.
5. Re-open the findbar, and click on the "findNext" button.
6. Scroll down to the *second* page of the document.
**ER:**
Since "Highlight All" is active, all matches on the *second* page should be highlighted.
**AR:**
No matches are highlighted on the *second* page.
This is relevant for e.g. `PDFSinglePageViewer`, and `PDFViewer` with Presentation Mode active.
By moving this code to a helper method in `BaseViewer`, it's thus possible to reduce the amount of duplicate code that currently needed in `PDFViewer` and `PDFSinglePageViewer`.
For a short document, such as e.g. the `tracemonkey` file, this repeated normalization won't matter much, but for documents with a couple of thousand pages it seems completely unnecessary (and wasteful) to keep repeating the normalization whenever for every single page.
Currently the text-content is normalized every time that a new search operation is started, which seems completely useless considering that the "raw" text-content is never used for anything.
For a short document, such as e.g. the `tracemonkey` file, this repeated normalization won't matter much, but for documents with a couple of thousand pages it seems completely unnecessary (and wasteful) to keep repeating the normalization whenever e.g. a new search operation starts.
In the event that multiple instances of `PDFFindController` ever exists simultaneously, they will all be able to share just one `normalize` function in this way. Furthermore, the regular expression is now created lazily rather than at class construction time.
Despite all highlighted matches being removed in response to the 'findbarclose' event, there's a risk that a match could still be scrolled into view *after* the findbar has been closed[1].
Hence we need to ensure that long running searches, particularily those happening in large and/or slow loading documents[2], are ignored as well.
---
[1] The match is hidden, as expected, but the document could still scroll unexpectedly.
[2] Large documents loaded with `disableAutoFetch = true` and `disableStream = true` set are particularily susceptible to this issue.
Given that dispatching the 'updatetextlayermatches' event with `pageIndex = -1` set is now used to target the textLayers of *all* pages, there's no need to send individual events to every single page during `_nextMatch`. Since there can be an arbitrary number of pages in a document, this small/simple optimization seems too easy to ignore.
This patch does four things:
- Change the search related methods in `TextLayerBuilder` to be "private", since there're only called from within the class itself now.
- Use `const` for local variables not intended to change in the search related methods in `TextLayerBuilder`.
- Finally, removes most `this.findController` checks since they are redundant. Note how both `this._convertMatches` and `this._renderMatches` are *only* ever called, from `this._updateMatches`, when `this.findController` is actually defined. Hence there's really no need to repeat those checks all over the place, especially with all the relevant methods now being marked as "private".
- Always initialize the `this._pageMatchesLength` property with an empty array, to simplify the code in `TextLayerBuilder`.
This should, hopefully, cover all the possible ways[1] in which "fake workers" are loaded. Given the different code-paths, adding unit-tests might not be that simple.
Note that in order to make this work, the various `fakeWorkerFilesLoader` functions were converted to return `Promises`.
---
[1] Unfortunately there's lots of them, for various build targets and configurations.
Rather than having every invocation of `Toolbar._updateUIState` compute a valid `pageScaleValue`, it seems easier to simply ensure that it happens when the value is actually updated.
Looking at the history of this code, this parameter has never been used.
I'm guessing that most likely the code in `web/toolbar.js` began life as a copy of `web/secondary_toolbar.js`, which would probably explain why that parameter exists.
*This patch is based on something that I noticed while working on PR 10126.*
The recent re-factoring of `PDFFindController` brought many improvements, among those the fact that access to `BaseViewer` is no longer required. However, with these changes there's one thing which now strikes me as not particularly user-friendly[1]: The fact that in order for searching to actually work, `PDFFindController.setDocument` must be called *and* a 'pagesinit' event must be dispatched (from somewhere).
For all other viewer components, calling the `setDocument` method[2] is enough in order for the component to actually be usable.
The `PDFFindController` thus stands out quite a bit, and it also becomes difficult to work with in any sort of custom implementation. For example: Imagine someone trying to use `PDFFindController` separately from the viewer[3], which *should* now be relatively simple given the re-factoring, and thus having to (somehow) figure out that they'll also need to manually dispatch a 'pagesinit' event for searching to work.
Note that the above even affects the unit-tests, where an out-of-place 'pagesinit' event is being used.
To attempt to address these problems, I'm thus suggesting that *only* `setDocument` should be used to indicate that searching may start. For the default viewer and/or the viewer components, `BaseViewer.setDocument` will now call `PDFFindController.setDocument` when the document is ready, thus requiring no outside configuration anymore[4]. For custom implementation, and the unit-tests, it's now as simple as just calling `PDFFindController.setDocument` to allow searching to start.
---
[1] I should have caught this during review of PR 10099, but unfortunately it's sometimes not until you actually work with the code in question that things like these become clear.
[2] Assuming, obviously, that the viewer component in question actually implements such a method :-)
[3] There's even a very recent issue, filed by someone trying to do just that.
[4] Short of providing a `PDFFindController` instance when creating a `BaseViewer` instance, of course.
Since searching itself is an asynchronous operation, removal of highlights needs to be asynchronous too since otherwise there's a risk that the events happen in the wrong order and find highlights thus remain visible.
Also, this patch will now ensure that only 'findbarclose' events for the *current* document is handled since other ones doesn't really matter. Note in particular that when no document is loaded text-layers are, obviously, not present and subsequently it's unnecessary to attempt to hide non-existent find highlights.
For many years it's been possible to enter a search term into the findbar(s) before the document has finised loading, such that searching starts immediately once it has loaded.
PR 10099 accidentally broke that, which I unfortunately missed during reviewing.
Since searching is asynchronous you cannot directly check in `executeCommand` if the document is loaded/current, but need to wait until searching is actually enabled first.
Furthermore this patch also ensures that the `_findTimeout` is always correctly cleared given that it adds further asynchronous behaviour to searching, since you obviously only want to deal with searches relevant to the current document.
This is similar to the format used by a number of other viewer components, and should simplify the `PDFSidebar` initialization slightly.
Furthermore, by using the `eventBus` it's no longer necessary for `PDFSidebar` to have a direct dependency on `PDFOutlineViewer`.
There's still room for improvement here, but this patch is at least a start (since it's not clear to me how best to handle the viewers).
This attempts to reduced the level of indirection, and the amount of code, when dispatching `fileattachmentannotation` events, by removing the `PDFLinkService.onFileAttachmentAnnotation` method and just accessing `PDFLinkService.eventBus` directly in the `FileAttachmentAnnotationElement` constructor.
Given that other properties, such as `externalLinkTarget`/`externalLinkRel`, are already being accessed directly this pattern seems fine here as well.
This commit shows that we can now unit test the find controller and
that executing regular queries works. Note that this is only a first
step and not a complete suite of unit tests for all possible options
of the find controller.
While writing this unit test, I found two smaller issues that I
addressed directly. The first one is that in the previous find
controller refactoring I forgot to rename some occurrences of a now
private member variable. Fortunately this did not cause any bugs since
we did have a public getter and the fetched value may be changed by
reference, but it's nevertheless good to fix. The second issue is that
some entries in the `test/unit/clitests.json` file were not correct,
resulting in these tests not being executed on e.g., Travis CI.
With `PDFFindController` instances no longer (directly) depending on
`BaseViewer` instances, we can pass a single `findController` when
initializing a viewer, similar to other components.
This removes the dependency on a `PDFViewer` instance from the find
controller, which makes it more similar to other components and makes it
easier to unit test with a mock link service.
Finally, we remove the search capabilities from the SVG example since it
doesn't work there because there is no separate text layer.
Now it follows the same pattern as e.g., the document properties
component, which allows us to have one instance of the find controller
and set a new document to search upon switching documents.
Moreover, this allows us to get rid of the dependency on `pdfViewer` in
order to fetch the text content for a page. This is working towards
getting rid of the `pdfViewer` dependency upon initializing the
component entirely in future commits.
Finally, we make the `reset` method private since it's not supposed to
be used from the outside anymore now that `setDocument` takes care of
this, similar to other components.
This way the resetting of `PDFLinkService`/`PDFDocumentProperties` instances, as is done in `PDFViewerApplication.close`, only requires passing in *one* `null` argument instead of two.
The built-in PDF Viewer (in Firefox) cannot use the browser findbar when PDF files are embedded in e.g. iframe/object tags, and the PDF.js findbar (i.e. `PDFFindBar`) will thus be used instead in those cases.
This is slightly problematic, since the `MOZCENTRAL` version of the viewer uses a special, slimmed down, version of the `l10n.js` file that doesn't (currently) support plural forms. To prevent the matchesCounter from breaking completely in this edge-case, temporarily hard-code the plural form to use the default `[other]` version of the locale strings.
This prevents the findbar from intermittently displaying `0 of {number} matches`, which *could* theoretically happen for large and/or slow loading documents.
The find controller should only coordinate finding a string in the
document and should not be responsible for presenting the matches to the
user. The text layer builder already contains the logic to render the
matches in the viewer, so it should also take care of scrolling the
selected match into view.
The find controller already has quite a lot of state to maintain. We can
avoid keeping track of this member variable because when the find
controller is reset, so is the extract text promises array. Therefore,
we can just check if that array contains items or not to determine if
text extraction already started.
Moreover, there is no need to reset the `pageContents` array since the
`reset` method already takes care of that.
This is an unfortunate oversight on my part, which I stumbled upon when (locally) testing the `mozilla-central` follow-up patch necessary to enable the matches counter in the built-in PDF viewer.
The property is intended to contain a reference to a DOM element, which not only is nowhere to be found *now* but appears to never have existed in the first place.
As outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1282759 the internal Firefox name for the feature is `entireWord`, hence that name is used here as well for consistency (with "Whole words" being limited to the UI).
Given existing limitations of the PDF.js search functionality, e.g. the existing problems of searching across "new lines", there's some edge-cases where "Whole words" searching will ignore (valid) results.
However, considering that this is a pre-existing issue related to the way that the find controller joins text-content together, that shouldn't have to block this new feature in my opionion.
*Please note:* In order to enable this feature in the `MOZCENTRAL` version, a small follow-up patch for [PdfjsChromeUtils.jsm](https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/content/PdfjsChromeUtils.jsm) will be required once this has landed in `mozilla-central`.
For the `PDFFindBar` implementation, similar to the native Firefox findbar, the matches count displayed is now limited to a (hopefully) reasonable value.
*Please note:* In order to enable this feature in the `MOZCENTRAL` version, a follow-up patch will be required once this has landed in `mozilla-central`.
This patch is the first step to be able to eventually get rid of the `attachDOMEventsToEventBus` function, by allowing `EventBus` instances to simply re-dispatch most[1] events to the DOM.
Note that the re-dispatching is purposely implemented to occur *after* all registered `EventBus` listeners have been serviced, to prevent the ordering issues that necessitated the duplicated page/scale-change events.
The DOM events are currently necessary for the `mozilla-central` tests, see https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/test, and perhaps also for custom deployments of the PDF.js default viewer.
Once this have landed, and been successfully uplifted to `mozilla-central`, I intent to submit a patch to update the test-code to utilize the new preference. This will thus, eventually, make it possible to remove the `attachDOMEventsToEventBus` functionality.
*Please note:* I've successfully ran all `mozilla-central` tests locally, with these patches applied.
---
[1] The exception being events that originated on the `window` or `document`, since those are already globally available anyway.
The new events follow the same naming pattern as the 'pagesinit'/'pagesloaded' events dispatched on `BaseViewer` instances, and the intention is to allow the eventual removal of 'documentload'.
[api-minor] Add an `IsLinearized` property to the `PDFDocument.documentInfo` getter, to allow accessing the linearization status through the API (via `PDFDocumentProxy.getMetadata`)
When updating Preferences using the `set` method, the input is carefully validated. However, no validation is (currently) done when a `BasePreferences` instance is created, which probably isn't that great. Hence this patch that simply ignores, to not unnecessarily break loading of the viewer itself, any invalid Preferences.
Given that the various Preferences are currently, and have been for quite some time, only used when initializing `PDFViewerApplication` re-loading them when a new PDF file is opened in the viewer is essentially a no-op.
Furthermore, with the only usage of `BasePreferences.reload` now gone, the value of that method seems questionable at best. In the event that the functionality is actually needed again, similar to the `ViewHistory`, it'd probably make more sense to simply replace `PDFViewerApplication.preferences` with a new `BasePreferences` instance instead (using e.g. `DefaultExternalServices.createPreferences`).
Given that *all* Preferences are already fetched in `PDFViewerApplication._readPreferences`, the amount of boilerplate/duplication can be considerably reduced with the addition of a `BasePreferences.getAll` method.
The only reason that this check ever existed in the first place, is that originally there was a global `PDFJS.openExternalLinkInNewWindow` option which was then subsumed by the (more generic) `PDFJS.externalLinkTarget` option. (The `externalLinkTarget` has since been moved into a `PDFLinkService` option, as part of PDF.js version `2.0`.)
Hence, during the period where both `PDFJS.openExternalLinkInNewWindow` and `PDFJS.externalLinkTarget` existed side-by-side, there was a need to allow the former one to override the latter one (for backward compatibility purposes). However, that's no longer the case, and this extra `externalLinkTarget` check can now be removed.
*This was a stupid error on my part; sorry about breaking this!*
With the current code, the value of the `externalLinkTarget` option is now (potentially) updated *after* the viewer components have been initialized. For the "viewer in iframe/object tag" case, the result is that the value of the `externalLinkTarget` option isn't adjusted as intended any more.
Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended.
To reduced duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.
*Another small piece of clean-up of code I've previously written; follow-up to PR 8775.*
Importing `createPromiseCapability`, and then using it in just *one* spot, seems unnecessary since the `waitOnEventOrTimeout` function may just as well return a regular `Promise` directly.
Given that the non-default Spread modes (currently) doesn't affect the page layout when horizontal scrolling is enabled, having the Spread buttons appear active when clicking them appears to do *nothing* is probably confusing rather than helpful to users.
If the current viewer is a `PDFSinglePageViewer` instance the Scroll/Spread modes are no-ops, hence displaying buttons that do *nothing* when clicked will probably do very little besides confuse users.
The names 'resetscrollmode'/'resetspreadmode' were probably *not* great choices, given that the only thing being reset are toolbar buttons and not the actual Scroll/Spread modes. Furthermore, there's really no need for two separate events here.
The patch also adds a comment that ought to have been included in PR 9040, to prevent future refactoring/removing of what may appear to be an unnecessary `Promise.resolve` call.
This moves/exposes the `URL` polyfill similarily to the existing `ReadableStream` polyfill, rather than exposing it globally, to avoid interfering with any "outside" code.
Both the `URL` and `ReadableStream` polyfills are now exposed on the `pdfjsLib` object, such that they are accessible to the viewer components.
Furthermore, the `no-restricted-globals` ESLint rule is also enabled to prevent accidental usage of the native `URL`/`ReadableStream` implementations directly in the `src/` and `web/` folders; see also https://eslint.org/docs/rules/no-restricted-globals
Addresses the remaining TODO in https://github.com/mozilla/pdf.js/projects/6
Rather than having to manually call a method on `PDFFindController` instances from `BaseViewer.setDocument`, thus essentially having to resolve the private `_firstPagePromise` from the "outside", this can be done easily with the 'pagesinit' event dispatched on the `eventBus` instead.
Please note this particular `PDFFindController` code pre-dates the `eventBus` by almost three years, which should explain why the code looks the way it does.
Since the Scroll/Spread modes are now document specific, as all other properties such as page/scale/rotation, ensure that the toolbar is always correctly reset.
Since other viewer state, such as the current page/scale/rotation[1], are not available as `BaseViewer` constructor options, this makes the Scroll/Spread modes stand out quite a bit. Hence it probably makes sense to remove/deprecate this, to avoid inconsistent and possibly confusing state in this code.
---
[1] These properties are *purposely* not available in the constructor, since attempting to set them before a document is loaded has number of issues; please refer to https://github.com/mozilla/pdf.js/pull/8539#issuecomment-309706629 for additional details.
Since all the other viewer methods use the getter/setter pattern, e.g. for setting page/scale/rotation, the way that the Scroll/Spread modes are set thus stands out. For consistency, this really ought to use the same pattern as the rest of the `BaseViewer`. (To avoid breaking third-party implementations, the old methods are kept around as aliases.)
Note how the other "...OnLoad" preferences will allow a *non-default* value to always override a history entry. To improve overall consistency for the viewer options, and to reduce possibly confusing behaviour, this patch changes the `scrollModeOnLoad`/`spreadModeOnLoad` preferences to behave as all the other ones.
For some reason, these weren't added to `AppOptions` despite actually being set and read from `web/app.js`. Not adding them creates inconsistencies, since all other options *are* present in `web/app_options.js`.
This property isn't accessed anywhere in the `web/app.js` file, and is also not being reset in `PDFViewerApplication.close`. Hence it seems that it can simply be removed, especially since the fingerprint is already synchronously available through `PDFViewerApplication.pdfDocument.fingerprint` (provided that a document is loaded).
With the current code, the location in the viewer could change *well* after the user has started to interact with the viewer (for very large and/or slow loading documents). Attempt to reduce the likelyhood of that happening, by adding an upper bound to the time spent waiting before attempting to re-apply the initial position.
Rather than using a "special" property to check if a `ViewHistory` database entry existed on load, it seems that you could just as well check for the existence of one of the actually needed properties instead (here 'page' is used).
This way we can avoid storing what, more of less, amounts to useless state, which will help reduce the size of the `ViewHistory` database. Given that we don't directly, nor need to, validate the `ViewHistory` data but rely on other parts of the code-base to do so, the existance of an 'exists' property doesn't in and of itself really add much utility.
Finally, to simplify the implementation the 'exists' property won't be actively removed from the `ViewHistory` data. Instead we'll simply stop adding 'exists' when writing `ViewHistory` data, and rely on the existing pruning of old entries to eventually remove any remnants of 'exists' from storage.
Note how, in the current code, only *one* old history entry would ever be removed. That would mean that if e.g. the `cacheSize` is reduced, it would potentially require loading of multiple files before the database would be correctly pruned.
Furthermore, in the case where the database was empty on load there's no need to attempt to shrink it, since trying to reduce the size of an *empty* array won't do much :-)
Since the current page will be explicitly scrolled into view *directly* afterwards anyway (compare with e.g. the `pagesRotation` code), trying to maintain the current position when re-applying the zoom level during changing of Scroll modes is redundant.
Note how in `BaseViewer.forceRendering` the Scroll mode is used to determine how pre-rendering will work. Currently this is broken in Presentation Mode, if horizontal scrolling was enabled prior to entering fullscreen.
Furthermore, there's a few additional cases where the `this.scrollMode === ScrollMode.HORIZONTAL` check is pointless either in Presentation Mode or when a `PDFSinglePageViewer` instance is used.
Since the Scroll/Spread modes doesn't make (any) sense in `PDFSinglePageViewer` instances, the general structure of these methods can be improved to reflect that.
Given that this method is a no-op in `PDFSinglePageViewer`, similar to `_regroupSpreads`, let's improve the general code structure by simply moving the method.
There's no good reason to iterate through an arbitrary number of DOM elements this way, since a document could contain thousands of pages, when everything can be easily removed at once; compare with e.g. `BaseViewer._resetView` and `PDFThumbnailViewer._resetView`.
Furthermore given that it's a `PDFViewer` instance, the `this.viewer` property can be accessed directly. Besides, `_setDocumentViewerElement` only exists as a helper method for `setDocument` in the base class and none of this code applies for `PDFSinglePageViewer` instances either.
The Secondary Toolbar buttons for, not to mention the actual toggling of, Scroll/Spread modes are currently completely broken in older browsers (such as IE11).
As a follow-up, it'd probably be a good idea to try and find a *feature complete* `classList` polyfill that could be used instead, but this patch at least addresses the immediate regression.
Please refer to the compatibility information in https://developer.mozilla.org/en-US/docs/Web/API/Element/classList#Browser_compatibility
The `onOpenWithURL` method, in `PDFViewerApplication.initPassiveLoading`, accepts a `originalURL` parameter which is then passed on to `PDFViewerApplication.open` as is. However, the latter method expects the name of the parameter to be `originalUrl` (note the casing), meaning that `getDocument` will fail in this case.
For consistency, and to avoid confusion, the renaming is done in `web/chromecom.js` as well.
*The danger of fixing one bug is that it can, sometimes too easily, cause another one in the process; sorry for not catching this when testing PR 9816 locally.*
When the *entire* viewer becomes narrow enough, as controlled by the `@media all and (max-width: 840px)` media query, the sidebar should (semi-transparently) overlay the `viewerContainer` instead of moving it horizontally. Unfortunately the changes made in PR 9816 caused the relevant CSS rules to be skipped, because of how the inheritance model works in CSS.
I'm well aware that `!important` is usually advised against, since it "breaks" the CSS inheritance model. However in this case it seemed reasonable to use it, to not only fix the bug at hand but to also prevent similar bugs from occurring in the future.
This is a regression from PR 8993; it causes the pages to be offset horizontally in Presentation Mode, if and only if the sidebar is currently open when the user triggers Presentation Mode.
Please note that while this doesn't seem to affect Firefox, both Chrome and IE are however affected.
Interestingly enough, despite the Chrome extension being affected as well, I cannot find any issue filed about this. (Either Presentation Mode isn't used much at all, or users don't open the sidebar before entering Presentation Mode.)
It appears that Microsoft silently fixed the problem that required disabling of fullscreen mode, in e.g. `iframe`s, in IE 11; please see issue 4711 and PR 5525 for historical context.
Unfortunately my Google-fu isn't strong enough to find any *official* information regarding the fixing of the browser bug in IE. However testing of the default viewer in IE 11, with this patch applied, it now appears that Presentation Mode is working correctly even in an `iframe` in IE 11.
Further anecdotal evidence that the bug is in fact fixed, is for example that jQuery previously contained a work-around for the IE bug. However, that's removed over two years ago now; see ff1a0822f7 and the issues referenced there.
Given that the default viewer isn't intended to be used as-is anyway (in custom deployments), it didn't seem necessary to keep the `disableFullscreen` option around since it was *only* ever added for compatibility purposes.
Fixes 9585.
Not only is the `Util.loadScript` helper function unused on the Worker side, even trying to use it there would throw an Error (since `document` isn't defined/available in Workers).
Hence this helper function is moved, and its code modernized slightly by having it return a Promise rather than needing a callback function.
Finally, to reduced code duplication, the "new" loadScript function is exported and used in the viewer.
This should provide a better out-of-the-box experience when using PDF.js in a Node.js environment, since it's missing native support for both `@font-face` and `Image`.
Please note that this change *only* affects the default values, hence it's still possible for an API consumer to override those values when calling `getDocument`.
Also, prevents "ReferenceError: document is not defined" errors, when running the unit-tests in Node.js/Travis.
This commit adds `scrollModeOnLoad` and `spreadModeOnLoad` preferences
that control the default viewer state when opening a new document for
the first time.
This commit also contains a minor refactoring of some of the option UI
rendering code in extensions/chromium/options/options.js, as I couldn't
bear creating two more functions nearly identical to the four that
already existed.
This builds on the scrolling mode work to add three buttons for joining
page spreads together: one for the default view, with no page spreads,
and two for spreads starting on odd-numbered or even-numbered pages.
Prior to this commit, if the vertical scroll bar is absent and the horizontal
scroll bar is present, a link to a particular point on the page which should
induce a horizontal scroll did not do so, because the absence of a vertical
scroll bar meant that the viewer was not recognized as the nearest scrolling
ancestor. This commit adds a check for horizontal scroll bars when searching
for the scrolling ancestor.
*This is a final piece of clean-up of code that I recently wrote, after which I'm done :-)*
When the `getMainThreadWorkerMessageHandler` function was added, in PR 9385, it did so by basically introducing a `web/app.js` dependency in `src/display/api.js` through the `window.pdfjsNonProductionPdfWorker` property[1]. Even though this is limited to non-`PRODUCTION` mode, i.e. `gulp server`, it still seems unfortunate to have that sort of viewer dependency in the API code itself.
With the new, much nicer and shorter, names introduced in PR 9565 we can remove this non-`PRODUCTION` hack and just use `window.pdfjsWorker` in both the viewer and the API regardless of the build mode.
---
[1] It didn't seem correct to piggy-back on the `window.pdfjsDistBuildPdfWorker` property in non-`PRODUCTION` mode.
Please note that this patch *purposely* doesn't add every standard (or semi-standard) page name in existence, but rather only a few common ones. This is done to lessen the burden on localizers, since it's quite possible that all of the page names could need translation (depending on locale).
It's easy to add more standard page sizes in the future, but we should take care to *only* add those that are very commonly used in actual PDF files.
This uses a whitelist, based on the locale, to determine where non-metric units should be used.
Note that the behaviour implemented here seem consistent with desktop PDF viewers (e.g. Adobe Reader), where the pageSizes are *always* displayed with locale dependent units rather than pageSize dependent ones (since the latter would probably be quite confusing).