Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.
When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
After the previous patch we now have only *a single* `PRODUCTION` occurrence in the entire code-base, more specifically in the `web/viewer.html` file.
This special build-target can be replaced with any condition that always evaluate to `false`, such as e.g. a comment.
*Please note:* This patch might be considered too hacky, hence I completely understand if it's rejected.
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.
This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
To make this functionality work out-of-the-box in custom implementations, see e.g. the "viewer components" examples, it'd be slightly easier if we dynamically create/insert the "hiddenCopyElement" in the `PDFViewer` constructor.
Given that the "copy all text" feature still appears to work just as before with this patch, hopefully I'm not overlooking any reason why doing this would be a bad idea.
I was playing with the new "copy all text" feature, and stumbled upon one document where the copied text was truncated; see http://mirrors.ctan.org/info/lshort/english/lshort.pdf
The problem turns out to be that on [page 83](https://ftp.acc.umu.se/mirror/CTAN/info/lshort/english/lshort.pdf#page=83) the textLayer contains `\u0000` and apparently copying just stops when a null char is encountered.
To fix this we can simply use an existing helper function, and with this patch we're able to successfully copy all the text in that document.
*Please note:* This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's *purposely* not exposed in the default viewer.
This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search.
To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]` which will search for the phrase "Lorem ipsum" *and* the words "foo" respectively "bar".
This method was originally added in PR 1320, eleven years ago, however it doesn't appear to ever have been used (not even from the start).
Furthermore, this method also tries to access a property that doesn't exist (`this.out`) and then call a method that also doesn't exist (`writeByteArray`).
In looking at https://bugs.ghostscript.com/show_bug.cgi?id=706451 I noticed that bug2.pdf was pretty
slow to load for such a basic file.
In profiling I noticed that a lot of time is spent in Array.concat, hence this patch use Array.push when
it's possible (it's now ~3 times faster).
The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well.
*Please note:* In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.
By getting the width/height of the first page initially, we can slightly reduce the amount of code needed both in the `hasEqualPageSizes`-check and when building the print-styles.
For the moment there is no real consensus on how we should download a pdf on Android.
Hence we keep this solution for the moment but behind a pref (which will be true on
nightly only).
Currently we repeat the same code in lots of places, to update the "toggled" class and "aria-checked" attribute, when various toolbar buttons are clicked.
For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.2` kilo-bytes.
Apparently the `structuredClone` polyfill doesn't handle transfers correctly, and `DOMException`s may thus be thrown. This is particularly problematical in Node.js environments, where that exception (obviously) isn't available.
To work-around these issues we'll simply ignore any transfers in `legacy`-builds, since those *may* use the `structuredClone` polyfill. This will obviously lead to slightly higher memory usage in those builds, however this really only affects Node.js environments. (Browsers are only affected if workers are disabled, however that's never been an officially recommended/supported configuration.)
Currently `float: inline-start/inline-end` is only supported in Firefox, see https://developer.mozilla.org/en-US/docs/Web/CSS/float#browser_compatibility, and in order to support other browsers we're thus forced to jump through some hoops.
This leads to slightly less nice code in the *built-in* Firefox PDF Viewer, and this patch attempts to improve the current situation:
- Use Stylelint to forbid direct use of `float: inline-start/inline-end` in the CSS files, to prevent future bugs in the general PDF.js viewer.
- Do a build-time replacement, only in MOZCENTRAL builds, to replace the CSS-variables with raw `float: inline-start/inline-end` instances.
Currently we have two separate image-caches on the worker-thread:
- A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
- A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.
This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.
---
[1] For lack of a better word, since naming things is hard...