The "binary" CMap-format is specific to the PDF.js library, and is used to reduce the size of the built-in CMap data-files.
By moving this code to its own file we can remove the nowadays unnecessary closures, which helps to slightly reduce the size of this code.
The latest version of `eslint-plugin-mozilla` removed the Prettier dependency, see https://bugzilla.mozilla.org/show_bug.cgi?id=1677562, which means that we no longer need to use `npm install --force` in the PDF.js library.
When permissions are enabled and the PDF document doesn't have the COPY-flag set, it shouldn't be possible for the user to trigger the "copy all text" feature.
While this slightly reduces duplication in the CSS rules, some of the auto-formatting done by Prettier is perhaps not great. (Given the overall advantage of using Prettier, we'll probably have to simply accept this.)
Hopefully these changes make sense (since this functionality is new to me), however the existing `xfa`-tests should help avoid any outright regressions.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.
When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
After the previous patch we now have only *a single* `PRODUCTION` occurrence in the entire code-base, more specifically in the `web/viewer.html` file.
This special build-target can be replaced with any condition that always evaluate to `false`, such as e.g. a comment.
*Please note:* This patch might be considered too hacky, hence I completely understand if it's rejected.
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.
This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
To make this functionality work out-of-the-box in custom implementations, see e.g. the "viewer components" examples, it'd be slightly easier if we dynamically create/insert the "hiddenCopyElement" in the `PDFViewer` constructor.
Given that the "copy all text" feature still appears to work just as before with this patch, hopefully I'm not overlooking any reason why doing this would be a bad idea.
I was playing with the new "copy all text" feature, and stumbled upon one document where the copied text was truncated; see http://mirrors.ctan.org/info/lshort/english/lshort.pdf
The problem turns out to be that on [page 83](https://ftp.acc.umu.se/mirror/CTAN/info/lshort/english/lshort.pdf#page=83) the textLayer contains `\u0000` and apparently copying just stops when a null char is encountered.
To fix this we can simply use an existing helper function, and with this patch we're able to successfully copy all the text in that document.