On my computer, it takes a few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performance.
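As an illustration, a minimal sketch of such a cache (the names are hypothetical, not the actual PDF.js implementation) could look like this:

```js
// Hypothetical sketch: cache fonts that have already been loaded from disk,
// so repeated uses of the same font in a document skip the expensive load.
const localFontCache = new Map();

async function getLocalFont(name, loadFont) {
  let font = localFontCache.get(name);
  if (!font) {
    font = await loadFont(name); // Expensive: takes a few tenths of a second.
    localFontCache.set(name, font);
  }
  return font;
}
```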
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement for FoxitSans.
- For now we just try to substitute the standard fonts; the strategy is the following (see the sketch below):
* we try to find a font locally from a hardcoded list;
* if that fails, we use Liberation as a fallback (only for Helvetica for the moment);
* else we simply fall back on the system serif/sans-serif/monospace font.
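A rough sketch of that order, with purely illustrative names (this is not the actual PDF.js code):

```js
// Illustrative only: the substitution order described above.
const HARDCODED_SUBSTITUTIONS = {
  Helvetica: ["Helvetica", "Arial", "Liberation Sans"],
  // ...entries for the other standard fonts.
};

function getSubstitution(baseFontName, localFonts /* Set of installed font names */) {
  // 1. Try to find a font locally from the hardcoded list.
  const local = (HARDCODED_SUBSTITUTIONS[baseFontName] || []).find(name =>
    localFonts.has(name)
  );
  if (local) {
    return local;
  }
  // 2. If that fails, use Liberation as a fallback (only Helvetica for now).
  if (baseFontName === "Helvetica") {
    return "LiberationSans";
  }
  // 3. Else fall back on the system serif/sans-serif/monospace font.
  return "sans-serif";
}
```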
This patch updates the minimum supported browsers as follows:
- Safari 15.4, which was released on 2022-03-15; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_15
Nowadays we usually try, where feasible, to support browsers that are about two years old. The reasons for limiting support to a *somewhat* more recent Safari version include:
- Throughout the history of the PDF.js project, Safari has always been the worst browser to attempt to support. Compared to other browsers there's a disproportionate number of bugs affecting Safari, especially on iOS, and in most cases those are browser-specific issues that we simply cannot address.[1]
- Safari has often been a lot slower, compared to other browsers, at implementing new web-platform features. Historically this has sometimes blocked usage of new features, for the benefit of the Firefox PDF Viewer, and it's very often meant having to include and maintain polyfills *only* for Safari.
- The current (minimum) supported Safari version lacks enough functionality that polyfills placed in the `src/shared/compatibility.js` file are unfortunately not sufficient; it also requires a bunch of special-cases in both the `gulpfile` and the `web/`-code.
- Given that the *built-in* Firefox PDF Viewer is the primary development target for the PDF.js library, and the general development pace these days, we need to limit the maintenance "overhead" caused by other browsers.
---
[1] In a few cases a work-around might be possible, however it'd negatively affect e.g. performance, readability, and/or maintainability of the code.
Originally we only used the `structuredClone` polyfill in the `LoopbackPort`-implementation, and that obviously isn't used anywhere within the various image decoders.
At this point in time we've started to use `structuredClone` a little bit more, hence it seems overall simpler to just bundle the polyfill even in the `legacy`-version of the IMAGE_DECODERS build-target.
For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used).
Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain *basic* functionality, such as text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments.
*Please note:* For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.
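For example, basic text-extraction with a modern (non-`legacy`) build in Node.js might look roughly like this; the exact `pdfjs-dist` import path depends on the installed version:

```js
import { readFile } from "fs/promises";
// The entry point differs between pdfjs-dist versions; for a `legacy`-build
// use e.g. "pdfjs-dist/legacy/build/pdf.mjs" instead.
import { getDocument } from "pdfjs-dist";

const data = new Uint8Array(await readFile("document.pdf"));
const pdfDocument = await getDocument({ data }).promise;
const page = await pdfDocument.getPage(1);
const textContent = await page.getTextContent();
console.log(textContent.items.map(item => item.str).join(" "));
```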
This patch updates the minimum supported environments as follows:
- Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases
Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.
The /Decode-implementation in our JPEG decoder, i.e. `src/core/jpg.js`, seems to only handle *inverting* of images properly. To support arbitrary /Decode-entries correctly we'll always use the `PDFImage.decodeBuffer` method, even for "simple" JPEG images, which should be fine since non-default /Decode-entries aren't a very common occurrence.
*Please note:* This patch will lead to a little bit of movement in some existing test-cases, however it should be virtually imperceptible to the naked eye.
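For reference, a /Decode array [Dmin, Dmax] simply re-maps each raw component value linearly, per the PDF specification; the helper below is only a sketch, not the `PDFImage.decodeBuffer` implementation:

```js
// Sketch: map a raw sample to its decoded value. A /Decode entry of [1, 0]
// inverts the component, which is the only case jpg.js handled correctly.
function decodeSample(raw, dMin, dMax, bitsPerComponent) {
  const maxValue = 2 ** bitsPerComponent - 1;
  return dMin + (raw * (dMax - dMin)) / maxValue;
}
```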
This property was added in PR 12726 specifically for use in the `getFontType` function, indirectly used by the `PDFDocumentProxy.stats` getter in the API.
In PR 15880 that functionality was removed, but I forgot to remove this now unused font-property.
Now that we no longer depend on the old Babel version in SystemJS we can remove the `static get ...` work-arounds used to define constants, which leads to slightly more compact code.
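An illustrative example of the pattern (the class and constant names are made up):

```js
// Old work-around: expose the constant through a static getter.
class OldStyle {
  static get CHUNK_SIZE() {
    return 65536;
  }
}

// Now: a plain static class field, which is slightly more compact.
class NewStyle {
  static CHUNK_SIZE = 65536;
}
```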
Now that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 has landed in Firefox, we're able to use worker-modules during development :-)
This removes the final piece of SystemJS usage from the PDF.js library, thus allowing a fair bit of clean-up, and we now use *only* native `import`/`export` statements everywhere in development mode.
When the `GlobalImageCache` implementation originally landed, back in PR 11912, the image handling was slightly more complex (with e.g. browser-decoding of some JPEG images). At this point it no longer seems necessary to manually handle pageIndexes in this way, and we should be able to simply inline that in the `GlobalImageCache.shouldCache` method.
This method was added in PR 4938, almost nine years ago, however it doesn't appear to ever have been used.
Given the similarities between the `PDF17` and `PDF20` classes, and how they're used, if the `PDF20.hash` method was actually necessary you'd also expect a similar method in the `PDF17` class.
The "binary" CMap-format is specific to the PDF.js library, and is used to reduce the size of the built-in CMap data-files.
By moving this code to its own file we can remove the nowadays unnecessary closures, which helps to slightly reduce the size of this code.
Some Arabic chars, like \ufe94, can be searched for in a PDF, hence they must be normalized
when creating the search query. To avoid duplicating the normalization code,
everything is moved into the find controller.
The previous text-normalization code used NFKC, but with a hardcoded map; it
has been replaced by `normalize("NFKC")`, which helps to reduce the bundle size
by 30kb.
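For example, the final form \ufe94 compatibility-normalizes to the base Arabic letter (assuming I'm reading the Unicode data correctly):

```js
// U+FE94 (ARABIC LETTER TEH MARBUTA FINAL FORM) -> U+0629 (ARABIC LETTER TEH MARBUTA)
"\ufe94".normalize("NFKC"); // "\u0629"
```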
While playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking
some RTL unicode ranges into account, that the generated font wasn't embedding the mapping
for this char, and that the unicode ranges in the OS/2 table weren't up-to-date.
When normalized, some chars can be replaced by several ones, which led to
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, the copied string is normalized (NFKC) before being put in the
clipboard (this is how it works in both Acrobat and Chrome).
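A minimal sketch of that copy-handling (the element name is hypothetical, not the actual PDF.js code):

```js
// Normalize the selected text with NFKC before it reaches the clipboard.
textLayerElement.addEventListener("copy", event => {
  const selection = document.getSelection().toString();
  event.clipboardData.setData("text/plain", selection.normalize("NFKC"));
  event.preventDefault();
});
```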
This *special* build-target is very old, and was introduced with the first pre-processor, which only used comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.
This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
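As an illustration (not an exact quote from the codebase), a typical development-only check before and after the change:

```js
// Before: guard development-only code with both checks.
if (typeof PDFJSDev === "undefined" || !PDFJSDev.test("PRODUCTION")) {
  // development-only code
}

// After: the `typeof PDFJSDev === "undefined"` check alone does the same thing.
if (typeof PDFJSDev === "undefined") {
  // development-only code
}
```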
This method was originally added in PR 1320, eleven years ago, however it doesn't appear to ever have been used (not even from the start).
Furthermore, this method also tries to access a property that doesn't exist (`this.out`) and then call a method that also doesn't exist (`writeByteArray`).