Commit Graph

546 Commits

Author SHA1 Message Date
Calixte Denizet
7851c0da8d [Debugger] Add some info about substitution font
When pdfBug is true, the substitution font is used in the text layer in order
to be able to know what is the font really used thanks to the devtools.
And to be sure that fonts are loaded, the font cache isn't cleaned up when
the debugger is active.
2023-10-09 12:06:33 +02:00
Jonas Jenwald
0ac8f33e13 Ignore optional content with missing /Type-entries
In the rare situation that an optional content dictionary lacks a /Type-entry we currently throw, which may prevent e.g. Form XObjects from rendering completely.

Fixes https://bugs.ghostscript.com/show_bug.cgi?id=707147
2023-09-19 14:11:03 +02:00
Jonas Jenwald
316d1ec5ef Simplify the EvaluatorPreprocessor.opMap getter a little bit
Given that this is a shadowed getter, the `opMap` is already lazily initialized and it shouldn't be necessary to *also* use the `getLookupTableFactory` helper function here. Looking at the history of the code, it seems that this is simply a leftover from before JavaScript classes existed.
2023-09-16 12:26:38 +02:00
Jonas Jenwald
c0fe96b8fe Additional *manual* unicorn/prefer-ternary changes
Not all cases could be automatically fixed, and the changes also triggered a number of `prefer-const` errors that needed to be handled manually.
2023-07-27 09:48:24 +02:00
Jonas Jenwald
674e7ee381 Enable the unicorn/prefer-ternary ESLint plugin rule
To limit the readability impact of these changes, the `only-single-line` option was used; please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-ternary.md

These changes were done automatically, using the `gulp lint --fix` command.
2023-07-27 09:18:26 +02:00
Jonas Jenwald
fee850737b Enable the unicorn/prefer-optional-catch-binding ESLint plugin rule
According to MDN this format is available in all browsers/environments that we currently support, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/try...catch#browser_compatibility

Please also see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-optional-catch-binding.md
2023-06-12 11:46:11 +02:00
Jonas Jenwald
1f42aaf21b Improve SMask/Mask lookup when parsing inline images
- Don't attempt to lookup an "SM" entry, since we're only using "SMask" in the `PDFImage` code and I also cannot find any mention in the PDF specification about that being a valid abbreviation for a Soft Mask entry. (There's only a `SM = Smoothness Tolerance` Graphics State parameter, which is obviously something completely different.)

 - Don't lookup the /SMask and /Mask entries unless it's actually an inline image, since it's pointless otherwise.

 - Last, but most importantly, only check for the *existence* of /SMask and /Mask entries but don't actually fetch the data. Note that if either one exists it'll contain a Stream, and those cannot be cached on the `XRef`-instance, which leads to unnecessary parsing/allocations and in this case we're not using the actual data for anything.
2023-06-10 13:19:43 +02:00
Jonas Jenwald
459d26edec Improve handling of mismatching /BaseFont and /FontName entries for non-embedded fonts (issue 7454)
This patch is the result of me going through some old issues regarding non-embedded Wingdings support.

There's a few different things wrong in the referenced PDF document:
 - The /BaseFont and /FontName entries don't agree on the name of the fonts, with one font using `/BaseFont /Wingdings-Regular` and `/FontName /wg09np` which obviously makes no sense.
   To address this we'll compare the font-names against our lists of known ones and ignore /FontName entries that don't make sense iff the /BaseFont entry is a known font-name.
 - The non-embedded Wingdings font also set an incorrect /Encoding, in this case /MacRomanEncoding, which should have been fixed by PR 16465. However this doesn't work since the font has *bogus* font-flags, that fail to categorize the font as Symbolic.
   To address this we'll also compare the font-name against the list of known symbol fonts.
2023-06-02 17:10:25 +02:00
Jonas Jenwald
5a7beb9f30 Attempt to improve non-embedded Wingdings font support (bug 1652224)
Now that font-substitution has been implemented, we should be able to do much a better job at supporting non-embedded Wingdings fonts.
Given that this is a Windows-specific font, see https://en.wikipedia.org/wiki/Wingdings, this is however not guaranteed to work (well) on other platforms.
2023-05-24 14:59:13 +02:00
Jonas Jenwald
aeed6f2b67 Ignore named encoding for non-embedded symbol fonts (issue 16464)
The affected font is non-embedded ZapfDingbats, however the PDF document for some inexplicable reason specifies the encoding as "WinAnsiEncoding" (which is obviously wrong).
To work-around this bug in the PDF generator, we'll simply ignore any explicitly specified named encoding for non-embedded symbol fonts.
2023-05-24 10:48:47 +02:00
Calixte Denizet
a76a69e1ed Take into account the final space if any in the TJ command
The final space was just ignored and that led to wrongly position
the next chunk of text.
2023-05-23 17:09:32 +02:00
Jonas Jenwald
8c4821ceda [api-minor] Slightly shorten the marked-content ids used in the textLayer
Generally we try to keep the ids that we create short, hence we can slightly shorten the "static" parts of them.
2023-05-18 22:32:10 +02:00
Calixte Denizet
3091e70aad Flush the current chunk when the font changed because of a restore op (issue #14755) 2023-05-18 19:37:16 +02:00
Jonas Jenwald
4355e76c60 Simplify the fontID handling in PartialEvaluator.loadFont
The `fontID` handling is quite old and predates the use of the `idFactory` to generate a unique id for each font, hence we can simplify this code a little bit.
2023-05-18 13:09:08 +02:00
Tim van der Meij
ac8032628b
Merge pull request #16424 from Snuffleupagus/core-optional-chaining
Introduce more optional chaining in the `src/core/` folder
2023-05-18 12:40:08 +02:00
Jonas Jenwald
bfb374dbf6 Attempt to fallback to a default font, for non-available ones, in more cases (issue 16432)
This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available.

This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.
2023-05-17 11:41:08 +02:00
Jonas Jenwald
1b4a7c5965 Introduce more optional chaining in the src/core/ folder
After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.)

For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.
2023-05-15 12:38:28 +02:00
Calixte Denizet
d4b70ec306 For missing font, use a local font if it exists even if there's no standard substitution
If the font foo is missing we just try lo load local(foo) and maybe
we'll be lucky.
2023-05-13 21:54:27 +02:00
Calixte Denizet
cfb908c999 Add a cache to avoid to load several times a local font
On my computer, it takes few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performances.
2023-05-10 20:01:21 +02:00
Calixte Denizet
53134c0c0b [api-minor] Use a local font or fallback on an embedded one (if it exists) for non-embedded fonts (bug 1766039)
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement of FoxitSans.
- For now we just try to substitue standard fonts, the strategy is the following:
  * we try to find a font locally from a hardcoded list;
  * if it fails then we use Liberation as fallback (only for Helvetica for the moment);
  * else we just fallback on the system serif/sansserif/monospace font.
2023-05-10 14:10:23 +02:00
Jonas Jenwald
667085ee33
Merge pull request #16368 from Snuffleupagus/rm-GlobalImageCache-addPageIndex
Inline the `addPageIndex` method in `GlobalImageCache.shouldCache`
2023-05-04 12:09:04 +02:00
Jonas Jenwald
001acfb5ac
Merge pull request #16381 from Snuffleupagus/rm-isStandardFont-prop
Remove the unused `isStandardFont` font-property (PR 15880 follow-up)
2023-05-04 00:30:05 +02:00
Jonas Jenwald
24a75bda5d Remove the unused isStandardFont font-property (PR 15880 follow-up)
This property was added in PR 12726 specifically for use in the `getFontType` function, indirectly used by the `PDFDocumentProxy.stats` getter in the API.
In PR 15880 that functionality was removed, but I forgot to remove this now unused font-property.
2023-05-03 11:52:54 +02:00
Jonas Jenwald
b0a1af306d Simplify initialization of static class properties in the worker-thread
Now that we no longer depend on the old Babel version in SystemJS we can remove the `static get ...` work-arounds used to define constants, which leads to slightly more compact code.
2023-04-29 13:49:38 +02:00
Jonas Jenwald
d950b91c4e Introduce some logical assignment in the src/core/ folder 2023-04-29 13:49:37 +02:00
Jonas Jenwald
317abd6d07 Change the createPromiseCapability helper function into a PromiseCapability class
This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.
2023-04-29 13:43:24 +02:00
Jonas Jenwald
bb1228cb64 Inline the addPageIndex method in GlobalImageCache.shouldCache
When the `GlobalImageCache` implementation originally landed, back in PR 11912, the image handling was slightly more complex (with e.g. browser-decoding of some JPEG images). At this point it no longer seems necessary to manually handle pageIndexes in this way, and we should be able to simply inline that in the `GlobalImageCache.shouldCache` method.
2023-04-28 09:40:32 +02:00
Tim van der Meij
c9359957e6
Merge pull request #16305 from Snuffleupagus/PDFJSDev-skip-PRODUCTION
Remove the `PRODUCTION` build-target
2023-04-22 14:53:30 +02:00
Calixte Denizet
19ca41896e Correctly clip the text in the text layer (fixes #16316) 2023-04-18 17:00:42 +02:00
Calixte Denizet
117bbf7cd9 [api-minor] Don't normalize the text used in the text layer.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.

When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
2023-04-17 14:31:23 +02:00
Jonas Jenwald
804aa896a7 Stop using the PRODUCTION build-target in the JavaScript code
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.

This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
2023-04-17 12:04:34 +02:00
Jonas Jenwald
3a36a9d337
Merge pull request #16268 from Snuffleupagus/RegionalImageCache
Attempt to also cache images at the "page"-level (issue 16263)
2023-04-11 12:06:29 +02:00
calixteman
c1c372c320
Merge pull request #16225 from calixteman/16224
Thin whitespaces must have their own span
2023-04-11 11:13:16 +02:00
Jonas Jenwald
9881dbf927 Attempt to also cache images at the "page"-level (issue 16263)
Currently we have two separate image-caches on the worker-thread:
 - A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
 - A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.

This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.

---
[1] For lack of a better word, since naming things is hard...
2023-04-10 11:34:41 +02:00
Jonas Jenwald
5063a6f2a9 [api-minor] Remove the disableCombineTextItems option
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.

This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.

Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
2023-03-30 14:23:38 +02:00
Calixte Denizet
4b7eb1436d Thin whitespaces must have their own span 2023-03-29 11:23:58 +02:00
Calixte Denizet
a96f10e55d Create a new chunk when the char is too rised compared to the previouse one 2023-03-28 13:56:46 +02:00
Jonas Jenwald
1fc09f0235 Enable the unicorn/prefer-string-replace-all ESLint plugin rule
Note that the `replaceAll` method still requires that a *global* regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters

Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md
2023-03-23 12:57:10 +01:00
Jonas Jenwald
137a2d6e30 Add even more non-standard ligatures (PR 15517 follow-up)
Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.
2023-03-22 10:42:52 +01:00
Jonas Jenwald
d4bcfe8c16 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).
2023-03-21 21:35:57 +01:00
Jonas Jenwald
6839f15a32
Merge pull request #16128 from Snuffleupagus/issue-16127
Support (rare) Type3 fonts with Pattern resources (issue 16127)
2023-03-08 12:21:53 +01:00
Jonas Jenwald
e5427ab11b
Merge pull request #16122 from Snuffleupagus/rm-onUnsupportedFeature
[api-minor] Remove the deprecated `onUnsupportedFeature` functionality (PR 15758 follow-up)
2023-03-08 12:16:27 +01:00
Calixte Denizet
e9474f1c84 [api-minor] Add an option to set the max canvas area 2023-03-08 10:37:06 +01:00
Jonas Jenwald
471aef5fc6 Support (rare) Type3 fonts with Pattern resources (issue 16127)
This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).
2023-03-08 09:20:52 +01:00
Calixte Denizet
b8dda089e2 Slightly modify the max width of a tracking space 2023-03-07 19:38:49 +01:00
Jonas Jenwald
2f3dcc2327 [api-minor] Remove the deprecated onUnsupportedFeature functionality (PR 15758 follow-up)
This was deprecated in PR 15758, which has now been included in three official PDF.js releases.
While PR 15880 did limit the bundle-size impact of this functionality on e.g. the Firefox PDF Viewer, it still leads to some unnecessary "bloat" that these changes remove.
Furthermore, with this being deprecated there'd also be no effort put into e.g. extending the `UNSUPPORTED_FEATURES` list when handling future error cases.
2023-03-07 10:18:43 +01:00
Calixte Denizet
05b0c9d7e6 Render large images even if they're larger than the canvas limits (bug 1720282)
The idea is to encode large image in BMP format (which is very simple and doesn't
require to compute any checksums) and then use createImageBitmap with a BMP blob
(which doesn't suffer of the Canvas/ImageData limits).
From a performance point of view, it isn't crazy (generating a large blob + decoding
it on the main thread is really not ideal) but at least we've something to display
which is a way better than a blank page (and one can notice that most of the time is
spent in decoding the image from the pdf stream).
2023-03-05 14:07:07 +01:00
Calixte Denizet
fd03cd5493 [api-minor] Generate images in the worker instead of the main thread.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
2023-03-01 17:40:12 +01:00
Jonas Jenwald
45c332110e
Check OffscreenCanvas support once on the worker-thread
Currently we repeat the `FeatureTest.isOffscreenCanvasSupported` checks all over the worker-thread code, and with upcoming changes this will become even "worse".

Hence this patch, which changes the *worker-thread* default value for the `isOffscreenCanvasSupported`-parameter to `false` and moves the feature-testing into the `BasePdfManager`-constructor.

*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
2023-02-27 12:27:28 +01:00
Calixte Denizet
4e9f26afa3 Ignore position of combining diacritics when getting text (bug 1640217) 2023-02-09 17:13:57 +01:00