Sakurai/pdf.js - pdf.js - Gitea on kemo

Author	SHA1	Message	Date
Jonas Jenwald	3c2c0ecd88	Use the ESLint `arrow-body-style` rule in more spots in `src/core/evaluator.js`	2024-01-21 17:42:33 +01:00
Jonas Jenwald	d1bef8cb86	Use `await` consistently in the `PartialEvaluator.translateFont` method	2024-01-21 17:36:50 +01:00
Jonas Jenwald	fc62eec901	Convert the `handleSetFont` methods, in `src/core/evaluator.js`, to be async	2024-01-21 17:32:05 +01:00
Jonas Jenwald	f9a384d711	Enable the `arrow-body-style` ESLint rule This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation. Please see https://eslint.org/docs/latest/rules/arrow-body-style	2024-01-21 16:20:55 +01:00
Calixte Denizet	405f573d70	Take into account empty lines when extracting text content from the appearance Fixes #17492.	2024-01-14 20:23:29 +01:00
Calixte Denizet	7839e7b495	Preserve the whitespaces when getting text from FreeText annotations (bug 1871353) When the text of an annotation is extracted using getTextContent, consecutive white spaces are just replaced by one space. So this patch adds an option to make sure that white spaces are preserved when the appearance is parsed. For the case where there's no appearance, we can have a fast path to get the correct string from the Content entry. When an existing FreeText is edited, spaces (0x20) are replaced by non-breakable ones (0xa0) so that all of them are visible on screen.	2024-01-05 10:20:32 +01:00
Jonas Jenwald	9f02cc36d4	Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up) In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents. However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes. Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to copy a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases. For the PDF document in issue 11878, the rendering time of the second page changes as follows (on my computer): - With the `master`-branch it takes >600 ms to render. - With this patch that goes down to ~50 ms, which is one order of magnitude faster. (Note that all other pages are, as expected, completely unaffected by these changes.) This new main-thread copying is limited to "large" global images, since: - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue. - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread. - This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.	2023-12-21 21:26:21 +01:00
Jonas Jenwald	e547b198a3	Compute the length of the final image-bitmap/data on the worker-thread Currently this is done in the API, but moving it into the worker-thread will simplify upcoming changes.	2023-12-21 21:26:21 +01:00
Jonas Jenwald	709d89420e	Re-factor how the `GenericL10n` class fetches localization-data - Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text". This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1] - Expose the `fetchData` helper function in the API, such that the viewer is able to access it. - Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API. --- [1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.	2023-11-14 13:45:14 +01:00
Calixte Denizet	7851c0da8d	[Debugger] Add some info about substitution font When pdfBug is true, the substitution font is used in the text layer so that the font actually in use can be identified with the devtools. And to be sure that fonts are loaded, the font cache isn't cleaned up when the debugger is active.	2023-10-09 12:06:33 +02:00
Jonas Jenwald	0ac8f33e13	Ignore optional content with missing /Type-entries In the rare situation that an optional content dictionary lacks a /Type-entry we currently throw, which may prevent e.g. Form XObjects from rendering completely. Fixes https://bugs.ghostscript.com/show_bug.cgi?id=707147	2023-09-19 14:11:03 +02:00
Jonas Jenwald	316d1ec5ef	Simplify the `EvaluatorPreprocessor.opMap` getter a little bit Given that this is a shadowed getter, the `opMap` is already lazily initialized and it shouldn't be necessary to also use the `getLookupTableFactory` helper function here. Looking at the history of the code, it seems that this is simply a leftover from before JavaScript classes existed.	2023-09-16 12:26:38 +02:00
Jonas Jenwald	c0fe96b8fe	Additional manual `unicorn/prefer-ternary` changes Not all cases could be automatically fixed, and the changes also triggered a number of `prefer-const` errors that needed to be handled manually.	2023-07-27 09:48:24 +02:00
Jonas Jenwald	674e7ee381	Enable the `unicorn/prefer-ternary` ESLint plugin rule To limit the readability impact of these changes, the `only-single-line` option was used; please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-ternary.md These changes were done automatically, using the `gulp lint --fix` command.	2023-07-27 09:18:26 +02:00
Jonas Jenwald	fee850737b	Enable the `unicorn/prefer-optional-catch-binding` ESLint plugin rule According to MDN this format is available in all browsers/environments that we currently support, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/try...catch#browser_compatibility Please also see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-optional-catch-binding.md	2023-06-12 11:46:11 +02:00
Jonas Jenwald	1f42aaf21b	Improve SMask/Mask lookup when parsing inline images - Don't attempt to lookup an "SM" entry, since we're only using "SMask" in the `PDFImage` code and I also cannot find any mention in the PDF specification about that being a valid abbreviation for a Soft Mask entry. (There's only a `SM = Smoothness Tolerance` Graphics State parameter, which is obviously something completely different.) - Don't lookup the /SMask and /Mask entries unless it's actually an inline image, since it's pointless otherwise. - Last, but most importantly, only check for the existence of /SMask and /Mask entries but don't actually fetch the data. Note that if either one exists it'll contain a Stream, and those cannot be cached on the `XRef`-instance, which leads to unnecessary parsing/allocations and in this case we're not using the actual data for anything.	2023-06-10 13:19:43 +02:00
Jonas Jenwald	459d26edec	Improve handling of mismatching /BaseFont and /FontName entries for non-embedded fonts (issue 7454) This patch is the result of me going through some old issues regarding non-embedded Wingdings support. There's a few different things wrong in the referenced PDF document: - The /BaseFont and /FontName entries don't agree on the name of the fonts, with one font using `/BaseFont /Wingdings-Regular` and `/FontName /wg09np` which obviously makes no sense. To address this we'll compare the font-names against our lists of known ones and ignore /FontName entries that don't make sense iff the /BaseFont entry is a known font-name. - The non-embedded Wingdings font also set an incorrect /Encoding, in this case /MacRomanEncoding, which should have been fixed by PR 16465. However this doesn't work since the font has bogus font-flags, that fail to categorize the font as Symbolic. To address this we'll also compare the font-name against the list of known symbol fonts.	2023-06-02 17:10:25 +02:00
Jonas Jenwald	5a7beb9f30	Attempt to improve non-embedded Wingdings font support (bug 1652224) Now that font-substitution has been implemented, we should be able to do a much better job at supporting non-embedded Wingdings fonts. Given that this is a Windows-specific font, see https://en.wikipedia.org/wiki/Wingdings, this is however not guaranteed to work (well) on other platforms.	2023-05-24 14:59:13 +02:00
Jonas Jenwald	aeed6f2b67	Ignore named encoding for non-embedded symbol fonts (issue 16464) The affected font is non-embedded ZapfDingbats, however the PDF document for some inexplicable reason specifies the encoding as "WinAnsiEncoding" (which is obviously wrong). To work-around this bug in the PDF generator, we'll simply ignore any explicitly specified named encoding for non-embedded symbol fonts.	2023-05-24 10:48:47 +02:00
Calixte Denizet	a76a69e1ed	Take into account the final space if any in the TJ command The final space was just ignored, which led to the next chunk of text being wrongly positioned.	2023-05-23 17:09:32 +02:00
Jonas Jenwald	8c4821ceda	[api-minor] Slightly shorten the marked-content ids used in the textLayer Generally we try to keep the ids that we create short, hence we can slightly shorten the "static" parts of them.	2023-05-18 22:32:10 +02:00
Calixte Denizet	3091e70aad	Flush the current chunk when the font changed because of a restore op (issue #14755)	2023-05-18 19:37:16 +02:00
Jonas Jenwald	4355e76c60	Simplify the `fontID` handling in `PartialEvaluator.loadFont` The `fontID` handling is quite old and predates the use of the `idFactory` to generate a unique id for each font, hence we can simplify this code a little bit.	2023-05-18 13:09:08 +02:00
Tim van der Meij	ac8032628b	Merge pull request #16424 from Snuffleupagus/core-optional-chaining Introduce more optional chaining in the `src/core/` folder	2023-05-18 12:40:08 +02:00
Jonas Jenwald	bfb374dbf6	Attempt to fallback to a default font, for non-available ones, in more cases (issue 16432) This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available. This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.	2023-05-17 11:41:08 +02:00
Jonas Jenwald	1b4a7c5965	Introduce more optional chaining in the `src/core/` folder After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.) For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.	2023-05-15 12:38:28 +02:00
Calixte Denizet	d4b70ec306	For missing font, use a local font if it exists even if there's no standard substitution If the font foo is missing we just try to load local(foo) and maybe we'll be lucky.	2023-05-13 21:54:27 +02:00
Calixte Denizet	cfb908c999	Add a cache to avoid loading a local font several times On my computer, it takes a few tenths of a second to load a local font. Since a font can be used several times in a document, the cache will improve performance.	2023-05-10 20:01:21 +02:00
Calixte Denizet	53134c0c0b	[api-minor] Use a local font or fallback on an embedded one (if it exists) for non-embedded fonts (bug 1766039) - Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use it as a good replacement of FoxitSans. - For now we just try to substitute standard fonts, the strategy is the following: * we try to find a font locally from a hardcoded list; * if it fails then we use Liberation as fallback (only for Helvetica for the moment); * else we just fallback on the system serif/sansserif/monospace font.	2023-05-10 14:10:23 +02:00
Jonas Jenwald	667085ee33	Merge pull request #16368 from Snuffleupagus/rm-GlobalImageCache-addPageIndex Inline the `addPageIndex` method in `GlobalImageCache.shouldCache`	2023-05-04 12:09:04 +02:00
Jonas Jenwald	001acfb5ac	Merge pull request #16381 from Snuffleupagus/rm-isStandardFont-prop Remove the unused `isStandardFont` font-property (PR 15880 follow-up)	2023-05-04 00:30:05 +02:00
Jonas Jenwald	24a75bda5d	Remove the unused `isStandardFont` font-property (PR 15880 follow-up) This property was added in PR 12726 specifically for use in the `getFontType` function, indirectly used by the `PDFDocumentProxy.stats` getter in the API. In PR 15880 that functionality was removed, but I forgot to remove this now unused font-property.	2023-05-03 11:52:54 +02:00
Jonas Jenwald	b0a1af306d	Simplify initialization of `static` class properties in the worker-thread Now that we no longer depend on the old Babel version in SystemJS we can remove the `static get ...` work-arounds used to define constants, which leads to slightly more compact code.	2023-04-29 13:49:38 +02:00
Jonas Jenwald	d950b91c4e	Introduce some logical assignment in the `src/core/` folder	2023-04-29 13:49:37 +02:00
Jonas Jenwald	317abd6d07	Change the `createPromiseCapability` helper function into a `PromiseCapability` class This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.	2023-04-29 13:43:24 +02:00
Jonas Jenwald	bb1228cb64	Inline the `addPageIndex` method in `GlobalImageCache.shouldCache` When the `GlobalImageCache` implementation originally landed, back in PR 11912, the image handling was slightly more complex (with e.g. browser-decoding of some JPEG images). At this point it no longer seems necessary to manually handle pageIndexes in this way, and we should be able to simply inline that in the `GlobalImageCache.shouldCache` method.	2023-04-28 09:40:32 +02:00
Tim van der Meij	c9359957e6	Merge pull request #16305 from Snuffleupagus/PDFJSDev-skip-PRODUCTION Remove the `PRODUCTION` build-target	2023-04-22 14:53:30 +02:00
Calixte Denizet	19ca41896e	Correctly clip the text in the text layer (fixes #16316)	2023-04-18 17:00:42 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some Arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized when creating the search query. So to avoid duplicating the normalization code, everything is moved into the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). While playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping for this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized, some chars can be replaced by several ones, which resulted in some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, the copied string is normalized (NFKC) before being put in the clipboard (it works like this in either Acrobat or Chrome).	2023-04-17 14:31:23 +02:00
Jonas Jenwald	804aa896a7	Stop using the `PRODUCTION` build-target in the JavaScript code This special build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code. When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing. This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases. Please note: There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.	2023-04-17 12:04:34 +02:00
Jonas Jenwald	3a36a9d337	Merge pull request #16268 from Snuffleupagus/RegionalImageCache Attempt to also cache images at the "page"-level (issue 16263)	2023-04-11 12:06:29 +02:00
calixteman	c1c372c320	Merge pull request #16225 from calixteman/16224 Thin whitespaces must have their own span	2023-04-11 11:13:16 +02:00
Jonas Jenwald	9881dbf927	Attempt to also cache images at the "page"-level (issue 16263) Currently we have two separate image-caches on the worker-thread: - A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names and references, since image-resources may be accessed in either way. - A global one, which applies to the entire PDF document and all its pages. This one only caches references, since nothing else would work. This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it only caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects. --- [1] For lack of a better word, since naming things is hard...	2023-04-10 11:34:41 +02:00
Jonas Jenwald	5063a6f2a9	[api-minor] Remove the `disableCombineTextItems` option Please note: This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons. This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous default text-extraction behaviour. However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in more (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together. Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.	2023-03-30 14:23:38 +02:00
Calixte Denizet	4b7eb1436d	Thin whitespaces must have their own span	2023-03-29 11:23:58 +02:00
Calixte Denizet	a96f10e55d	Create a new chunk when the char is raised too much compared to the previous one	2023-03-28 13:56:46 +02:00
Jonas Jenwald	1fc09f0235	Enable the `unicorn/prefer-string-replace-all` ESLint plugin rule Note that the `replaceAll` method still requires that a global regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md	2023-03-23 12:57:10 +01:00
Jonas Jenwald	137a2d6e30	Add even more non-standard ligatures (PR 15517 follow-up) Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.	2023-03-22 10:42:52 +01:00
Jonas Jenwald	d4bcfe8c16	Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176) Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).	2023-03-21 21:35:57 +01:00
Jonas Jenwald	6839f15a32	Merge pull request #16128 from Snuffleupagus/issue-16127 Support (rare) Type3 fonts with Pattern resources (issue 16127)	2023-03-08 12:21:53 +01:00
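
The image-cache tiers described in commits 9881dbf927 and bb1228cb64 (a local cache keyed by name or reference, a "regional" one keyed by reference, and a global one that only starts caching once an image has been seen on several pages) can be illustrated with a rough sketch. Every name below, including `SketchGlobalImageCache`, `lookupImage`, and the threshold value of 2, is hypothetical and chosen for illustration; none of it is taken from the actual `src/core/` code.

```javascript
// Hypothetical sketch; none of these names mirror the actual PDF.js classes.
class SketchGlobalImageCache {
  // Assumed threshold: cache globally once an image is seen on this many pages.
  static PAGE_THRESHOLD = 2;

  #pagesByRef = new Map(); // ref -> Set of page indexes that used the image
  #dataByRef = new Map(); // ref -> cached image data

  // Record the page and report whether the image now qualifies for caching.
  shouldCache(ref, pageIndex) {
    let pages = this.#pagesByRef.get(ref);
    if (!pages) {
      pages = new Set();
      this.#pagesByRef.set(ref, pages);
    }
    pages.add(pageIndex);
    return pages.size >= SketchGlobalImageCache.PAGE_THRESHOLD;
  }

  get(ref) {
    return this.#dataByRef.get(ref) ?? null;
  }

  set(ref, data) {
    this.#dataByRef.set(ref, data);
  }
}

// Consult the narrowest cache first, then the wider ones.
function lookupImage({ name, ref }, { local, regional, global }) {
  // Local tier: keyed by resource name or by reference.
  const localHit = local.get(name) ?? (ref ? local.get(ref) : undefined);
  if (localHit !== undefined) {
    return { tier: "local", data: localHit };
  }
  if (!ref) {
    return null; // the wider tiers only know about references
  }
  const regionalHit = regional.get(ref);
  if (regionalHit !== undefined) {
    return { tier: "regional", data: regionalHit };
  }
  const globalHit = global.get(ref);
  return globalHit !== null ? { tier: "global", data: globalHit } : null;
}
```

In this sketch the page bookkeeping lives inside `shouldCache` itself, loosely echoing the inlining mentioned in bb1228cb64; seeing the same image twice on one page does not count towards the threshold because page indexes are kept in a `Set`.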

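Commit 117bbf7cd9 above mentions replacing a hardcoded map with the built-in `normalize("NFKC")` and notes that normalization can turn one char into several. A minimal sketch of that behaviour follows; the `normalizeQuery` helper is a hypothetical name, not a function from the PDF.js find controller.

```javascript
// Hypothetical helper: normalize a search query with the built-in NFKC form.
function normalizeQuery(query) {
  return query.normalize("NFKC");
}

// U+FE94 is an Arabic presentation form; NFKC maps it to the base letter U+0629.
const normalizedTeh = normalizeQuery("\ufe94");

// U+FB01 (the "fi" ligature) expands to two chars, so string lengths can change.
const normalizedLigature = normalizeQuery("\ufb01");

console.log(normalizedTeh === "\u0629"); // true
console.log(normalizedLigature, normalizedLigature.length); // fi 2
```

Because the normalized string can be longer than the original, any code that maps match positions back onto the original text has to account for the shifted offsets.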