pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	0ff43b27bb	Remove the overflowing text special-case from `scrollIntoView` (issue 15714) With the changes made in PR 14564 this should no longer be necessary now, however we still need to keep the `scrollMatches` parameter to handle textLayers with markedContent correctly when searching.	2022-11-22 11:54:30 +01:00
Jonas Jenwald	748be3f702	Merge pull request #15713 from Snuffleupagus/annotation-no-appearance-cleanup Reduce duplication when creating a fallback appearance for `MarkupAnnotation`s	2022-11-20 17:12:41 +01:00
Jonas Jenwald	2ff9799e7a	Tweak assignment of common parameters in the `Annotation` classes This is slightly more compact, and also unifies the format across the various classes.	2022-11-20 12:29:59 +01:00
Jonas Jenwald	c92de947b6	Reduce duplication when creating a fallback appearance for `MarkupAnnotation`s Currently we repeat the same color-conversion code verbatim in lots of classes, which seems completely unnecessary.	2022-11-20 12:05:25 +01:00
Tim van der Meij	ae7c97aef8	Merge pull request #15710 from Snuffleupagus/issue-10791 Add localization support for the `annotationLayer` reference tests (issue 10791)	2022-11-19 11:25:15 +01:00
Tim van der Meij	d6908ee145	Merge pull request #15701 from Snuffleupagus/move-string-helpers Move some string helper functions to the worker-thread	2022-11-19 11:20:07 +01:00
Tim van der Meij	3d49459d64	Merge pull request #15706 from Snuffleupagus/worker-rm-fn-names Remove unnecessary function names in the `src/core/worker.js` file	2022-11-19 11:14:24 +01:00
Jonas Jenwald	2ff904fb2b	Add localization support for the `annotationLayer` reference tests (issue 10791)	2022-11-18 23:08:11 +01:00
Jonas Jenwald	70d362f22c	Remove an unnecessary variable in `getPdfManager`, in the `src/core/worker.js` file Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.	2022-11-17 15:31:41 +01:00
Jonas Jenwald	a2a200175f	Remove unnecessary function names in the `src/core/worker.js` file Currently some functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate. For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.	2022-11-17 15:12:48 +01:00
Jonas Jenwald	7d029f8bfe	Add a basic `stringToUTF16HexString` unit-test	2022-11-16 12:39:35 +01:00
Jonas Jenwald	9adc7859c8	Move the `escapeString` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:48 +01:00
Jonas Jenwald	e5859e145d	Move the `isAscii` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:48 +01:00
Jonas Jenwald	2eaa708e3a	Combine the `stringToUTF16String` and `stringToUTF16BEString` helper functions Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:44 +01:00
Jonas Jenwald	c7d6ab2f71	Merge pull request #15699 from Snuffleupagus/isOffscreenCanvasSupported-Annotation Move the `_isOffscreenCanvasSupported` property to the base `Annotation` class	2022-11-15 17:18:03 +01:00
Jonas Jenwald	f358e76f5b	Move the `_isOffscreenCanvasSupported` property to the base `Annotation` class Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console. The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.	2022-11-15 16:30:53 +01:00
Jonas Jenwald	e089d07994	Merge pull request #15698 from Snuffleupagus/DIACRITICS_EXCEPTION_STR-lazy Initialize the find-related `DIACRITICS_EXCEPTION_STR` constant lazily	2022-11-15 14:01:14 +01:00
Jonas Jenwald	176e8f0ddc	Initialize the find-related `DIACRITICS_EXCEPTION_STR` constant lazily Adding some logging with `console.{time, timeEnd}` around all the constant definitions at the top of the `web/pdf_find_controller.js` file, I noticed that computing `DIACRITICS_EXCEPTION_STR` took close to half the total time. My first idea was just to try and make it slightly more efficient, by reducing the amount of iterations and intermediate allocations. However, with this constant only being used during "match diacritics" searches it thus seemed like a good candidate for lazy initialization. Please note: Given that this is a micro optimization, I fully understand if the patch is rejected.	2022-11-15 12:46:16 +01:00
calixteman	859335a1ae	Merge pull request #15694 from calixteman/15690 Normalize fullwidth, halfwidth and circled chars when searching	2022-11-14 21:36:29 +01:00
Calixte Denizet	2be64d63e1	Normalize fullwidth, halfwidth and circled chars when searching	2022-11-14 19:27:51 +01:00
Jonas Jenwald	3078e2c1d9	Merge pull request #15692 from mozilla/dependabot/npm_and_yarn/minimatch-3.1.2 Bump minimatch from 3.0.4 to 3.1.2	2022-11-14 15:56:27 +01:00
Jonas Jenwald	6d7250bfca	Merge pull request #15693 from Snuffleupagus/dependabot-labels Stop Dependabot from creating its own, otherwise unused, labels	2022-11-14 15:19:56 +01:00
Jonas Jenwald	26883c0d7e	Stop Dependabot from creating its own, otherwise unused, labels Currently all Dependabot update PRs get tagged with a "javascript" label, which is annoying since we don't actually use that one. To try and avoid this we specify the labels explicitly, please see https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#labels	2022-11-14 15:07:55 +01:00
dependabot[bot]	497b32a0a3	Bump minimatch from 3.0.4 to 3.1.2 Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.0.4 to 3.1.2. - [Release notes](https://github.com/isaacs/minimatch/releases) - [Commits](https://github.com/isaacs/minimatch/compare/v3.0.4...v3.1.2) --- updated-dependencies: - dependency-name: minimatch dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2022-11-14 13:54:34 +00:00
Jonas Jenwald	82795a3b81	Merge pull request #15688 from Snuffleupagus/bug-1799927-mask Take the mask-offset into account when rendering repeated image masks (bug 1799927)	2022-11-14 14:49:06 +01:00
Jonas Jenwald	8f676e88fb	Merge pull request #15689 from Snuffleupagus/update-packages Update packages and translations	2022-11-14 14:48:26 +01:00
Jonas Jenwald	b85ce7f761	Update l10n files	2022-11-13 21:32:12 +01:00
Jonas Jenwald	fbcc20adb7	Update npm packages	2022-11-13 21:28:21 +01:00
Jonas Jenwald	3e4caf2e13	Take the mask-offset into account when rendering repeated image masks (bug 1799927) Please note: As usual when I'm working with the `src/display/canvas.js` code I don't really know what I'm doing, but it at least appears to work.	2022-11-13 16:15:30 +01:00
Tim van der Meij	bfe6ff5893	Merge pull request #15686 from Snuffleupagus/findDefaultInlineStreamEnd-assert Change the `assert` in `Parser.findDefaultInlineStreamEnd` to a non-PRODUCTION one	2022-11-13 13:20:03 +01:00
Jonas Jenwald	a1d48e3651	Add a linked test-case for issue 2618 Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available. Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a linked test-case was chosen here.	2022-11-12 16:31:01 +01:00
Jonas Jenwald	d22eb3591e	Change the `assert` in `Parser.findDefaultInlineStreamEnd` to a non-PRODUCTION one Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally. --- [1] In those cases, for example a `FormatError` should have been thrown instead.	2022-11-12 16:30:58 +01:00
Jonas Jenwald	2d1b1e7968	Merge pull request #15682 from Snuffleupagus/constructor-cleanup Some small `AnnotationStorage` and `StatTimer` clean-up	2022-11-11 13:37:49 +01:00
Jonas Jenwald	bab1097db3	Remove the constructor in the `StatTimer` class With modern EcmaScript features, we can define these fields directly instead. Please note that for backwards compatibility purposes they are still public as before, however note that this functionality is disabled by default (see the `pdfBug` API option). Also, we can (slightly) simplify the two loops used in the `toString` method.	2022-11-11 12:31:04 +01:00
Jonas Jenwald	d6cd48e12a	Use actually private fields in the `AnnotationStorage` class These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern EcmaScript features we can now enforce this. Also, this patch removes a couple of JSDoc comments that we generally don't use.	2022-11-11 12:30:02 +01:00
Jonas Jenwald	595711bd7c	Merge pull request #15679 from Snuffleupagus/bug-1799927-2 Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)	2022-11-10 22:54:48 +01:00
calixteman	592d92424e	Merge pull request #15587 from calixteman/save_unicode [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)	2022-11-10 20:57:34 +01:00
Calixte Denizet	3ca03603c2	[Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824) - For text fields * when printing, we generate a fake font which contains some widths computed thanks to an OffscreenCanvas and its method measureText. To avoid having to lay out the glyphs ourselves, we just render all of them in one call in the showText method, using the system sans-serif/monospace fonts. * when saving, we continue to create the appearance streams if the fonts contain the char, but when a char is missing, we just set the flag /NeedAppearances to true in the AcroForm dict and remove the appearance stream. This way, we let the different readers handle the rendering of the strings. - For FreeText annotations * when printing, we use the same trick as for text fields. * there is no need to save an appearance since Acrobat is able to infer one from the Content entry.	2022-11-10 19:05:39 +01:00
Jonas Jenwald	e8ec6af73e	Remove a couple of unnecessary temporary variables in `MurmurHash3_64.hexdigest` These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).	2022-11-10 18:27:26 +01:00
Jonas Jenwald	7abb6429b0	Initialize the dictionary lazily when parsing inline images This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618. Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).	2022-11-10 18:27:26 +01:00
Jonas Jenwald	b46e0d61cf	Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927) Please note: This only fixes the "wrong letter" part of bug 1799927. It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly. Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent. One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.	2022-11-10 18:27:26 +01:00
Jonas Jenwald	f7449563ef	Merge pull request #15659 from sxyuan/system-font-name-fix [api-minor] Propagate the translated font name to TextContentItem for system fonts	2022-11-08 21:56:49 +01:00
Samuel Yuan	36fb5c1e2b	Propagate the translated font name to TextContentItems. This allows font data for system fonts to be looked up in the PDFObjects.	2022-11-08 11:16:21 -08:00
Jonas Jenwald	7e5008f0ff	Merge pull request #15665 from Snuffleupagus/Glyph-category [api-minor] Initialize the unicode-category lazily on the `Glyph`-instance	2022-11-05 15:26:57 +01:00
Jonas Jenwald	c8868a1c7a	[api-minor] Initialize the unicode-category lazily on the `Glyph`-instance The purpose of this patch is twofold: - Initialize the unicode-category data lazily during text-extraction, since this is completely unused during general parsing/rendering. - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was accidentally included. Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).	2022-11-05 10:12:17 +01:00
Jonas Jenwald	26f6f77db6	Merge pull request #15657 from Snuffleupagus/Glyph-normalizedUnicode Cache the normalized unicode-value on the `Glyph`-instance	2022-11-05 09:18:35 +01:00
Jonas Jenwald	0b27d703fa	Merge pull request #15663 from Snuffleupagus/viewer-classes-private-fields Use private fields in a few more viewer classes	2022-11-04 15:51:53 +01:00
Jonas Jenwald	e7a6e7393a	Use private fields in a few more viewer classes These properties were always intended to be private, so let's use modern JS features to actually enforce that.	2022-11-04 15:29:45 +01:00
Jonas Jenwald	c33b8d7692	Cache the normalized unicode-value on the `Glyph`-instance Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it lazily initialized. Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be. Please note: The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.	2022-11-03 22:36:53 +01:00
Jonas Jenwald	eda51d1dcc	Merge pull request #15613 from Snuffleupagus/issue-15590 [api-minor] Let `Catalog.getAllPageDicts` return an empty dictionary when loading the first /Page fails (issue 15590)	2022-11-03 15:41:39 +01:00
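
Note on commit 2eaa708e3a in the list above: its message says the two UTF-16 helpers were virtually identical, with `stringToUTF16BEString` only adding a BOM. One way to picture such a merge is a single function with an opt-in flag. The following is a minimal sketch and not the code from the repository; the `bigEndian` parameter name and its default value are assumptions made purely for illustration.

```javascript
// Hypothetical sketch: encode a JS string as a UTF-16BE byte string,
// optionally prefixed with the big-endian byte order mark (0xFE 0xFF).
// The `bigEndian` flag is an assumed name, not taken from the commit.
function stringToUTF16String(str, bigEndian = false) {
  const buf = [];
  if (bigEndian) {
    buf.push("\xFE\xFF"); // BOM, emitted only when requested
  }
  for (let i = 0, ii = str.length; i < ii; i++) {
    const code = str.charCodeAt(i);
    // High byte first, then low byte, for each UTF-16 code unit.
    buf.push(
      String.fromCharCode((code >> 8) & 0xff),
      String.fromCharCode(code & 0xff)
    );
  }
  return buf.join("");
}
```

Under these assumptions, `stringToUTF16String("A")` yields the two bytes `0x00 0x41`, and `stringToUTF16String("A", true)` yields the same bytes preceded by `0xFE 0xFF`.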

16980 Commits