Commit Graph

16504 Commits

Author SHA1 Message Date
Jonas Jenwald
b85ce7f761 Update l10n files 2022-11-13 21:32:12 +01:00
Jonas Jenwald
fbcc20adb7 Update npm packages 2022-11-13 21:28:21 +01:00
Jonas Jenwald
3e4caf2e13 Take the mask-offset into account when rendering repeated image masks (bug 1799927)
*Please note:* As usual when I'm working with the `src/display/canvas.js` code I don't really know what I'm doing, but it at least *appears* to work.
2022-11-13 16:15:30 +01:00
Tim van der Meij
bfe6ff5893
Merge pull request #15686 from Snuffleupagus/findDefaultInlineStreamEnd-assert
Change the `assert` in `Parser.findDefaultInlineStreamEnd` to a non-PRODUCTION one
2022-11-13 13:20:03 +01:00
Jonas Jenwald
a1d48e3651 Add a *linked* test-case for issue 2618
Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document, I'm not sure if we can include it directly in the repository, which is why a *linked* test-case was chosen here.
2022-11-12 16:31:01 +01:00
Jonas Jenwald
d22eb3591e Change the assert in Parser.findDefaultInlineStreamEnd to a non-PRODUCTION one
Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally.

---
[1] In those cases, for example a `FormatError` should have been thrown instead.
2022-11-12 16:30:58 +01:00
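A minimal sketch of the idea behind this change. It assumes a hypothetical build-time `PRODUCTION` constant and uses a stand-in `assertInvariant` helper; pdf.js uses its own preprocessor checks, which are not reproduced here. The point is that a check guarding only against implementation bugs can be skipped entirely in production builds.

```javascript
// Hypothetical build-time flag; a bundler would typically replace it with a
// literal so that the guarded block below can be removed as dead code.
const PRODUCTION = false;

// Stand-in for the project's assert helper: throws on a failed invariant.
function assertInvariant(cond, msg) {
  if (!cond) {
    throw new Error(`Assertion failed: ${msg}`);
  }
}

// Hypothetical function: the check only catches bugs in our own code, so it
// is skipped entirely in production builds. Malformed PDF data should instead
// be reported with a dedicated error (e.g. a FormatError) elsewhere.
function checkedEnd(start, end) {
  if (!PRODUCTION) {
    assertInvariant(end >= start, "end must not precede start");
  }
  return end;
}
```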
Jonas Jenwald
2d1b1e7968
Merge pull request #15682 from Snuffleupagus/constructor-cleanup
Some small `AnnotationStorage` and `StatTimer` clean-up
2022-11-11 13:37:49 +01:00
Jonas Jenwald
bab1097db3 Remove the constructor in the StatTimer class
With modern ECMAScript features, we can define these fields directly instead. Please note that, for backwards-compatibility purposes, they remain public as before; however, this functionality is *disabled* by default (see the `pdfBug` API option).
Also, we can (slightly) simplify the two loops used in the `toString` method.
2022-11-11 12:31:04 +01:00
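A simplified sketch (not the actual pdf.js source; the method bodies are assumptions) of a timer class whose state is declared with public class fields instead of being assigned in a constructor, with `toString` built from a single pass over the recorded times.

```javascript
class StatTimer {
  // Public class fields replace the former constructor assignments.
  started = Object.create(null);

  times = [];

  time(name) {
    if (name in this.started) {
      console.warn(`Timer is already running for ${name}`);
    }
    this.started[name] = Date.now();
  }

  timeEnd(name) {
    if (!(name in this.started)) {
      console.warn(`Timer has not been started for ${name}`);
    }
    this.times.push({ name, start: this.started[name], end: Date.now() });
    delete this.started[name];
  }

  toString() {
    // Pad every name to the longest one so that the durations line up.
    const longest = Math.max(0, ...this.times.map(t => t.name.length));
    return this.times
      .map(({ name, start, end }) => `${name.padEnd(longest)} ${end - start}ms\n`)
      .join("");
  }
}
```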
Jonas Jenwald
d6cd48e12a Use actually private fields in the AnnotationStorage class
These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern ECMAScript features we can now enforce this.
Also, this patch removes a couple of JSDoc comments that we generally don't use.
2022-11-11 12:30:02 +01:00
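A hypothetical, heavily reduced storage class illustrating what "actually private" means here: the `#` fields can only be touched by the class's own methods, so outside code cannot put the map and the modified-flag out of sync. The method and getter names are assumptions for illustration.

```javascript
class AnnotationStorage {
  #modified = false;

  #storage = new Map();

  getValue(key, defaultValue) {
    return this.#storage.has(key) ? this.#storage.get(key) : defaultValue;
  }

  setValue(key, value) {
    // Only mark the storage as modified when something actually changed.
    if (!this.#storage.has(key) || this.#storage.get(key) !== value) {
      this.#storage.set(key, value);
      this.#modified = true;
    }
  }

  get size() {
    return this.#storage.size;
  }

  get modified() {
    return this.#modified;
  }
}
```

Referencing `storage.#storage` outside the class body is a SyntaxError, and private fields never show up in `Object.keys`, so the internal state is reachable only through the public methods.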
Jonas Jenwald
595711bd7c
Merge pull request #15679 from Snuffleupagus/bug-1799927-2
Use the *full* inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)
2022-11-10 22:54:48 +01:00
calixteman
592d92424e
Merge pull request #15587 from calixteman/save_unicode
[Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
2022-11-10 20:57:34 +01:00
Calixte Denizet
3ca03603c2 [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
- For text fields
  * when printing, we generate a fake font which contains some widths computed thanks to
    an OffscreenCanvas and its method measureText.
    In order to avoid having to lay out the glyphs ourselves, we just render all of them
    in one call in the showText method, using the system sans-serif/monospace fonts.
  * when saving, we continue to create the appearance streams if the fonts contain the char,
    but when a char is missing, we just set the /NeedAppearances flag to true in the AcroForm
    dict and remove the appearance stream. This way, we let the different readers handle
    the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Contents entry.
2022-11-10 19:05:39 +01:00
Jonas Jenwald
e8ec6af73e Remove a couple of unnecessary temporary variables in MurmurHash3_64.hexdigest
These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).
2022-11-10 18:27:26 +01:00
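A minimal sketch of padding with `String.prototype.padStart` instead of a manual loop. It is written as a standalone function over two 32-bit hash halves; treating the digest as two halves `h1`/`h2` is an assumption here, not a copy of the `MurmurHash3_64` method.

```javascript
// Format two 32-bit hash halves as a fixed-width, 16-character hex string.
function hexdigest(h1, h2) {
  // `>>> 0` reinterprets each value as an unsigned 32-bit integer, and
  // padStart supplies the leading zeros that used to be added with a loop.
  return (
    (h1 >>> 0).toString(16).padStart(8, "0") +
    (h2 >>> 0).toString(16).padStart(8, "0")
  );
}
```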
Jonas Jenwald
7abb6429b0 Initialize the dictionary *lazily* when parsing inline images
This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618.
Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).
2022-11-10 18:27:26 +01:00
Jonas Jenwald
b46e0d61cf Use the *full* inline image as the cacheKey in Parser.makeInlineImage (bug 1799927)
*Please note:* This only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll, however, be limited to the current `Parser`-instance, which means that it's not persistent.

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
2022-11-10 18:27:26 +01:00
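To illustrate why a short input can collide, here is a textbook Adler-32 (a sketch, not the pdf.js `computeAdler32` source) together with a hypothetical `cacheKeyFor` helper that keys on the full data instead. For three bytes, the two running sums are `a = 1 + d1 + d2 + d3` and `b = 3 + 3*d1 + 2*d2 + d3`, so `[0, 2, 0]` and `[1, 0, 1]` both give `a = 3, b = 7`.

```javascript
// Textbook Adler-32 checksum over a byte array.
function adler32(bytes) {
  const MOD = 65521;
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % MOD;
    b = (b + a) % MOD;
  }
  return ((b << 16) | a) >>> 0;
}

// Hypothetical helper: use the full data as the key, so two different inputs
// can never share a cache entry (at the cost of storing the whole string).
function cacheKeyFor(bytes) {
  return Array.from(bytes, byte => String.fromCharCode(byte)).join("");
}

const first = new Uint8Array([0, 2, 0]);
const second = new Uint8Array([1, 0, 1]);
const cache = new Map();
cache.set(cacheKeyFor(first), "image for first");
cache.set(cacheKeyFor(second), "image for second");
```

With checksum keys both images would land in the same cache slot; with full-data keys the map holds two distinct entries.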
Jonas Jenwald
f7449563ef
Merge pull request #15659 from sxyuan/system-font-name-fix
[api-minor] Propagate the translated font name to TextContentItem for system fonts
2022-11-08 21:56:49 +01:00
Samuel Yuan
36fb5c1e2b Propagate the translated font name to TextContentItems.
This allows font data for system fonts to be looked up in the
PDFObjects.
2022-11-08 11:16:21 -08:00
Jonas Jenwald
7e5008f0ff
Merge pull request #15665 from Snuffleupagus/Glyph-category
[api-minor] Initialize the unicode-category *lazily* on the `Glyph`-instance
2022-11-05 15:26:57 +01:00
Jonas Jenwald
c8868a1c7a [api-minor] Initialize the unicode-category *lazily* on the Glyph-instance
The purpose of this patch is twofold:
 - Initialize the unicode-category data *lazily* during text-extraction, since this is completely unused during general parsing/rendering.
 - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was *accidentally* included.

Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).
2022-11-05 10:12:17 +01:00
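As the message notes, a user can re-create such category data from the exposed `unicode` property with regular expressions. A hypothetical sketch using Unicode property escapes; the flag names are illustrative assumptions rather than the removed API shape.

```javascript
// Derive simple category flags from a glyph's `unicode` string.
function getUnicodeCategory(unicode) {
  return {
    isWhitespace: /^\s+$/.test(unicode),
    // Nonspacing marks (general category Mn), e.g. combining diacritics.
    isZeroWidthDiacritic: /^\p{Mn}+$/u.test(unicode),
    // Format characters (general category Cf), e.g. zero width joiner.
    isInvisibleFormatMark: /^\p{Cf}+$/u.test(unicode),
  };
}
```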
Jonas Jenwald
26f6f77db6
Merge pull request #15657 from Snuffleupagus/Glyph-normalizedUnicode
Cache the normalized unicode-value on the `Glyph`-instance
2022-11-05 09:18:35 +01:00
Jonas Jenwald
0b27d703fa
Merge pull request #15663 from Snuffleupagus/viewer-classes-private-fields
Use private fields in a few more viewer classes
2022-11-04 15:51:53 +01:00
Jonas Jenwald
e7a6e7393a Use private fields in a few more viewer classes
These properties were always intended to be *private*, so let's use modern JS features to actually enforce that.
2022-11-04 15:29:45 +01:00
Jonas Jenwald
c33b8d7692 Cache the normalized unicode-value on the Glyph-instance
Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it *lazily* initialized.

Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be.

*Please note:* The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.
2022-11-03 22:36:53 +01:00
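A sketch of lazy, per-instance caching. The first access runs the getter, which then installs a plain data property with the same name on the instance, so later accesses never recompute. The `shadow` helper mirrors the general idea of a pdf.js utility of that name; the NFKC normalization and the `isRTL` flag are assumptions for illustration.

```javascript
// Replace a lazy getter with an own data property holding the computed value.
function shadow(obj, prop, value) {
  Object.defineProperty(obj, prop, {
    value,
    enumerable: true,
    configurable: true,
    writable: false,
  });
  return value;
}

class Glyph {
  constructor(unicode, isRTL = false) {
    this.unicode = unicode;
    this.isRTL = isRTL; // hypothetical flag for right-to-left text
  }

  get normalizedUnicode() {
    // Assumed normalization form; the real code may normalize differently.
    let value = this.unicode.normalize("NFKC");
    if (this.isRTL) {
      value = [...value].reverse().join("");
    }
    // Runs once: the own property now shadows this prototype getter.
    return shadow(this, "normalizedUnicode", value);
  }
}
```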
Jonas Jenwald
eda51d1dcc
Merge pull request #15613 from Snuffleupagus/issue-15590
[api-minor] Let `Catalog.getAllPageDicts` return an *empty* dictionary when loading the first /Page fails (issue 15590)
2022-11-03 15:41:39 +01:00
Jonas Jenwald
23930a249e [api-minor] Let Catalog.getAllPageDicts return an *empty* dictionary when loading the first /Page fails (issue 15590)
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fall back to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.

*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.

---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
2022-11-03 12:51:48 +01:00
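A heavily simplified, synchronous sketch of the fallback described above. `collectPageDicts` and its `loadPage` callback are hypothetical, and a `Map` stands in for an empty PDF dictionary. Only a failure on the *first* page is replaced, and only when `stopAtErrors` is not set; later failures are simply rethrown here, which is a simplification.

```javascript
function collectPageDicts(loadPage, numPages, { stopAtErrors = false } = {}) {
  const pages = [];
  for (let i = 0; i < numPages; i++) {
    try {
      pages.push(loadPage(i));
    } catch (err) {
      if (i === 0 && !stopAtErrors) {
        // Let the viewer initialize by substituting a blank first page.
        pages.push(new Map());
        continue;
      }
      throw err;
    }
  }
  return pages;
}
```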
Jonas Jenwald
2516ffa78e Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590)
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.

This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
2022-11-03 12:46:30 +01:00
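A minimal sketch of the "first `obj` occurrence" fallback as a plain byte search. The function name is hypothetical, and a real tokenizer would also check the surrounding delimiters (so that, for example, `endobj` is not mistaken for `obj`).

```javascript
// Return the index of the first "obj" byte sequence at or after `start`, or -1.
function findFirstObj(bytes, start = 0) {
  const O = 0x6f, B = 0x62, J = 0x6a; // "o", "b", "j"
  for (let i = start; i + 2 < bytes.length; i++) {
    if (bytes[i] === O && bytes[i + 1] === B && bytes[i + 2] === J) {
      return i;
    }
  }
  return -1;
}
```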
Jonas Jenwald
2ae90f9615
Merge pull request #15655 from tamuratak/move_canvas_to_optionaldeps
Move canvas to optionalDependencies
2022-11-02 09:49:04 +01:00
Takashi Tamura
0bb478cb23 Move canvas to optionalDependencies, which enables npm to continue installing pdfjs-dist
even if the installation of canvas fails. Close #15652
2022-11-02 08:33:31 +09:00
Jonas Jenwald
6193537cd3
Merge pull request #15648 from Snuffleupagus/issue-12232
Prevent interaction with form elements in PresentationMode (issue 12232)
2022-10-31 11:14:23 +01:00
calixteman
e42e1cde61
Merge pull request #15615 from calixteman/bug1796741
[Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)
2022-10-31 09:58:27 +01:00
Jonas Jenwald
bc4e5e39ff
Merge pull request #15649 from SpartanApple/patch-1
Changed link for "Gulp's getting started guide"
2022-10-31 09:20:18 +01:00
Mitchell Gale
8d147b993f
Changed link for "Gulp's getting started guide"
Gulp's getting started guide changed location to https://github.com/gulpjs/gulp/tree/master/docs/getting-started. Link updated in readme to reflect that.
2022-10-30 15:30:42 -07:00
Jonas Jenwald
547556b5b2 Prevent keyboard interaction with form elements in PresentationMode (issue 12232)
This uses the relatively new `HTMLElement.inert` property, see https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/inert for additional information. The only "problem" is that this isn't yet available in all Firefox channels, but until https://bugzilla.mozilla.org/show_bug.cgi?id=1764263 is fixed we're no worse off than before.
2022-10-30 21:57:55 +01:00
Jonas Jenwald
f0811a4a3c Prevent mouse interaction with form elements in PresentationMode (issue 12232) 2022-10-30 21:55:44 +01:00
Tim van der Meij
c059c13785
Merge pull request #15643 from timvandermeij/bump
Bump versions in `pdfjs.config`
2022-10-29 20:11:04 +02:00
Tim van der Meij
ab136c5c39
Bump versions in pdfjs.config 2022-10-29 20:04:37 +02:00
Tim van der Meij
d0823066cc
Merge pull request #15642 from mozilla/dependabot/npm_and_yarn/minimist-and-minimist-1.2.6
Bump minimist
2022-10-29 19:19:40 +02:00
dependabot[bot]
131819a15c
Bump minimist
Bumps [minimist](https://github.com/minimistjs/minimist) and [minimist](https://github.com/minimistjs/minimist). These dependencies needed to be updated together.

Updates `minimist` from 1.2.0 to 1.2.6
- [Release notes](https://github.com/minimistjs/minimist/releases)
- [Changelog](https://github.com/minimistjs/minimist/blob/main/CHANGELOG.md)
- [Commits](https://github.com/minimistjs/minimist/compare/v1.2.0...v1.2.6)

Updates `minimist` from 1.2.5 to 1.2.6
- [Release notes](https://github.com/minimistjs/minimist/releases)
- [Changelog](https://github.com/minimistjs/minimist/blob/main/CHANGELOG.md)
- [Commits](https://github.com/minimistjs/minimist/compare/v1.2.0...v1.2.6)

---
updated-dependencies:
- dependency-name: minimist
  dependency-type: indirect
- dependency-name: minimist
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-29 17:14:12 +00:00
Tim van der Meij
b74fbdeda7
Merge pull request #15640 from Snuffleupagus/update-packages
Update packages and translations
2022-10-29 19:12:29 +02:00
Tim van der Meij
eeca44d162
Merge pull request #15641 from Snuffleupagus/rm-PdfManager-onLoadedStream
Remove the `PdfManager.onLoadedStream` method (PR 15616 follow-up)
2022-10-29 19:09:35 +02:00
Jonas Jenwald
caef47a0cf Remove the PdfManager.onLoadedStream method (PR 15616 follow-up)
After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site.
Hence why this patch suggests that we remove this method and replace it with an *optional* parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.
2022-10-29 14:42:17 +02:00
Jonas Jenwald
5b46400240
Merge pull request #15633 from calixteman/cursors
[Editor] Change the cursor icons
2022-10-29 12:24:10 +02:00
Calixte Denizet
67778eac60 [Editor] Change the cursor icons 2022-10-29 12:05:09 +02:00
Jonas Jenwald
571a986496 Update l10n files 2022-10-29 11:34:45 +02:00
Jonas Jenwald
f6746854ac Update npm packages 2022-10-29 11:34:43 +02:00
Jonas Jenwald
8b970109ea
Merge pull request #15632 from Snuffleupagus/issue-15629-2
[api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)
2022-10-29 09:37:07 +02:00
calixteman
8f80efa4ab
Merge pull request #15618 from calixteman/15614
[JS] Some functions (print, alert,...) must be called only after a user activation
2022-10-28 21:04:42 +02:00
Calixte Denizet
0de804a256 [JS] Some functions (print, alert,...) must be called only after a user activation
- Some events, which require user interaction, allow those functions to be called.
However, after a few seconds without any further user interaction, calling them is
no longer possible.
The idea is to give the user an opportunity to leave the PDF.
- Disable the print function while printing (and likewise the save function while
saving), and disallow saving on open events.
2022-10-28 18:52:07 +02:00
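A hypothetical sketch of a time-limited user-activation gate. Privileged calls succeed only within a short window after the last user interaction. The class name, the one-second default window, and the injectable clock (added for testability) are all assumptions, not the actual implementation.

```javascript
class ActivationWindow {
  #lastActivation = -Infinity;

  constructor(windowMs = 1000, now = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Called from event handlers that involve real user interaction.
  notifyUserInteraction() {
    this.#lastActivation = this.now();
  }

  get isActive() {
    return this.now() - this.#lastActivation <= this.windowMs;
  }
}

// Hypothetical privileged action: ignored unless the user acted recently.
function tryPrint(activation) {
  if (!activation.isActive) {
    return false;
  }
  // ... trigger printing here ...
  return true;
}
```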
Jonas Jenwald
a7232339d8
Merge pull request #15637 from Snuffleupagus/Array-from-map
Combine `Array.from` and `Array.prototype.map` calls
2022-10-28 18:29:02 +02:00
Jonas Jenwald
ba05e47b3e Combine Array.from and Array.prototype.map calls
This isn't just a tiny bit more compact, but it also avoids an intermediate allocation; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from#description
2022-10-28 13:46:30 +02:00
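A small illustration of the pattern: `Array.from` accepts a mapping function as its second argument, so the mapped array is built directly instead of first creating a temporary copy. Note that calling `.map` on the typed array itself would produce another `Uint8Array`, which cannot hold strings.

```javascript
const codes = new Uint8Array([104, 105]);

// Two steps: Array.from copies into a temporary array, then .map allocates another.
const twoSteps = Array.from(codes).map(c => String.fromCharCode(c));

// One step: the mapping function is applied while the result is being built.
const oneStep = Array.from(codes, c => String.fromCharCode(c));
```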