Commit Graph

5617 Commits

Author SHA1 Message Date
Jonas Jenwald
fa54a58790
Merge pull request #15765 from Snuffleupagus/rm-textLayer-timeout
[api-minor] Remove the TextLayer `timeout` parameter (PR 15742 follow-up)
2022-11-29 21:21:45 +01:00
calixteman
f3206b351f
Merge pull request #15764 from calixteman/15753
[Annotation] Send correctly the updated values to the JS sandbox
2022-11-29 20:04:12 +01:00
Jonas Jenwald
7c25b1b455 [api-minor] Remove the TextLayer timeout parameter (PR 15742 follow-up)
The deprecation is included in the current release, i.e. version `3.1.81`, and given the edge-case nature of this option I really don't think that we need to keep it deprecated for multiple releases.
2022-11-29 19:57:38 +01:00
Calixte Denizet
20fd9099f8 [Annotation] Send correctly the updated values to the JS sandbox 2022-11-29 17:34:06 +01:00
Jonas Jenwald
1f082d3e1d
Merge pull request #15761 from Snuffleupagus/platform
Stop duplicating the `platform` getter in multiple files
2022-11-29 17:32:52 +01:00
Jonas Jenwald
0d648f531b Ignore PDF documents opened from "data:"-URLs when handling internal links (bug 1803050)
This patch has been successfully tested in a local, artifact, Firefox build.

*Please note:* The only thing that'll no longer work for PDF documents opened using "data:"-URLs is middle-clicking on internal/outline links, in order to open the destination in a new tab. This is however an extremely small loss of functionality, and as can be seen in the bug the alternative (i.e. doing nothing) is surely much worse.
2022-11-29 14:08:01 +01:00
Jonas Jenwald
82d127883d Stop duplicating the platform getter in multiple files
Currently both of the `AnnotationElement` and `KeyboardManager` classes contain *identical* `platform` getters, which seems like unnecessary duplication.
With the pre-processor we can also limit the feature-testing to only GENERIC builds, since `navigator` should always be available in browsers.
2022-11-29 12:14:40 +01:00
calixteman
44bc315444
Merge pull request #15758 from calixteman/cleanup_telemetry
[api-minor] Remove all the useless telemetry stuff in the viewer (bug 1802468)
2022-11-28 21:54:00 +01:00
Calixte Denizet
b9cb651c44 [api-minor] Remove all the useless telemetry stuff in the viewer (bug 1802468)
Add a deprecation notification for PDFDocumentLoadingTask.onUnsupportedFeature and PDFDocumentProxy.stats
which are likely useless.
The unsupported feature stuff have initially been added in (#4048) in order to be able to display a
warning bar and to help to have some numbers to know how a feature was used.
Those data are no more used in Firefox.
2022-11-28 20:55:15 +01:00
Calixte Denizet
ae7da6ae48 [JS] By default, a text field value must be treated as a number (bug 1802888) 2022-11-28 16:24:01 +01:00
calixteman
33f9d1aab2
Merge pull request #15755 from calixteman/rounding_printf
[JS] Fix a rounding issue in printf (bug 1802888)
2022-11-28 15:39:36 +01:00
Calixte Denizet
4ee0c83548 [JS] Fix a rounding issue in printf (bug 1802888) 2022-11-28 14:37:15 +01:00
Jonas Jenwald
85f03c0ea4 Slightly modernize the FontLoader.isSyncFontLoadingSupported getter
This is very old code, which is unused (by default) in browsers nowadays since the Font Loading API will always be preferred.
For Node.js environments we use the same constant as elsewhere throughout the code-base, and we can also simplify the Firefox-specific check given that the lowest supported version is `102` (as of this writing).

Finally the old TODO is removed, since the general availability of the Font Loading API has made it redundant.
2022-11-27 12:19:11 +01:00
Jonas Jenwald
aa5b678f94 Add default icons for FileAttachment annotations (bug 1230933)
*Please note:* This "borrows" the icons from Thunderbird.

According to the PDF specification, see https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096626, we should be providing default icons for FileAttachment annotations without appearances.
2022-11-26 11:24:59 +01:00
Jonas Jenwald
4b02610e8c Re-factor and simplify the getQuadPoints helper function
The use of `Array.prototype.reduce()` is, in my opinion, hurting overall readability since it's not particularly easy to look at the relevant code and immediately understand what's going on here. Furthermore this code leads to strictly speaking unnecessary allocations and parsing, since we could just track the min/max values directly in the relevant loop instead.
2022-11-25 10:40:16 +01:00
Jonas Jenwald
b3e161c328 [api-minor] Deprecate the TextLayer timeout parameter
This has never really been used anywhere within the PDF.js library[1], and when streaming of textContent was introduced this parameter was effectively made redundant.
Note that when streaming of textContent is used, all text-layout has already happened by the time that this `timeout`-functionality is actually invoked (thus making it pointless).
While the `timeout`-functionality may still "work" when the textContent is provided upfront, although it's never been used/tested, streaming will generally perform better (in e.g. a viewer setting).

*Please note:* While unrelated here, also removes a now unused property that I forgot in PR 15259.

---
[1] At least not since the code was moved into its current file, which happened in PR 6619 and landed seven years ago.
2022-11-24 23:08:39 +01:00
Jonas Jenwald
8fda3f04fe
Merge pull request #15732 from Snuffleupagus/issue-15719
Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719)
2022-11-24 19:09:12 +01:00
Jonas Jenwald
d1c01b3164 Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719) 2022-11-23 15:51:18 +01:00
Jonas Jenwald
47682985d3 Add support for Optional Content in TilingPatterns (issue 15716)
This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.
2022-11-23 12:58:00 +01:00
Jonas Jenwald
f3e0f86641 Simplify the getFilenameFromUrl helper function 2022-11-23 11:48:08 +01:00
Jonas Jenwald
0ba242ea4a Support FileAttachments with hash-signs in the filename (issue 15729)
The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs.
For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1].

---
[1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.
2022-11-23 10:47:33 +01:00
Jonas Jenwald
2ff9799e7a Tweak assignment of common parameters in the Annotation classes
This is slightly more compact, and also unifies the format across the various classes.
2022-11-20 12:29:59 +01:00
Jonas Jenwald
c92de947b6 Reduce duplication when creating a fallback appearance for MarkupAnnotations
Currently we repeat the same color-conversion code verbatim in lots of classes, which seems completely unnecessary.
2022-11-20 12:05:25 +01:00
Tim van der Meij
d6908ee145
Merge pull request #15701 from Snuffleupagus/move-string-helpers
Move some string helper functions to the worker-thread
2022-11-19 11:20:07 +01:00
Jonas Jenwald
70d362f22c Remove an unnecessary variable in getPdfManager, in the src/core/worker.js file
Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.
2022-11-17 15:31:41 +01:00
Jonas Jenwald
a2a200175f Remove unnecessary function names in the src/core/worker.js file
Currently *some* functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate.
For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.
2022-11-17 15:12:48 +01:00
Jonas Jenwald
9adc7859c8 Move the escapeString helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
e5859e145d Move the isAscii helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
2eaa708e3a Combine the stringToUTF16String and stringToUTF16BEString helper functions
Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:44 +01:00
Jonas Jenwald
f358e76f5b Move the _isOffscreenCanvasSupported property to the base Annotation class
Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console.
The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.
2022-11-15 16:30:53 +01:00
Jonas Jenwald
3e4caf2e13 Take the mask-offset into account when rendering repeated image masks (bug 1799927)
*Please note:* As usual when I'm working with the `src/display/canvas.js` code I don't really know what I'm doing, but it at least *appears* to work.
2022-11-13 16:15:30 +01:00
Jonas Jenwald
d22eb3591e Change the assert in Parser.findDefaultInlineStreamEnd to a non-PRODUCTION one
Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally.

---
[1] In those cases, for example a `FormatError` should have been thrown instead.
2022-11-12 16:30:58 +01:00
Jonas Jenwald
bab1097db3 Remove the constructor in the StatTimer class
With modern EcmaScript features, we can define these fields directly instead. Please note that for backwards compatibility purposes they are still public as before, however note that this functionality is *disabled* by default (see the `pdfBug` API option).
Also, we can (slightly) simplify the two loops used in the `toString` method.
2022-11-11 12:31:04 +01:00
Jonas Jenwald
d6cd48e12a Use actually private fields in the AnnotationStorage class
These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern EcmaScript features we can now enforce this.
Also, this patch removes a couple of JSDoc comments that we generally don't use.
2022-11-11 12:30:02 +01:00
Jonas Jenwald
595711bd7c
Merge pull request #15679 from Snuffleupagus/bug-1799927-2
Use the *full* inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)
2022-11-10 22:54:48 +01:00
Calixte Denizet
3ca03603c2 [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid to have to layout the glyphs ourselves, we just render all of them
   in one call in the showText method in using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the char
   but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances
   to true and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.
2022-11-10 19:05:39 +01:00
Jonas Jenwald
e8ec6af73e Remove a couple of unnecessary temporary variables in MurmurHash3_64.hexdigest
These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).
2022-11-10 18:27:26 +01:00
Jonas Jenwald
7abb6429b0 Initialize the dictionary *lazily* when parsing inline images
This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618.
Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).
2022-11-10 18:27:26 +01:00
Jonas Jenwald
b46e0d61cf Use the *full* inline image as the cacheKey in Parser.makeInlineImage (bug 1799927)
*Please note:* This only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
2022-11-10 18:27:26 +01:00
Jonas Jenwald
f7449563ef
Merge pull request #15659 from sxyuan/system-font-name-fix
[api-minor] Propagate the translated font name to TextContentItem for system fonts
2022-11-08 21:56:49 +01:00
Samuel Yuan
36fb5c1e2b Propagate the translated font name to TextContentItems.
This allows font data for system fonts to be looked up in the
PDFObjects.
2022-11-08 11:16:21 -08:00
Jonas Jenwald
c8868a1c7a [api-minor] Initialize the unicode-category *lazily* on the Glyph-instance
The purpose of this patch is twofold:
 - Initialize the unicode-category data *lazily* during text-extraction, since this is completely unused during general parsing/rendering.
 - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was *accidentally* included.

Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).
2022-11-05 10:12:17 +01:00
Jonas Jenwald
c33b8d7692 Cache the normalized unicode-value on the Glyph-instance
Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it *lazily* initialized.

Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which mean a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be.

*Please note:* The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.
2022-11-03 22:36:53 +01:00
Jonas Jenwald
23930a249e [api-minor] Let Catalog.getAllPageDicts return an *empty* dictionary when loading the first /Page fails (issue 15590)
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.

*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.

---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
2022-11-03 12:51:48 +01:00
Jonas Jenwald
2516ffa78e Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590)
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.

This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
2022-11-03 12:46:30 +01:00
Jonas Jenwald
6193537cd3
Merge pull request #15648 from Snuffleupagus/issue-12232
Prevent interaction with form elements in PresentationMode (issue 12232)
2022-10-31 11:14:23 +01:00
calixteman
e42e1cde61
Merge pull request #15615 from calixteman/bug1796741
[Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)
2022-10-31 09:58:27 +01:00
Jonas Jenwald
f0811a4a3c Prevent mouse interaction with form elements in PresentationMode (issue 12232) 2022-10-30 21:55:44 +01:00
Jonas Jenwald
caef47a0cf Remove the PdfManager.onLoadedStream method (PR 15616 follow-up)
After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site.
Hence why this patch suggests that we remove this method and replace it with an *optional* parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.
2022-10-29 14:42:17 +02:00
Jonas Jenwald
8b970109ea
Merge pull request #15632 from Snuffleupagus/issue-15629-2
[api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)
2022-10-29 09:37:07 +02:00