pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	fa54a58790	Merge pull request #15765 from Snuffleupagus/rm-textLayer-timeout [api-minor] Remove the TextLayer `timeout` parameter (PR 15742 follow-up)	2022-11-29 21:21:45 +01:00
calixteman	f3206b351f	Merge pull request #15764 from calixteman/15753 [Annotation] Send correctly the updated values to the JS sandbox	2022-11-29 20:04:12 +01:00
Jonas Jenwald	7c25b1b455	[api-minor] Remove the TextLayer `timeout` parameter (PR 15742 follow-up) The deprecation is included in the current release, i.e. version `3.1.81`, and given the edge-case nature of this option I really don't think that we need to keep it deprecated for multiple releases.	2022-11-29 19:57:38 +01:00
Calixte Denizet	20fd9099f8	[Annotation] Send correctly the updated values to the JS sandbox	2022-11-29 17:34:06 +01:00
Jonas Jenwald	1f082d3e1d	Merge pull request #15761 from Snuffleupagus/platform Stop duplicating the `platform` getter in multiple files	2022-11-29 17:32:52 +01:00
Jonas Jenwald	0d648f531b	Ignore PDF documents opened from "data:"-URLs when handling internal links (bug 1803050) This patch has been successfully tested in a local, artifact, Firefox build. Please note: The only thing that'll no longer work for PDF documents opened using "data:"-URLs is middle-clicking on internal/outline links, in order to open the destination in a new tab. This is however an extremely small loss of functionality, and as can be seen in the bug the alternative (i.e. doing nothing) is surely much worse.	2022-11-29 14:08:01 +01:00
Jonas Jenwald	82d127883d	Stop duplicating the `platform` getter in multiple files Currently both of the `AnnotationElement` and `KeyboardManager` classes contain identical `platform` getters, which seems like unnecessary duplication. With the pre-processor we can also limit the feature-testing to only GENERIC builds, since `navigator` should always be available in browsers.	2022-11-29 12:14:40 +01:00
calixteman	44bc315444	Merge pull request #15758 from calixteman/cleanup_telemetry [api-minor] Remove all the useless telemetry stuff in the viewer (bug 1802468)	2022-11-28 21:54:00 +01:00
Calixte Denizet	b9cb651c44	[api-minor] Remove all the useless telemetry stuff in the viewer (bug 1802468) Add a deprecation notification for PDFDocumentLoadingTask.onUnsupportedFeature and PDFDocumentProxy.stats, which are likely useless. The unsupported-feature handling was initially added (in #4048) in order to display a warning bar and to gather some numbers on how a feature was used. That data is no longer used in Firefox.	2022-11-28 20:55:15 +01:00
Calixte Denizet	ae7da6ae48	[JS] By default, a text field value must be treated as a number (bug 1802888)	2022-11-28 16:24:01 +01:00
calixteman	33f9d1aab2	Merge pull request #15755 from calixteman/rounding_printf [JS] Fix a rounding issue in printf (bug 1802888)	2022-11-28 15:39:36 +01:00
Calixte Denizet	4ee0c83548	[JS] Fix a rounding issue in printf (bug 1802888)	2022-11-28 14:37:15 +01:00
Jonas Jenwald	85f03c0ea4	Slightly modernize the `FontLoader.isSyncFontLoadingSupported` getter This is very old code, which is unused (by default) in browsers nowadays since the Font Loading API will always be preferred. For Node.js environments we use the same constant as elsewhere throughout the code-base, and we can also simplify the Firefox-specific check given that the lowest supported version is `102` (as of this writing). Finally the old TODO is removed, since the general availability of the Font Loading API has made it redundant.	2022-11-27 12:19:11 +01:00
Jonas Jenwald	aa5b678f94	Add default icons for FileAttachment annotations (bug 1230933) Please note: This "borrows" the icons from Thunderbird. According to the PDF specification, see https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096626, we should be providing default icons for FileAttachment annotations without appearances.	2022-11-26 11:24:59 +01:00
Jonas Jenwald	4b02610e8c	Re-factor and simplify the `getQuadPoints` helper function The use of `Array.prototype.reduce()` is, in my opinion, hurting overall readability since it's not particularly easy to look at the relevant code and immediately understand what's going on here. Furthermore this code leads to strictly speaking unnecessary allocations and parsing, since we could just track the min/max values directly in the relevant loop instead.	2022-11-25 10:40:16 +01:00
Jonas Jenwald	b3e161c328	[api-minor] Deprecate the TextLayer `timeout` parameter This has never really been used anywhere within the PDF.js library[1], and when streaming of textContent was introduced this parameter was effectively made redundant. Note that when streaming of textContent is used, all text-layout has already happened by the time that this `timeout`-functionality is actually invoked (thus making it pointless). While the `timeout`-functionality may still "work" when the textContent is provided upfront, although it's never been used/tested, streaming will generally perform better (in e.g. a viewer setting). Please note: While unrelated here, also removes a now unused property that I forgot in PR 15259. --- [1] At least not since the code was moved into its current file, which happened in PR 6619 and landed seven years ago.	2022-11-24 23:08:39 +01:00
Jonas Jenwald	8fda3f04fe	Merge pull request #15732 from Snuffleupagus/issue-15719 Add a fallback for non-embedded composite Tahoma fonts (issue 15719)	2022-11-24 19:09:12 +01:00
Jonas Jenwald	d1c01b3164	Add a fallback for non-embedded composite Tahoma fonts (issue 15719)	2022-11-23 15:51:18 +01:00
Jonas Jenwald	47682985d3	Add support for Optional Content in TilingPatterns (issue 15716) This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.	2022-11-23 12:58:00 +01:00
Jonas Jenwald	f3e0f86641	Simplify the `getFilenameFromUrl` helper function	2022-11-23 11:48:08 +01:00
Jonas Jenwald	0ba242ea4a	Support FileAttachments with hash-signs in the filename (issue 15729) The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs. For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1]. --- [1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.	2022-11-23 10:47:33 +01:00
Jonas Jenwald	2ff9799e7a	Tweak assignment of common parameters in the `Annotation` classes This is slightly more compact, and also unifies the format across the various classes.	2022-11-20 12:29:59 +01:00
Jonas Jenwald	c92de947b6	Reduce duplication when creating a fallback appearance for `MarkupAnnotation`s Currently we repeat the same color-conversion code verbatim in lots of classes, which seems completely unnecessary.	2022-11-20 12:05:25 +01:00
Tim van der Meij	d6908ee145	Merge pull request #15701 from Snuffleupagus/move-string-helpers Move some string helper functions to the worker-thread	2022-11-19 11:20:07 +01:00
Jonas Jenwald	70d362f22c	Remove an unnecessary variable in `getPdfManager`, in the `src/core/worker.js` file Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.	2022-11-17 15:31:41 +01:00
Jonas Jenwald	a2a200175f	Remove unnecessary function names in the `src/core/worker.js` file Currently some functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate. For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.	2022-11-17 15:12:48 +01:00
Jonas Jenwald	9adc7859c8	Move the `escapeString` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:48 +01:00
Jonas Jenwald	e5859e145d	Move the `isAscii` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:48 +01:00
Jonas Jenwald	2eaa708e3a	Combine the `stringToUTF16String` and `stringToUTF16BEString` helper functions Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:44 +01:00
Jonas Jenwald	f358e76f5b	Move the `_isOffscreenCanvasSupported` property to the base `Annotation` class Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console. The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.	2022-11-15 16:30:53 +01:00
Jonas Jenwald	3e4caf2e13	Take the mask-offset into account when rendering repeated image masks (bug 1799927) Please note: As usual when I'm working with the `src/display/canvas.js` code I don't really know what I'm doing, but it at least appears to work.	2022-11-13 16:15:30 +01:00
Jonas Jenwald	d22eb3591e	Change the `assert` in `Parser.findDefaultInlineStreamEnd` to a non-PRODUCTION one Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally. --- [1] In those cases, for example a `FormatError` should have been thrown instead.	2022-11-12 16:30:58 +01:00
Jonas Jenwald	bab1097db3	Remove the constructor in the `StatTimer` class With modern ECMAScript features, we can define these fields directly instead. Please note that, for backwards compatibility purposes, they are still public as before; this functionality is, however, disabled by default (see the `pdfBug` API option). Also, we can (slightly) simplify the two loops used in the `toString` method.	2022-11-11 12:31:04 +01:00
Jonas Jenwald	d6cd48e12a	Use actually private fields in the `AnnotationStorage` class These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern ECMAScript features we can now enforce this. Also, this patch removes a couple of JSDoc comments that we generally don't use.	2022-11-11 12:30:02 +01:00
Jonas Jenwald	595711bd7c	Merge pull request #15679 from Snuffleupagus/bug-1799927-2 Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)	2022-11-10 22:54:48 +01:00
Calixte Denizet	3ca03603c2	[Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824) - For text fields * when printing, we generate a fake font which contains widths computed with an OffscreenCanvas and its measureText method. To avoid having to lay out the glyphs ourselves, we just render all of them in one call in the showText method, using the system sans-serif/monospace fonts. * when saving, we continue to create the appearance streams if the fonts contain the char, but when a char is missing we just set the /NeedAppearances flag to true in the AcroForm dict and remove the appearance stream. This way, we let the different readers handle the rendering of the strings. - For FreeText annotations * when printing, we use the same trick as for text fields. * there is no need to save an appearance since Acrobat is able to infer one from the Content entry.	2022-11-10 19:05:39 +01:00
Jonas Jenwald	e8ec6af73e	Remove a couple of unnecessary temporary variables in `MurmurHash3_64.hexdigest` These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).	2022-11-10 18:27:26 +01:00
Jonas Jenwald	7abb6429b0	Initialize the dictionary lazily when parsing inline images This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618. Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).	2022-11-10 18:27:26 +01:00
Jonas Jenwald	b46e0d61cf	Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927) Please note: This only fixes the "wrong letter" part of bug 1799927. It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly. Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent. One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.	2022-11-10 18:27:26 +01:00
Jonas Jenwald	f7449563ef	Merge pull request #15659 from sxyuan/system-font-name-fix [api-minor] Propagate the translated font name to TextContentItem for system fonts	2022-11-08 21:56:49 +01:00
Samuel Yuan	36fb5c1e2b	Propagate the translated font name to TextContentItems. This allows font data for system fonts to be looked up in the PDFObjects.	2022-11-08 11:16:21 -08:00
Jonas Jenwald	c8868a1c7a	[api-minor] Initialize the unicode-category lazily on the `Glyph`-instance The purpose of this patch is twofold: - Initialize the unicode-category data lazily during text-extraction, since this is completely unused during general parsing/rendering. - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was accidentally included. Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).	2022-11-05 10:12:17 +01:00
Jonas Jenwald	c33b8d7692	Cache the normalized unicode-value on the `Glyph`-instance Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it lazily initialized. Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be. Please note: The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.	2022-11-03 22:36:53 +01:00
Jonas Jenwald	23930a249e	[api-minor] Let `Catalog.getAllPageDicts` return an empty dictionary when loading the first /Page fails (issue 15590) In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fall back to returning an empty dictionary to replace (only) the first /Page of the document. Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a blank page for certain cases of broken/corrupt PDF documents, which may look weird. Please note: This functionality is controlled via the existing `stopAtErrors` option, which can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable. --- [1] Currently we still require that a /Pages-dictionary is found; however, it may be possible to relax even that assumption if that becomes absolutely necessary for future corrupt documents.	2022-11-03 12:51:48 +01:00
Jonas Jenwald	2516ffa78e	Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590) Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead. This partially fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.	2022-11-03 12:46:30 +01:00
Jonas Jenwald	6193537cd3	Merge pull request #15648 from Snuffleupagus/issue-12232 Prevent interaction with form elements in PresentationMode (issue 12232)	2022-10-31 11:14:23 +01:00
calixteman	e42e1cde61	Merge pull request #15615 from calixteman/bug1796741 [Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)	2022-10-31 09:58:27 +01:00
Jonas Jenwald	f0811a4a3c	Prevent mouse interaction with form elements in PresentationMode (issue 12232)	2022-10-30 21:55:44 +01:00
Jonas Jenwald	caef47a0cf	Remove the `PdfManager.onLoadedStream` method (PR 15616 follow-up) After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site. Hence why this patch suggests that we remove this method and replace it with an optional parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.	2022-10-29 14:42:17 +02:00
Jonas Jenwald	8b970109ea	Merge pull request #15632 from Snuffleupagus/issue-15629-2 [api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)	2022-10-29 09:37:07 +02:00
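
Note on commit 2eaa708e3a (combining `stringToUTF16String` and `stringToUTF16BEString`): the commit message says the two helpers were virtually identical, with the latter only adding a BOM. The following is a minimal sketch of how such a merged helper could look. It is not the actual pdf.js code; the function name `encodeUtf16BE` and the `withBom` parameter are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper, NOT the pdf.js implementation: encodes a string as
// big-endian UTF-16 "byte characters" and optionally prepends a BOM.
function encodeUtf16BE(str, withBom = false) {
  const parts = [];
  if (withBom) {
    parts.push("\xFE\xFF"); // UTF-16BE byte order mark
  }
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i); // one UTF-16 code unit (0..0xFFFF)
    // Emit the high byte first, then the low byte.
    parts.push(String.fromCharCode((code >> 8) & 0xff, code & 0xff));
  }
  return parts.join("");
}
```

With a single optional flag, callers that need the BOM pass `true` and everything else keeps the default, so only one copy of the encoding loop has to be maintained.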

5617 Commits