pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	aa5b678f94	Add default icons for FileAttachment annotations (bug 1230933) Please note: This "borrows" the icons from Thunderbird. According to the PDF specification, see https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096626, we should be providing default icons for FileAttachment annotations without appearances.	2022-11-26 11:24:59 +01:00
Jonas Jenwald	4b02610e8c	Re-factor and simplify the `getQuadPoints` helper function The use of `Array.prototype.reduce()` is, in my opinion, hurting overall readability since it's not particularly easy to look at the relevant code and immediately understand what's going on here. Furthermore this code leads to strictly speaking unnecessary allocations and parsing, since we could just track the min/max values directly in the relevant loop instead.	2022-11-25 10:40:16 +01:00
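The idea in the commit above, tracking the min/max values directly in a loop rather than through `Array.prototype.reduce()`, can be sketched roughly as follows. This is only an illustration: `boundingBox` and the `{ x, y }` point shape are assumptions made for the example, not the actual `getQuadPoints` code.

```javascript
// Hypothetical helper, not the pdf.js `getQuadPoints` implementation.
// Tracks the extremes in a single pass, without intermediate arrays.
function boundingBox(points) {
  let minX = Infinity,
    maxX = -Infinity,
    minY = Infinity,
    maxY = -Infinity;

  for (const { x, y } of points) {
    minX = Math.min(minX, x);
    maxX = Math.max(maxX, x);
    minY = Math.min(minY, y);
    maxY = Math.max(maxY, y);
  }
  return { minX, maxX, minY, maxY };
}

const box = boundingBox([
  { x: 1, y: 5 },
  { x: 3, y: 2 },
  { x: 2, y: 4 },
]);
```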
Jonas Jenwald	b3e161c328	[api-minor] Deprecate the TextLayer `timeout` parameter This has never really been used anywhere within the PDF.js library[1], and when streaming of textContent was introduced this parameter was effectively made redundant. Note that when streaming of textContent is used, all text-layout has already happened by the time that this `timeout`-functionality is actually invoked (thus making it pointless). While the `timeout`-functionality may still "work" when the textContent is provided upfront, although it's never been used/tested, streaming will generally perform better (in e.g. a viewer setting). Please note: While unrelated here, also removes a now unused property that I forgot in PR 15259. --- [1] At least not since the code was moved into its current file, which happened in PR 6619 and landed seven years ago.	2022-11-24 23:08:39 +01:00
Jonas Jenwald	8fda3f04fe	Merge pull request #15732 from Snuffleupagus/issue-15719 Add a fallback for non-embedded composite Tahoma fonts (issue 15719)	2022-11-24 19:09:12 +01:00
Jonas Jenwald	d1c01b3164	Add a fallback for non-embedded composite Tahoma fonts (issue 15719)	2022-11-23 15:51:18 +01:00
Jonas Jenwald	47682985d3	Add support for Optional Content in TilingPatterns (issue 15716) This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.	2022-11-23 12:58:00 +01:00
Jonas Jenwald	f3e0f86641	Simplify the `getFilenameFromUrl` helper function	2022-11-23 11:48:08 +01:00
Jonas Jenwald	0ba242ea4a	Support FileAttachments with hash-signs in the filename (issue 15729) The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs. For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1]. --- [1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.	2022-11-23 10:47:33 +01:00
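A minimal sketch of "only strip the path when one exists", assuming attachment filenames may use either `/` or `\` as a separator. The `stripPath` name is hypothetical and this is not the code that landed in the commit above. Unlike URL parsing, it treats `#` as an ordinary character.

```javascript
// Hypothetical helper for attachment filenames (not URLs): keeps everything
// after the last path separator and leaves "#" characters untouched.
function stripPath(filename) {
  const lastSep = Math.max(
    filename.lastIndexOf("/"),
    filename.lastIndexOf("\\")
  );
  return lastSep === -1 ? filename : filename.substring(lastSep + 1);
}

const plain = stripPath("report#2.pdf");
const nested = stripPath("dir/sub/report#2.pdf");
```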
Jonas Jenwald	2ff9799e7a	Tweak assignment of common parameters in the `Annotation` classes This is slightly more compact, and also unifies the format across the various classes.	2022-11-20 12:29:59 +01:00
Jonas Jenwald	c92de947b6	Reduce duplication when creating a fallback appearance for `MarkupAnnotation`s Currently we repeat the same color-conversion code verbatim in lots of classes, which seems completely unnecessary.	2022-11-20 12:05:25 +01:00
Tim van der Meij	d6908ee145	Merge pull request #15701 from Snuffleupagus/move-string-helpers Move some string helper functions to the worker-thread	2022-11-19 11:20:07 +01:00
Jonas Jenwald	70d362f22c	Remove an unnecessary variable in `getPdfManager`, in the `src/core/worker.js` file Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.	2022-11-17 15:31:41 +01:00
Jonas Jenwald	a2a200175f	Remove unnecessary function names in the `src/core/worker.js` file Currently some functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate. For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.	2022-11-17 15:12:48 +01:00
Jonas Jenwald	9adc7859c8	Move the `escapeString` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:48 +01:00
Jonas Jenwald	e5859e145d	Move the `isAscii` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:48 +01:00
Jonas Jenwald	2eaa708e3a	Combine the `stringToUTF16String` and `stringToUTF16BEString` helper functions Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.	2022-11-16 12:35:44 +01:00
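Since the two helpers differ only in the BOM, the combination can be sketched with a single optional flag. The function name `toUTF16Bytes` and the parameter `withBOM` are assumptions for illustration and may not match the signature used in the commit above.

```javascript
// Hypothetical combined helper: encodes each UTF-16 code unit as two
// big-endian "byte" characters, optionally prefixed with the BOM (FE FF).
function toUTF16Bytes(str, withBOM = false) {
  const buf = [];
  if (withBOM) {
    buf.push("\xFE\xFF");
  }
  for (let i = 0, ii = str.length; i < ii; i++) {
    const code = str.charCodeAt(i);
    buf.push(
      String.fromCharCode((code >> 8) & 0xff),
      String.fromCharCode(code & 0xff)
    );
  }
  return buf.join("");
}

const withoutBOM = toUTF16Bytes("A");
const withBOMPrefix = toUTF16Bytes("A", true);
```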
Jonas Jenwald	f358e76f5b	Move the `_isOffscreenCanvasSupported` property to the base `Annotation` class Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console. The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.	2022-11-15 16:30:53 +01:00
Jonas Jenwald	3e4caf2e13	Take the mask-offset into account when rendering repeated image masks (bug 1799927) Please note: As usual when I'm working with the `src/display/canvas.js` code I don't really know what I'm doing, but it at least appears to work.	2022-11-13 16:15:30 +01:00
Jonas Jenwald	d22eb3591e	Change the `assert` in `Parser.findDefaultInlineStreamEnd` to a non-PRODUCTION one Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally. --- [1] In those cases, for example a `FormatError` should have been thrown instead.	2022-11-12 16:30:58 +01:00
Jonas Jenwald	bab1097db3	Remove the constructor in the `StatTimer` class With modern EcmaScript features, we can define these fields directly instead. Please note that for backwards compatibility purposes they are still public as before, however note that this functionality is disabled by default (see the `pdfBug` API option). Also, we can (slightly) simplify the two loops used in the `toString` method.	2022-11-11 12:31:04 +01:00
Jonas Jenwald	d6cd48e12a	Use actually private fields in the `AnnotationStorage` class These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern EcmaScript features we can now enforce this. Also, this patch removes a couple of JSDoc comments that we generally don't use.	2022-11-11 12:30:02 +01:00
Jonas Jenwald	595711bd7c	Merge pull request #15679 from Snuffleupagus/bug-1799927-2 Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)	2022-11-10 22:54:48 +01:00
Calixte Denizet	3ca03603c2	[Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824) - For text fields * when printing, we generate a fake font which contains some widths computed with an OffscreenCanvas and its measureText method. To avoid having to lay out the glyphs ourselves, we just render all of them in one call in the showText method, using the system sans-serif/monospace fonts. * when saving, we continue to create the appearance streams if the fonts contain the char, but when a char is missing we just set the /NeedAppearances flag to true in the AcroForm dict and remove the appearance stream. This way, we let the different readers handle the rendering of the strings. - For FreeText annotations * when printing, we use the same trick as for text fields. * there is no need to save an appearance since Acrobat is able to infer one from the Content entry.	2022-11-10 19:05:39 +01:00
Jonas Jenwald	e8ec6af73e	Remove a couple of unnecessary temporary variables in `MurmurHash3_64.hexdigest` These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).	2022-11-10 18:27:26 +01:00
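For reference, padding with `String.prototype.padStart` removes the need for a manual loop and temporary variables. The sketch below assumes the digest is built from two 32-bit halves; `hex32` and `hexdigest64` are hypothetical names, not the `MurmurHash3_64` code.

```javascript
// Formats a value as an unsigned 32-bit integer in zero-padded hex.
function hex32(value) {
  return (value >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical 64-bit digest built from two 32-bit halves.
function hexdigest64(h1, h2) {
  return hex32(h1) + hex32(h2);
}

const digest = hexdigest64(1, 0xabcdef01);
```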
Jonas Jenwald	7abb6429b0	Initialize the dictionary lazily when parsing inline images This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618. Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).	2022-11-10 18:27:26 +01:00
Jonas Jenwald	b46e0d61cf	Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927) Please note: This only fixes the "wrong letter" part of bug 1799927. It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly. Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent. One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.	2022-11-10 18:27:26 +01:00
Jonas Jenwald	f7449563ef	Merge pull request #15659 from sxyuan/system-font-name-fix [api-minor] Propagate the translated font name to TextContentItem for system fonts	2022-11-08 21:56:49 +01:00
Samuel Yuan	36fb5c1e2b	Propagate the translated font name to TextContentItems. This allows font data for system fonts to be looked up in the PDFObjects.	2022-11-08 11:16:21 -08:00
Jonas Jenwald	c8868a1c7a	[api-minor] Initialize the unicode-category lazily on the `Glyph`-instance The purpose of this patch is twofold: - Initialize the unicode-category data lazily during text-extraction, since this is completely unused during general parsing/rendering. - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was accidentally included. Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).	2022-11-05 10:12:17 +01:00
Jonas Jenwald	c33b8d7692	Cache the normalized unicode-value on the `Glyph`-instance Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it lazily initialized. Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be. Please note: The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.	2022-11-03 22:36:53 +01:00
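The lazy-caching pattern described above can be illustrated with a small sketch. `CachedGlyph` is a hypothetical stand-in, not the pdf.js `Glyph` class, and it only normalizes with NFKC (the reversal step mentioned in the commit is omitted). The counter exists purely to show that the work runs once per instance.

```javascript
// Hypothetical stand-in for a glyph object with a lazily computed,
// cached normalized value.
class CachedGlyph {
  static normalizeCalls = 0; // counts how often normalization actually runs
  #normalized = null;

  constructor(unicode) {
    this.unicode = unicode;
  }

  get normalizedUnicode() {
    if (this.#normalized === null) {
      CachedGlyph.normalizeCalls++;
      this.#normalized = this.unicode.normalize("NFKC");
    }
    return this.#normalized;
  }
}

// The same instance is consulted many times, as happens during
// text-extraction when a glyph is shared by many characters.
const glyph = new CachedGlyph("\uFB01"); // LATIN SMALL LIGATURE FI
let lastValue = "";
for (let i = 0; i < 1000; i++) {
  lastValue = glyph.normalizedUnicode;
}
```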
Jonas Jenwald	23930a249e	[api-minor] Let `Catalog.getAllPageDicts` return an empty dictionary when loading the first /Page fails (issue 15590) In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an empty dictionary to replace (only) the first /Page of the document. Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a blank page for certain cases of broken/corrupt PDF documents which may look weird. Please note: This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable. --- [1] Currently we still require that a /Pages-dictionary is found though, however it may be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.	2022-11-03 12:51:48 +01:00
Jonas Jenwald	2516ffa78e	Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590) Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead. This partially fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.	2022-11-03 12:46:30 +01:00
Jonas Jenwald	6193537cd3	Merge pull request #15648 from Snuffleupagus/issue-12232 Prevent interaction with form elements in PresentationMode (issue 12232)	2022-10-31 11:14:23 +01:00
calixteman	e42e1cde61	Merge pull request #15615 from calixteman/bug1796741 [Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)	2022-10-31 09:58:27 +01:00
Jonas Jenwald	f0811a4a3c	Prevent mouse interaction with form elements in PresentationMode (issue 12232)	2022-10-30 21:55:44 +01:00
Jonas Jenwald	caef47a0cf	Remove the `PdfManager.onLoadedStream` method (PR 15616 follow-up) After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site. Hence why this patch suggests that we remove this method and replace it with an optional parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.	2022-10-29 14:42:17 +02:00
Jonas Jenwald	8b970109ea	Merge pull request #15632 from Snuffleupagus/issue-15629-2 [api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)	2022-10-29 09:37:07 +02:00
calixteman	8f80efa4ab	Merge pull request #15618 from calixteman/15614 [JS] Some functions (print, alert,...) must be called only after a user activation	2022-10-28 21:04:42 +02:00
Calixte Denizet	0de804a256	[JS] Some functions (print, alert,...) must be called only after a user activation - Some events, which require a user interaction, will allow those functions to be called. But after a few seconds, if there is no further user interaction, it won't be possible anymore. The idea is to give the user an opportunity to leave the pdf. - Disable the print function while we're printing, do the same for saving, and disallow saving on open events.	2022-10-28 18:52:07 +02:00
Jonas Jenwald	ba05e47b3e	Combine `Array.from` and `Array.prototype.map` calls This isn't just a tiny bit more compact, but it also avoids an intermediate allocation; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from#description	2022-10-28 13:46:30 +02:00
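As a small illustration of the point above: `Array.from` accepts a mapping function as its second argument, so the intermediate array created by a separate `.map()` call can be avoided. The `Set` and the doubling function here are arbitrary example inputs.

```javascript
const source = new Set([1, 2, 3]);

// Two steps: builds a temporary array, then maps it into a second one.
const twoStep = Array.from(source).map(x => x * 2);

// One step: the mapping function is applied while the array is created.
const oneStep = Array.from(source, x => x * 2);
```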
Jonas Jenwald	1e7274e9c6	[api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)	2022-10-27 11:14:54 +02:00
calixteman	27b251ac99	Merge pull request #15631 from calixteman/15627 [JS] Avoid to trigger a commit event on 'ENTER' when the textfield is multiline	2022-10-27 10:29:25 +02:00
Calixte Denizet	87f53b9cc9	[JS] Avoid to trigger a commit event on 'ENTER' when the textfield is multiline	2022-10-26 19:29:13 +02:00
Jonas Jenwald	980acddbfa	Prevent textLayer errors in documents with unbalanced beginMarkedContent/endMarkedContent operators (issue 15629)	2022-10-26 18:35:48 +02:00
Calixte Denizet	9f95a14e91	[Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741) When a form isn't changed, we used the appearances we had in the file, but when /NeedAppearances is true, all the appearances have to be regenerated regardless of what they are.	2022-10-26 12:10:51 +02:00
Jonas Jenwald	bcffbf74f3	Let the `PdfManager.requestLoadedStream` method return the stream This is very old code, and it could thus do with some simplification. Note how in the `src/core/worker.js` file we're combining both the `PdfManager.requestLoadedStream` and `PdfManager.onLoadedStream` methods in order to access the stream-data. This seems unnecessary, and it's simple enough to always let the `PdfManager.requestLoadedStream` method return the stream-data as well.	2022-10-24 17:00:48 +02:00
Jonas Jenwald	497edbd0ee	Revert "Avoid all rendering breaking completely when CanvasPattern.setTransform() is unsupported" (PR 13725 follow-up) PR 13725 was only intended as a temporary work-around, and it seems that we can now revert that. - Firefox 102 is the currently maintained ESR-branch, and the PDF.js project only supports the active one. - Node.js now works, thanks to the `node-canvas` package, and I've confirmed locally that following the STR in issue 13724 generates a correct image.	2022-10-22 10:58:51 +02:00
Jonas Jenwald	71bd8b4de9	Let `Lexer.getNumber` treat more invalid "numbers" as zero (issue 15604) In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid, Adobe Reader seems to handle it just fine. Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax our validation here.	2022-10-20 22:36:15 +02:00
Jonas Jenwald	e591378ff1	Restore a weaker version of the /Pages dictionary /Count check for corrupt documents (PR 15593 follow-up) It appears that PR 15593 broke `issue12402`, and we thus need to partially restore the /Count check. I completely missed this when looking at the test-results for PR 15593, both locally and on the bots, since the `Driver._getLastPageNumber` method would "swallow" an unavailable page number.	2022-10-20 14:22:29 +02:00
Jonas Jenwald	36967fcedb	Merge pull request #15586 from Snuffleupagus/rm-matchesForCache Remove the `Glyph.matchesForCache` method (PR 13494 follow-up)	2022-10-20 10:35:00 +02:00

5604 Commits