pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	e8ec6af73e	Remove a couple of unnecessary temporary variables in `MurmurHash3_64.hexdigest` These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).	2022-11-10 18:27:26 +01:00
Jonas Jenwald	7abb6429b0	Initialize the dictionary lazily when parsing inline images This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618. Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).	2022-11-10 18:27:26 +01:00
Jonas Jenwald	b46e0d61cf	Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927) Please note: This only fixes the "wrong letter" part of bug 1799927. It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly. Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent. One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.	2022-11-10 18:27:26 +01:00
Jonas Jenwald	f7449563ef	Merge pull request #15659 from sxyuan/system-font-name-fix [api-minor] Propagate the translated font name to TextContentItem for system fonts	2022-11-08 21:56:49 +01:00
Samuel Yuan	36fb5c1e2b	Propagate the translated font name to TextContentItems. This allows font data for system fonts to be looked up in the PDFObjects.	2022-11-08 11:16:21 -08:00
Jonas Jenwald	c8868a1c7a	[api-minor] Initialize the unicode-category lazily on the `Glyph`-instance The purpose of this patch is twofold: - Initialize the unicode-category data lazily during text-extraction, since this is completely unused during general parsing/rendering. - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was accidentally included. Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).	2022-11-05 10:12:17 +01:00
Jonas Jenwald	c33b8d7692	Cache the normalized unicode-value on the `Glyph`-instance Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it lazily initialized. Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be. Please note: The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.	2022-11-03 22:36:53 +01:00
Jonas Jenwald	23930a249e	[api-minor] Let `Catalog.getAllPageDicts` return an empty dictionary when loading the first /Page fails (issue 15590) In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an empty dictionary to replace (only) the first /Page of the document. Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a blank page for certain cases of broken/corrupt PDF documents which may look weird. Please note: This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable. --- [1] Currently we still require that a /Pages-dictionary is found though, however it may be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.	2022-11-03 12:51:48 +01:00
Jonas Jenwald	2516ffa78e	Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590) Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead. This partially fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.	2022-11-03 12:46:30 +01:00
Jonas Jenwald	6193537cd3	Merge pull request #15648 from Snuffleupagus/issue-12232 Prevent interaction with form elements in PresentationMode (issue 12232)	2022-10-31 11:14:23 +01:00
calixteman	e42e1cde61	Merge pull request #15615 from calixteman/bug1796741 [Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)	2022-10-31 09:58:27 +01:00
Jonas Jenwald	f0811a4a3c	Prevent mouse interaction with form elements in PresentationMode (issue 12232)	2022-10-30 21:55:44 +01:00
Jonas Jenwald	caef47a0cf	Remove the `PdfManager.onLoadedStream` method (PR 15616 follow-up) After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site. Hence why this patch suggests that we remove this method and replace it with an optional parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.	2022-10-29 14:42:17 +02:00
Jonas Jenwald	8b970109ea	Merge pull request #15632 from Snuffleupagus/issue-15629-2 [api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)	2022-10-29 09:37:07 +02:00
calixteman	8f80efa4ab	Merge pull request #15618 from calixteman/15614 [JS] Some functions (print, alert,...) must be called only after a user activation	2022-10-28 21:04:42 +02:00
Calixte Denizet	0de804a256	[JS] Some functions (print, alert,...) must be called only after a user activation - Some events, which require a user interaction, will allow those functions to be called. But after a few seconds, if there is no further user interaction, it won't be possible anymore. The idea is to give the user an opportunity to leave the pdf. - Disable the print function while we're printing, and likewise for saving, and disallow saving on open events.	2022-10-28 18:52:07 +02:00
Jonas Jenwald	ba05e47b3e	Combine `Array.from` and `Array.prototype.map` calls This isn't just a tiny bit more compact, but it also avoids an intermediate allocation; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from#description	2022-10-28 13:46:30 +02:00
Jonas Jenwald	1e7274e9c6	[api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)	2022-10-27 11:14:54 +02:00
calixteman	27b251ac99	Merge pull request #15631 from calixteman/15627 [JS] Avoid to trigger a commit event on 'ENTER' when the textfield is multiline	2022-10-27 10:29:25 +02:00
Calixte Denizet	87f53b9cc9	[JS] Avoid to trigger a commit event on 'ENTER' when the textfield is multiline	2022-10-26 19:29:13 +02:00
Jonas Jenwald	980acddbfa	Prevent textLayer errors in documents with unbalanced beginMarkedContent/endMarkedContent operators (issue 15629)	2022-10-26 18:35:48 +02:00
Calixte Denizet	9f95a14e91	[Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741) When a form isn't changed, we used the appearances we had in the file, but when /NeedAppearances is true, all the appearances have to be regenerated whatever they are.	2022-10-26 12:10:51 +02:00
Jonas Jenwald	bcffbf74f3	Let the `PdfManager.requestLoadedStream` method return the stream This is very old code, and it could thus do with some simplification. Note how in the `src/core/worker.js` file we're combining both the `PdfManager.requestLoadedStream` and `PdfManager.onLoadedStream` methods in order to access the stream-data. This seems unnecessary, and it's simple enough to always let the `PdfManager.requestLoadedStream` method return the stream-data as well.	2022-10-24 17:00:48 +02:00
Jonas Jenwald	497edbd0ee	Revert "Avoid all rendering breaking completely when CanvasPattern.setTransform() is unsupported" (PR 13725 follow-up) PR 13725 was only intended as a temporary work-around, and it seems that we can now revert that. - Firefox 102 is the currently maintained ESR-branch, and the PDF.js project only supports the active one. - Node.js now works, thanks to the `node-canvas` package, and I've confirmed locally that following the STR in issue 13724 generates a correct image.	2022-10-22 10:58:51 +02:00
Jonas Jenwald	71bd8b4de9	Let `Lexer.getNumber` treat more invalid "numbers" as zero (issue 15604) In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid, Adobe Reader seems to handle it just fine. Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax the validation here.	2022-10-20 22:36:15 +02:00
Jonas Jenwald	e591378ff1	Restore a weaker version of the /Pages dictionary /Count check for corrupt documents (PR 15593 follow-up) It appears that PR 15593 broke `issue12402`, and we thus need to partially restore the /Count check. I completely missed this when looking at the test-results for PR 15593, both locally and on the bots, since the `Driver._getLastPageNumber` method would "swallow" an unavailable page number.	2022-10-20 14:22:29 +02:00
Jonas Jenwald	36967fcedb	Merge pull request #15586 from Snuffleupagus/rm-matchesForCache Remove the `Glyph.matchesForCache` method (PR 13494 follow-up)	2022-10-20 10:35:00 +02:00
Calixte Denizet	6db9cefaaf	[Annotation] Replace use of id by data-element-id to have the correct id	2022-10-19 23:36:28 +02:00
calixteman	ba3a0e104a	Merge pull request #15595 from calixteman/1793419 [Editor] Make FreeText annotations visible for screen readers when in editing mode (bug 1793419)	2022-10-19 19:33:49 +02:00
Jonas Jenwald	3c046c0a21	Extend `getSupplementalGlyphMapForCalibri` with some umlauts (issue 15594)	2022-10-19 17:49:40 +02:00
Jonas Jenwald	e00a040a80	Merge pull request #15593 from Snuffleupagus/issue-9105-other Relax the /Pages dictionary /Count check for corrupt documents (issue 9105)	2022-10-19 16:26:14 +02:00
Calixte Denizet	535c624e0d	[Editor] Make FreeText annotations visible for screen readers when in editing mode (bug 1793419) - When we're editing some annotations, keeping the role="text-box" make them visible as editable and VoiceOver (Mac) is able to read the contents when they're focused; - Add an attribute "aria-activedescendant" in order to make the content discoverable by NVDA on Windows.	2022-10-19 16:21:04 +02:00
Jonas Jenwald	bc13a277ce	Relax the /Pages dictionary /Count check for corrupt documents (issue 9105) After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document. Hence it seems strange to keep this requirement for corrupt PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.	2022-10-19 12:28:25 +02:00
Calixte Denizet	69b01d4398	[Annotation] Take the border into account when computing the font size (bug 1794403)	2022-10-19 10:27:27 +02:00
Jonas Jenwald	fd35cda8bc	Re-factor the glyph-cache lookup in the `Font._charToGlyph` method With the changes in the previous patch we can move the glyph-cache lookup to the top of the method and thus avoid a bunch of, in almost every case, completely unnecessary re-parsing for every `charCode`.	2022-10-19 09:55:09 +02:00
Jonas Jenwald	3e391aaed9	Remove the `Glyph.matchesForCache` method (PR 13494 follow-up) This method, and its class, was originally added in PR 4453 to reduce memory usage when parsing text. Then PR 13494 extended the `Glyph`-representation slightly to also include the `charCode`, which made the `matchesForCache` method effectively redundant since most properties on a `Glyph`-instance indirectly depend on that one. The only exception is potentially `isSpace` in multi-byte strings. Also, something that I noticed when testing this code: The `matchesForCache` method never worked correctly for `Glyph`s containing `accent`-data, since Objects are passed by reference in JavaScript. For affected fonts, of which there's only a handful of examples in our test-suite, we'd fail to find an already existing `Glyph` because of this.	2022-10-19 09:54:35 +02:00
Jonas Jenwald	de99f99a01	Fallback and try a previous generation if all else fails in `XRef.indexObjects` (issue 15577) When we fail to find a usable PDF document `trailer` and there were errors during parsing, try and fallback to a previous generation as a last resort during fetching of uncompressed references. Please note: This will not affect "normal" PDF documents, with valid /XRef data, and even most corrupt documents should be completely unaffected by these changes.	2022-10-18 20:24:01 +02:00
Calixte Denizet	6fb694658e	[Editor] Commit the current editor before setting the new viewport	2022-10-17 11:58:29 +02:00
Calixte Denizet	9e2bc8853f	[Editor] Ink editors must have their dimensions in percents after having been resized	2022-10-15 19:59:10 +02:00
Tim van der Meij	06599f487f	Merge pull request #15576 from Snuffleupagus/version Re-factor the PDF version parsing in the worker-thread	2022-10-15 13:03:43 +02:00
Tim van der Meij	2508792f29	Merge pull request #15572 from Snuffleupagus/simpleFontToUnicode-refactor Slightly re-factor `PartialEvaluator._simpleFontToUnicode`	2022-10-15 12:31:27 +02:00
Jonas Jenwald	d470010293	Re-factor the PDF version parsing in the worker-thread Part of this is very old code, and back when support for parsing the catalog-version was added things became less clear (in my opinion). Hence this patch tries to improve things, by e.g. validating the header- and catalog-version separately.	2022-10-15 12:06:39 +02:00
Jonas Jenwald	a576ea216f	Don't trigger worker-thread cleanup when destruction has already started Note how we're currently skipping all main-thread cleanup when document destruction has started, but for some reason we're still dispatching the "Cleanup" message. This seems like a simple oversight, since destruction will already invoke the `BasePdfManager.cleanup` method (on the worker-thread) to fully clear-out all caches.	2022-10-14 16:43:49 +02:00
Calixte Denizet	556513a6e7	Use all the current transform as key when caching some image for masks used with pattern fill (bug 1795263, #15573)	2022-10-14 14:37:58 +02:00
Jonas Jenwald	15d4d80d45	Merge pull request #15563 from Snuffleupagus/issue-15559 Take the /CIDToGIDMap into account when getting the glyph mapping for CFF fonts (issue 15559)	2022-10-14 09:13:41 +02:00
Jonas Jenwald	d5036d7bfe	Merge pull request #15569 from Snuffleupagus/rm-worker-GetOperatorList-UnsupportedFeature [api-minor] Stop sending "UnsupportedFeature" from the worker-thread GetOperatorList-handling	2022-10-14 09:12:10 +02:00
Jonas Jenwald	fa47d4b9b1	Slightly re-factor `PartialEvaluator._simpleFontToUnicode` Given the sheer number of heuristics added to this method over the years, moving the valid unicode found case to the top should improve readability of the code.	2022-10-13 21:42:57 +02:00
Calixte Denizet	e756bb69e4	[JS] Take into account all the required fields for some computations - Fix Field::getArray in order to collect only the fields which have a value; - Fix AFSimple_Calculate: * allow to have a string with a list of field names as argument; * since a field can be non-terminal, use Field::getArray to collect the field under it and then apply the calculation on all the descendants.	2022-10-13 18:33:12 +02:00
Jonas Jenwald	f2f0a1e871	[api-minor] Stop sending "UnsupportedFeature" from the worker-thread GetOperatorList-handling This code was added all the way back in PR 6698, almost seven years ago, for backwards compatibility reasons. At this point in time, it seems that we can remove that since: - We have more fine-grained "UnsupportedFeature" reporting elsewhere in the worker-thread code nowadays. - The GetOperatorList-handling is now using `ReadableStream`s, which means that errors are being forwarded to the main-thread anyway. - We're also no longer displaying a notification-bar, in the built-in Firefox PDF Viewer, for any of these "UnsupportedFeature" messages.	2022-10-13 11:46:17 +02:00
Jonas Jenwald	858d941ff8	Take the /CIDToGIDMap into account when getting the glyph mapping for CFF fonts (issue 15559) Please note: I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).	2022-10-13 10:02:25 +02:00
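
The `Array.from` change described in commit ba05e47b3e above can be illustrated with a minimal sketch. The input data and variable names here are hypothetical and are not taken from the pdf.js source; the sketch only shows the general pattern of passing the mapping function as the second argument to `Array.from`.

```javascript
// Hypothetical input for illustration only; not taken from the pdf.js source.
const charCodes = new Uint8Array([80, 68, 70]);

// Two-step form: `Array.from` first builds an intermediate array,
// then `Array.prototype.map` builds a second array from it.
const twoStep = Array.from(charCodes).map(c => String.fromCharCode(c));

// Combined form: the mapping function is passed as the second argument,
// so each element is mapped while the result array is being built.
const combined = Array.from(charCodes, c => String.fromCharCode(c));

console.log(twoStep.join(""), combined.join(""));
```

Both forms produce the same array; the combined one simply skips the intermediate allocation mentioned in the commit message.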

6231 Commits