When the text of an annotation is extracted using getTextContent, consecutive white spaces
are just replaced by a single space. So this patch adds an option to make sure that white
spaces are preserved when the appearance is parsed.
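A minimal sketch of the difference (the option name `keepWhiteSpace` is an assumption, not the confirmed API):
```js
// Sketch: collapse runs of white space, unless the caller opted out.
function normalizeSpaces(str, keepWhiteSpace = false) {
  return keepWhiteSpace ? str : str.replace(/\s+/g, " ");
}

normalizeSpaces("foo \t bar"); // "foo bar" (getTextContent default)
normalizeSpaces("foo \t bar", true); // "foo \t bar" (appearance parsing)
```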
For the case where there's no appearance, we can have a fast path to get the correct string
from the Content entry.
When an existing FreeText is edited, spaces (0x20) are replaced by non-breaking ones (0xa0)
to make sure that all of them are visible on screen.
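For illustration, the replacement amounts to something like this (a sketch, not the exact editor code):
```js
// Sketch: map every regular space (0x20) to a non-breaking one (0xa0),
// so that consecutive spaces remain visible in the rendered FreeText.
const visibleText = editedText.replaceAll("\x20", "\xa0");
```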
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
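A rough sketch of the idea, with hypothetical cache objects standing in for the actual pdf.js ones:
```js
// Sketch: when the worker reports a "globally used" image, first try to
// copy an existing page-level entry into the global cache on the
// main-thread, so that the worker-thread can skip re-parsing the image.
function tryCopyToGlobalCache(objId, pageObjs, commonObjs) {
  if (!pageObjs.has(objId) || commonObjs.has(objId)) {
    return false; // Nothing to copy, or already cached globally.
  }
  commonObjs.resolve(objId, pageObjs.get(objId));
  return true;
}
```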
For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
- With the `master`-branch it takes >600 ms to render.
- With this patch that goes down to ~50 ms, which is one order of magnitude faster.
(Note that all other pages are, as expected, completely unaffected by these changes.)
This new main-thread copying is limited to "large" global images, since:
- Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
- With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
- This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
There are obviously a few things wrong with the Annotations in the referenced PDF document; however, parsing of an Annotation shouldn't just break if the /BS-entry isn't a dictionary.
When a PDF has a FreeText without an appearance, we use a fake font in order to render it,
which leads to creating a few new refs for the font.
But then when we're saving, we create some new refs which start at the same number
as the previously created ones.
Consequently, when saving we're using some wrong objects (like a font) to check if
we're able to render the newly added FreeText.
In order to fix this bug, we just remove the persistent refs (which are only used
when rendering/printing) during saving.
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]
- Expose the `fetchData` helper function in the API, such that the viewer is able to access it.
- Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.
---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
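As a rough usage sketch of the re-factored helper (the exact signature is an assumption based on the description above):
```js
// Sketch: one helper for the different kinds of resources.
const cMap = await fetchData(cMapUrl, /* type = */ "arraybuffer");
const config = await fetchData(configUrl, /* type = */ "json");
const l10n = await fetchData(ftlUrl, /* type = */ "text");
```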
Some fields, somewhere under the Fields entry in the AcroForm, could have no name (in /T)
but have a parent which has a name yet isn't itself under Fields.
As a side-effect, this patch prevents infinite loops caused by potential cycles
under Fields.
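The cycle prevention boils down to tracking already visited references; a hypothetical sketch:
```js
// Sketch: skip fields that have already been seen, so that a cycle
// under /Fields cannot cause an infinite loop (names are hypothetical).
function collectFieldObject(ref, visited) {
  if (visited.has(ref)) {
    return; // Cycle detected, ignore this field.
  }
  visited.put(ref);
  // ... resolve the field; when /T is missing, fall back to the name
  //     of the /Parent; finally recurse into the /Kids ...
}
```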
The `fieldObjects`-getter is implemented in the `PDFDocument` class, which means that the `this._localIdFactory`-property that we pass to `AnnotationFactory.create` doesn't actually exist.
The reason that this hasn't caused any bugs, that I'm aware of, is that all /Fields-entries need to be References to actually make sense.
The `fieldObjects`-getter itself is called, from `src/core/worker.js`, in a way that'll ensure that any `MissingDataException`s are handled. However the problem is that the actual data-lookups in `fieldObjects` and `#collectFieldObjects` are done inside of a Promise, which means that `MissingDataException`s won't be handled and parsing could thus break.
To address this we change all data-lookups to be asynchronous instead.
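Concretely, the lookups change along these lines (a sketch, not the literal diff):
```js
// Before (sketch): a synchronous lookup can throw MissingDataException
// inside the Promise, where it won't be handled.
const kidsSync = fieldDict.get("Kids");

// After (sketch): the asynchronous accessor waits for missing data.
const kidsAsync = await fieldDict.getAsync("Kids");
```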
Currently we're unnecessarily converting data between strings and typed-arrays, when dealing with compressible data, in the `writeStream` function.
Note how we're *first* getting a string-representation of the stream, which involves converting the underlying typed-array into a string, only to immediately convert this back into a typed-array. This seems completely unnecessary, and is easy enough to avoid, and we'll now only do a *single* type-conversion in this function.
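In before/after form the change amounts to this (a sketch; the helper names exist in pdf.js, but the surrounding code is simplified):
```js
// Before (sketch): bytes -> string -> bytes, i.e. two conversions.
const string = stream.getString();
const bytes = stringToBytes(string);

// After (sketch): stay with the underlying typed-array throughout.
const bytesDirect = stream.getBytes();
```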
When pdfBug is true, the substitution font is used in the text layer in order
to make it possible to determine, with the help of the devtools, which font is really used.
And to be sure that fonts are loaded, the font cache isn't cleaned up when
the debugger is active.
When an element has the hasOwnCanvas flag we must have an HTML container to attach
the canvas where the element will be rendered.
So the noHTML flag must take this information into account:
- in some cases the noHTML flag is reset depending on the hasOwnCanvas value;
- in some others, the hasOwnCanvas flag is set depending on the value of noHTML.
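Schematically, one plausible reading of these rules (a sketch, not the exact code):
```js
// Sketch: an element with its own canvas needs an HTML container,
// hence it can never be a "noHTML" element.
if (element.hasOwnCanvas) {
  element.noHTML = false;
}
```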
Rather than trying to be "clever" here, and possibly affect code readability negatively, let's just restore the `collectFields` parameter to address the unneeded parsing that now happens when printing new Annotations.
In the rare situation that an optional content dictionary lacks a /Type-entry we currently throw, which may prevent e.g. Form XObjects from rendering completely.
Fixes https://bugs.ghostscript.com/show_bug.cgi?id=707147
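A sketch of the more tolerant parsing (the exact check is an assumption):
```js
// Sketch: only validate /Type when the entry actually exists, rather
// than throwing on a missing one and aborting e.g. Form XObject rendering.
const type = dict.get("Type");
if (type !== undefined && !isName(type, "OCG") && !isName(type, "OCMD")) {
  throw new FormatError("Invalid optional content /Type-entry.");
}
```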
When there is no tree, the tags for the new annotations are just put under the root element.
When there is a tree, we insert the new tags at the right place using the value
of structTreeParentId (added in PR #16916).
Now that modern JavaScript is fully supported also in the worker-thread we no longer need to keep old closures, which slightly reduces the size of the code.
Given that this is a shadowed getter, the `opMap` is already lazily initialized and it shouldn't be necessary to *also* use the `getLookupTableFactory` helper function here. Looking at the history of the code, it seems that this is simply a leftover from before JavaScript classes existed.
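For context, this is what the shadowed getter looks like (the table contents are elided; `shadow` is the existing pdf.js helper):
```js
class EvaluatorPreprocessor {
  // The first access runs the getter; `shadow` then replaces the
  // property with the computed table, so `opMap` is built exactly
  // once, and only if it's actually used.
  static get opMap() {
    return shadow(this, "opMap", {
      BT: { id: OPS.beginText, numArgs: 0, variableArgs: false },
      // ...
    });
  }
}
```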
While this cache will not contain a huge amount of data in practice, it's nonetheless a *global* cache that currently will never be cleared.
This patch also removes the existing closure, since it shouldn't really be necessary nowadays given that the code is a JavaScript module which means that only explicitly listed properties will be exported.
When I started looking at PR 16938 it occurred to me that some of the new structTree-methods are synchronously accessing certain dictionary-data (not used during "normal" structTree-parsing), which may not be generally safe since everything in a dictionary could be a reference (and the relevant data may not have been loaded yet).
Rather than suggesting that we make all those new methods even more asynchronous, to me the overall simplest and safest solution is to ensure that the *entire* PDF document has been loaded *before* we begin saving it. In practice this shouldn't really affect "performance" of saving noticeably, since it's always depended on the entire PDF document being downloaded.
Finally note that with the exception of the PDF document possibly not having been fully downloaded when saving is triggered, all other "global" document properties are pretty much guaranteed to already be available at this point.
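Hence the saving-code can simply await the fully loaded stream first (`requestLoadedStream` is the existing pdf.js helper; the placement is a sketch):
```js
// Sketch: ensure that *all* bytes of the PDF document are available
// before any synchronous dictionary-data access during saving.
await pdfManager.requestLoadedStream();
```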
While it makes sense to check that the `destDict` parameter is indeed a Dictionary, since that data comes from the PDF document itself, the `resultObj` parameter is an internal PDF.js implementation detail that should always be correct (or tests will fail).
Over time the amount of "document level" data potentially needed during parsing of Annotations has increased a fair bit, which means that we currently need to ensure that a bunch of data is available for each individual Annotation.
Given that this data is "constant" for a PDF document we can instead create (and cache) it lazily, only when needed, *before* starting to parse the Annotations on a page. This way the parsing of individual Annotations should become slightly less asynchronous, which really cannot hurt.
An additional benefit of these changes is that we can reduce the number of parameters that need to be explicitly passed around in the annotation-code, which helps overall readability in my opinion.
One potential drawback of these changes is that the `AnnotationFactory.create` method no longer handles "everything" on its own, however given how few call-sites there are I don't think that's too much of a problem.
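A hypothetical sketch of the lazy, cached creation of this "constant" data:
```js
// Sketch: build the document-level annotation data once, cache the
// promise, and re-use it for every page (names are hypothetical).
let annotationGlobalsPromise = null;

function ensureAnnotationGlobals(pdfManager) {
  annotationGlobalsPromise ||= Promise.all([
    pdfManager.ensureCatalog("acroForm"),
    pdfManager.ensureCatalog("attachments"),
  ]).then(([acroForm, attachments]) => ({ acroForm, attachments }));
  return annotationGlobalsPromise;
}
```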
The classes were stripped out when creating the field name, but that led to
a wrong name.
Since class components in a path are irrelevant, they're just ignored
when searching for a node in the datasets.
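For illustration (a hypothetical sketch; the `#` marker for class components is an assumption):
```js
// Sketch: drop the class components from a path before searching
// for the corresponding node in the datasets.
const nodeNames = fieldPath.split(".").filter(name => !name.startsWith("#"));
```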
The issue described in the mentioned bug really occurs because
Acrobat is rendering the XFA instead of the AcroForm.
The original patch just tried to work around the issue, but it
induced some regressions.