Commit Graph

2951 Commits

Author SHA1 Message Date
Calixte Denizet
e0b843d991 Always add links in the annotation layer
Fixes #17730.
2024-02-27 22:48:08 +01:00
Calixte Denizet
a6eadf8150 Avoid accessing a missing cidSystemInfo property
Fixes #17689.
2024-02-19 09:55:23 +01:00
Jonas Jenwald
a7bcc81eb1 Add a dummy beginMarkedContentProps operator when optional content parsing fails (issue 17679) 2024-02-17 13:45:16 +01:00
calixteman
a83a8d7e4f
Merge pull request #17674 from calixteman/issue17671
Fix the endoffset of the last glyph when it's followed by a null offset in the loca table
2024-02-15 10:19:55 +01:00
Calixte Denizet
fcad3718f0 Fix the endoffset of the last glyph when it's followed by a null offset in the loca table
It fixes #17671.
2024-02-14 17:20:04 +01:00
Calixte Denizet
2133da166e When updating, write the xref table in the same format as the previous one (bug 1878916)
The specs are unclear about which xref table format must be used.
By checking the validity of some PDFs with Acrobat's preflight tool,
we can infer that keeping the same format is the correct approach.
After having been changed, the PDF in the mentioned bug wasn't displayed
correctly in either Chrome or Acrobat: it's now fixed.
2024-02-13 14:14:37 +01:00
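As background on the two formats: a classic cross-reference table stores each entry as a fixed 20-byte line (a 10-digit byte offset, a 5-digit generation number, `n` or `f`, and a two-byte end-of-line marker), whereas a cross-reference stream stores binary entries inside a stream object. The sketch below only formats classic table entries; `formatXRefEntry` is a hypothetical helper, not pdf.js code.

```javascript
// Hypothetical helper (not part of pdf.js): format one entry of a classic
// cross-reference table. Per the PDF specification each entry is exactly
// 20 bytes: a 10-digit byte offset, a space, a 5-digit generation number,
// a space, the type keyword ("n" = in use, "f" = free) and a 2-byte EOL.
function formatXRefEntry(offset, generation, inUse) {
  const off = String(offset).padStart(10, "0");
  const gen = String(generation).padStart(5, "0");
  // " \n" (space + line feed) is one of the allowed two-byte EOL markers.
  return `${off} ${gen} ${inUse ? "n" : "f"} \n`;
}

// The first entry of a classic table is the head of the free list:
// object 0 with generation number 65535.
const table =
  "xref\n0 2\n" +
  formatXRefEntry(0, 65535, false) +
  formatXRefEntry(17, 0, true);
```

An incremental update that appends a section in the other format would mix both styles in one file, which is what the commit avoids.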
Jonas Jenwald
37e98e39f6 Skip any whitespace after the first object in linearized PDFs (issue 17665)
This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.
2024-02-12 22:05:36 +01:00
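The idea is to advance past any PDF whitespace bytes before reading what follows the first object. A minimal sketch, assuming the data is a `Uint8Array`; `skipWhitespace` is a hypothetical name and not the code in `PDFDocument.startXRef`.

```javascript
// PDF white-space characters: NUL, HORIZONTAL TAB, LINE FEED, FORM FEED,
// CARRIAGE RETURN and SPACE.
const PDF_WHITESPACE = new Set([0x00, 0x09, 0x0a, 0x0c, 0x0d, 0x20]);

// Hypothetical helper: return the index of the first non-whitespace byte at
// or after `pos`, or `bytes.length` if only whitespace remains.
function skipWhitespace(bytes, pos) {
  while (pos < bytes.length && PDF_WHITESPACE.has(bytes[pos])) {
    pos++;
  }
  return pos;
}

// Example: "endobj" followed by CR, LF and a space, then "xref".
const data = new TextEncoder().encode("endobj\r\n xref");
const next = skipWhitespace(data, "endobj".length);
```

Here `next` points at the `x` of `xref`, regardless of how many whitespace bytes the producer emitted.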
Jonas Jenwald
363dce6744 Use a limit, in more places, when splitting strings
This should be a *tiny* bit more efficient, since it avoids parsing substrings that we don't care about.

*Please note:* I cannot find an ESLint rule to enforce this automatically.
2024-02-02 13:10:52 +01:00
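A quick illustration of the optional `limit` argument of `String.prototype.split`; the input string is made up. In the ECMAScript specification, splitting returns as soon as `limit` entries have been collected, so the rest of the string is not broken into substrings nobody reads.

```javascript
const line = "1 0 obj << /Type /Page >>";

// Without a limit, every piece is produced (7 entries here).
const all = line.split(" ");

// With a limit of 2, only the first two pieces are produced: ["1", "0"].
const firstTwo = line.split(" ", 2);

// Typical use: only the leading fields are needed.
const [objNum, genNum] = line.split(" ", 2).map(Number);
```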
Calixte Denizet
7f2428a77e Reduce memory use and improve perfs when computing the bounding box of a bezier curve (bug 1875547)
It isn't really a fix for the mentioned bug, but it slightly improves things.
Reducing the memory use also reduces the time spent in the GC.
The algorithm to compute the bounding box is the same as before, but it has just
been rewritten to be more efficient.
2024-01-24 23:41:14 +01:00
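The commit does not spell out the algorithm. A common way to get the exact bounding box of a cubic Bezier curve is to take the two endpoints plus any interior points where the derivative vanishes, which means solving one quadratic per axis. The sketch below follows that approach and writes into a caller-supplied array so that no temporary arrays are allocated per curve; the function names are hypothetical, and this is not necessarily how pdf.js implements it.

```javascript
// Evaluate one coordinate of a cubic Bezier curve at parameter t.
function bezierAt(p0, p1, p2, p3, t) {
  const mt = 1 - t;
  return (
    mt * mt * mt * p0 +
    3 * mt * mt * t * p1 +
    3 * mt * t * t * p2 +
    t * t * t * p3
  );
}

// If t lies strictly inside (0, 1), widen [minMax[lo], minMax[hi]] to
// include the curve's coordinate at t.
function considerRoot(t, p0, p1, p2, p3, minMax, lo, hi) {
  if (t <= 0 || t >= 1) {
    return;
  }
  const v = bezierAt(p0, p1, p2, p3, t);
  if (v < minMax[lo]) minMax[lo] = v;
  if (v > minMax[hi]) minMax[hi] = v;
}

// Interior extrema along one axis occur where the derivative
// a*t^2 + b*t + c vanishes.
function updateAxis(p0, p1, p2, p3, minMax, lo, hi) {
  const a = 3 * (-p0 + 3 * p1 - 3 * p2 + p3);
  const b = 6 * (p0 - 2 * p1 + p2);
  const c = 3 * (p1 - p0);
  const EPS = 1e-12; // treat tiny leading coefficients as zero
  if (Math.abs(a) < EPS) {
    // The derivative is linear (or constant): at most one root.
    if (Math.abs(b) >= EPS) {
      considerRoot(-c / b, p0, p1, p2, p3, minMax, lo, hi);
    }
    return;
  }
  const delta = b * b - 4 * a * c;
  if (delta < 0) {
    return; // no real roots: this coordinate is monotone on [0, 1]
  }
  const sqrtDelta = Math.sqrt(delta);
  considerRoot((-b + sqrtDelta) / (2 * a), p0, p1, p2, p3, minMax, lo, hi);
  considerRoot((-b - sqrtDelta) / (2 * a), p0, p1, p2, p3, minMax, lo, hi);
}

// Grow minMax = [minX, minY, maxX, maxY] in place to contain the curve.
function bezierBoundingBox(x0, y0, x1, y1, x2, y2, x3, y3, minMax) {
  // The endpoints are always on the curve.
  minMax[0] = Math.min(minMax[0], x0, x3);
  minMax[1] = Math.min(minMax[1], y0, y3);
  minMax[2] = Math.max(minMax[2], x0, x3);
  minMax[3] = Math.max(minMax[3], y0, y3);
  updateAxis(x0, x1, x2, x3, minMax, 0, 2);
  updateAxis(y0, y1, y2, y3, minMax, 1, 3);
  return minMax;
}
```

For the curve (0, 0), (0, 1), (1, 1), (1, 0), starting from `[Infinity, Infinity, -Infinity, -Infinity]`, the box is `[0, 0, 1, 0.75]`: the control points reach y = 1, but the curve itself peaks at y = 0.75 when t = 0.5.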
Jonas Jenwald
fa583427ef Always export the "raw" /ToUnicode-data from PartialEvaluator.preEvaluateFont (PR 13354 follow-up)
This, ever so slightly, simplifies the implementation in the `PartialEvaluator.extractDataStructures`-method.
2024-01-22 13:06:32 +01:00
Jonas Jenwald
f21a30dfb4 Convert the PartialEvaluator.readToUnicode method to be async 2024-01-22 12:47:06 +01:00
Jonas Jenwald
f5c01188dc Convert the PartialEvaluator.extractDataStructures method to be async 2024-01-22 12:47:06 +01:00
Jonas Jenwald
cf0797dfbd Use await consistently in the PartialEvaluator.setGState method 2024-01-22 12:47:06 +01:00
Jonas Jenwald
1cc83c4fdc Use await consistently in the PartialEvaluator.buildFormXObject method 2024-01-22 12:47:06 +01:00
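The commits above do not state their motivation beyond consistency. One general property of `async` functions, shown here with hypothetical helpers, is that an error thrown anywhere in the body becomes a rejected promise instead of escaping synchronously, so callers only need one error-handling path.

```javascript
// Hypothetical helpers; `dict` stands in for a dictionary with a get() method.

// Non-async: if `dict` is undefined, the TypeError is thrown synchronously,
// so a caller that only attaches .catch() to the result never sees it.
function getBaseFontLegacy(dict) {
  const name = dict.get("BaseFont");
  return Promise.resolve(name);
}

// async: the same TypeError is turned into a rejected promise.
async function getBaseFont(dict) {
  const name = dict.get("BaseFont");
  return name;
}
```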
calixteman
bba831821d
Merge pull request #17558 from calixteman/bug1669097
Correctly print documents containing characters with a Unicode code point greater than 0xFFFF (bug 1669097)
2024-01-22 12:23:06 +01:00
Calixte Denizet
06601fd90c Correctly print documents containing characters with a Unicode code point greater than 0xFFFF (bug 1669097) 2024-01-22 10:48:00 +01:00
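The commit message gives no details of the fix. As general background: JavaScript strings are sequences of UTF-16 code units, so a character above 0xFFFF occupies two of them (a surrogate pair), and 16-bit APIs such as `String.fromCharCode` silently truncate it. U+1F600 below is just an example value.

```javascript
const codePoint = 0x1f600; // outside the Basic Multilingual Plane

// String.fromCharCode truncates its argument to 16 bits: this yields U+F600.
const wrong = String.fromCharCode(codePoint);

// String.fromCodePoint produces the correct surrogate pair.
const right = String.fromCodePoint(codePoint);

// right.length is 2 (two UTF-16 code units), right.charCodeAt(0) is only
// the high surrogate 0xD83D, while right.codePointAt(0) and iteration
// ([...right]) treat the pair as a single character.
```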
Tim van der Meij
49b2d9b5af
Merge pull request #17556 from Snuffleupagus/issue-17554
Ensure that `EvaluatorPreprocessor.opMap` has a null-prototype (issue 17554)
2024-01-21 20:58:09 +01:00
Jonas Jenwald
d7e41d4cb6 Ensure that EvaluatorPreprocessor.opMap has a null-prototype (issue 17554)
This accidentally regressed in PR 16956, sorry about that!
2024-01-21 19:59:13 +01:00
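As general JavaScript background on why a null prototype matters for a lookup table keyed by operator names: a plain object literal inherits properties such as `toString` and `constructor` from `Object.prototype`, so an unexpected name can appear to be a known key. The map contents and the `lookup` helper below are hypothetical and much simpler than the real `opMap` entries.

```javascript
// A plain object literal inherits from Object.prototype.
const plainMap = { BT: "beginText", ET: "endText" };

// Object.create(null) has no prototype, so lookups only ever see the keys
// that were explicitly added.
const nullProtoMap = Object.assign(Object.create(null), {
  BT: "beginText",
  ET: "endText",
});

function lookup(map, cmd) {
  return map[cmd] ?? null; // null means "unknown operator"
}
```

With these definitions `lookup(plainMap, "toString")` returns the inherited `Object.prototype.toString` function, while `lookup(nullProtoMap, "toString")` correctly returns `null`.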
Jonas Jenwald
3c2c0ecd88 Use the ESLint arrow-body-style rule in more spots in src/core/evaluator.js 2024-01-21 17:42:33 +01:00
Jonas Jenwald
d1bef8cb86 Use await consistently in the PartialEvaluator.translateFont method 2024-01-21 17:36:50 +01:00
Jonas Jenwald
fc62eec901 Convert the handleSetFont methods, in src/core/evaluator.js, to be async 2024-01-21 17:32:05 +01:00
Jonas Jenwald
f9a384d711 Enable the arrow-body-style ESLint rule
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.

Please see https://eslint.org/docs/latest/rules/arrow-body-style
2024-01-21 16:20:55 +01:00
Jonas Jenwald
9dfe9c552c Use shorter arrow functions where possible
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.

For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilobyte (which isn't a lot but still can't hurt).
2024-01-21 10:13:12 +01:00
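A small before/after illustration of what the `arrow-body-style` rule asks for with its default `"as-needed"` option; the function names are made up for the example.

```javascript
// Before: a block body with an explicit `return`.
const getNameVerbose = item => {
  return item.name;
};

// After: the concise body preferred by arrow-body-style ("as-needed").
const getName = item => item.name;

// Caveat: returning an object literal from a concise body needs parentheses,
// otherwise the braces are parsed as a block.
const toEntry = item => ({ key: item.name });
```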
Calixte Denizet
d64f334f98 [Editor] Add support for printing/saving free highlight annotations 2024-01-19 12:58:46 +01:00
Calixte Denizet
10389c5017 Add the font Linux Libertine as a possible substitution for Times New Roman
and try to load the font family (guessed from the font name) before trying
the local substitution.
The local(...) function expects a real font name rather than a predefined
substitution, which is why we try the font family.
2024-01-16 12:31:23 +01:00
Calixte Denizet
e9946fa22a [Editor] Draw a line instead of a Bezier curve when an Ink has only one point
Fixes #17418.
2024-01-15 13:32:36 +01:00
Calixte Denizet
405f573d70 Take into account empty lines when extracting text content from the appearance
Fixes #17492.
2024-01-14 20:23:29 +01:00
Calixte Denizet
0392feaee4 Remove trailing whitespace when extracting text from annotation appearances 2024-01-09 10:42:53 +01:00
Calixte Denizet
7839e7b495 Preserve whitespace when getting text from FreeText annotations (bug 1871353)
When the text of an annotation is extracted using getTextContent, consecutive whitespace
characters are replaced by a single space. So this patch adds an option to make sure that
whitespace is preserved when the appearance is parsed.
When there's no appearance, we can use a fast path to get the correct string
from the Content entry.
When an existing FreeText is edited, spaces (0x20) are replaced by non-breaking ones (0xa0)
so that all of them are visible on screen.
2024-01-05 10:20:32 +01:00
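A minimal illustration of the two behaviours described above, using plain strings; both helper names are hypothetical, and `collapse` only approximates what text extraction does without the new option.

```javascript
// Collapse every run of whitespace into a single space.
const collapse = text => text.replace(/\s+/g, " ");

// Replace ordinary spaces (U+0020) with no-break spaces (U+00A0); HTML
// rendering collapses runs of the former but not of the latter, so every
// space stays visible on screen.
const toNonBreaking = text => text.replaceAll(" ", "\u00a0");
```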
Jonas Jenwald
9f02cc36d4 Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.

Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.

For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
 - With the `master`-branch it takes >600 ms to render.
 - With this patch that goes down to ~50 ms, which is one order of magnitude faster.

(Note that all other pages are, as expected, completely unaffected by these changes.)

This new main-thread copying is limited to "large" global images, since:
 - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
 - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
 - This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
e547b198a3 Compute the length of the final image-bitmap/data on the worker-thread
Currently this is done in the API, but moving it into the worker-thread will simplify upcoming changes.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
63eb8991a3 Support Annotations with corrupt /BS-entries
There are obviously a few things wrong with the Annotations in the referenced PDF document; however, parsing of an Annotation shouldn't just break if the /BS-entry isn't a dictionary.
2023-12-09 10:36:18 +01:00
Calixte Denizet
ae5828c968 [Editor] Avoid conflicts between new persistent refs and the ones created when saving (bug 1865341)
When a PDF has a FreeText without an appearance, we use a fake font in order to render it,
which leads to the creation of a few new refs for the font.
But then, when we're saving, we create some new refs which start at the same number
as the previously created ones.
Consequently, when saving we're using some wrong objects (like a font) to check if
we're able to render the newly added FreeText.
In order to fix this bug, we just remove the persistent refs (which are only used
when rendering/printing) during the saving.
2023-12-05 12:33:21 +01:00
Calixte Denizet
52ea20eda4 Don't throw when there isn't enough data to get block info in a flate stream,
but just end the stream.
2023-11-26 18:12:22 +01:00
Calixte Denizet
f8f4432961 [Editor] Add support for saving/printing a newly added Highlight annotation (bug 1865708) 2023-11-22 10:41:55 +01:00
Jonas Jenwald
a6f0609a6e Throw a JpegError when a JPEG image has no frame data (issue 17302)
Given that there's nothing to parse in this case, since we're dealing with an invalid JPEG image, throwing an *explicit* Error makes sense here.
2023-11-20 17:33:49 +01:00
Jonas Jenwald
709d89420e Re-factor how the GenericL10n class fetches localization-data
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
   This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]

 - Expose the `fetchData` helper function in the API, such that the viewer is able to access it.

 - Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.

---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
2023-11-14 13:45:14 +01:00
Calixte Denizet
09b4fe6a30 Get the field name from its parent when it doesn't have one when collecting fields (bug 1864136)
Some fields, somewhere under the Fields entry in the AcroForm, could have no name (in T)
but have a parent which has a name and which isn't itself somewhere under Fields.
As a side-effect, this patch prevents infinite loops caused by potential cycles
under Fields.
2023-11-13 14:41:14 +01:00
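The parent-name fallback and the cycle protection can be sketched together. This assumes a hypothetical field shape `{ name?, parent? }` with plain object links; pdf.js itself walks References in PDF dictionaries, so this is only an illustration of the visited-set idea.

```javascript
// Hypothetical field shape: { name?: string, parent?: Field }.
// Walk up the parent chain until a named ancestor is found. A Set of
// visited nodes guards against cycles in malformed documents.
function resolveFieldName(field) {
  const visited = new Set();
  for (let node = field; node; node = node.parent) {
    if (visited.has(node)) {
      return null; // cycle detected: give up instead of looping forever
    }
    visited.add(node);
    if (node.name) {
      return node.name;
    }
  }
  return null; // no named ancestor
}
```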
Calixte Denizet
59ce1a4a3f Fix the maxp table version in font to make it visible on Windows 2023-11-10 14:16:20 +01:00
Jonas Jenwald
ff62fc8e2c Skip fieldObjects that are not actually References
The `fieldObjects`-getter is implemented in the `PDFDocument` class, which means that the `this._localIdFactory`-property that we pass to `AnnotationFactory.create` doesn't actually exist.
The reason that this hasn't caused any bugs, that I'm aware of, is that all /Fields-entries need to be References to actually make sense.
2023-11-08 14:39:13 +01:00
Jonas Jenwald
65c827b0eb Ensure that fieldObjects and #collectFieldObjects handle References correctly
The `fieldObjects`-getter itself is called, from `src/core/worker.js`, in a way that'll ensure that any `MissingDataException`s are handled. However the problem is that the actual data-lookups in `fieldObjects` and `#collectFieldObjects` are done inside of a Promise, which means that `MissingDataException`s won't be handled and parsing could thus break.

To address this we change all data-lookups to be asynchronous instead.
2023-11-08 14:38:57 +01:00
Calixte Denizet
acc62f80de Don't try to collect a nonexistent field because of an invalid ref 2023-11-07 19:58:29 +01:00
Calixte Denizet
0c38c6e103 Improve performance of optional content parsing 2023-10-25 17:50:53 +02:00
Calixte Denizet
133ed96f8f Don't take into account the INVISIBLE flag for well-known annotations 2023-10-25 10:16:14 +02:00
Calixte Denizet
2f3797db34 [Annotation] Use the field V entry when there is no Parent one for a radio button (bug 1860602) 2023-10-23 22:11:30 +02:00
Jonas Jenwald
25a1a9d28f Reduce unnecessary type conversion in writeStream
Currently we're unnecessarily converting data between strings and typed-arrays, when dealing with compressible data, in the `writeStream` function.
Note how we're *first* getting a string-representation of the stream, which involves converting the underlying typed-array into a string, only to immediately convert this back into a typed-array. This seems completely unnecessary, and is easy enough to avoid, and we'll now only do a *single* type-conversion in this function.
2023-10-18 15:39:01 +02:00
Calixte Denizet
7851c0da8d [Debugger] Add some info about substitution font
When pdfBug is true, the substitution font is used in the text layer so that
the devtools can show which font is really used.
And to make sure that fonts are loaded, the font cache isn't cleaned up when
the debugger is active.
2023-10-09 12:06:33 +02:00
Calixte Denizet
e737638a40 Add an HTML container for locked FreeText annotations in order to be able to display a popup (follow-up of #17070) 2023-10-05 14:01:34 +02:00
Calixte Denizet
40b1d92044 Update the noHTML flag to take into account the hasOwnCanvas one (fixes #17069)
When an element has the hasOwnCanvas flag we must have an HTML container to attach
the canvas where the element will be rendered.
So the noHTML flag must take this information into account:
 - in some cases the noHTML flag is reset depending on the hasOwnCanvas value;
 - in some others, the hasOwnCanvas flag is set depending on the value of noHTML.
2023-10-04 18:06:21 +02:00
Jonas Jenwald
bf9c33e60f Add support for "GoToE" actions with destinations (issue 17056)
This shouldn't be very common in practice, since "GoToE" actions themselves seem quite uncommon; see PR 15537.
2023-10-04 11:14:23 +02:00