*Please note:* The reduced test-case is *not* a perfect reproduction of the original PDF document, since the reduced one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here.
For corrupt *and* encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we could easily miss that, since we generally pick the first not obviously corrupt trailer dictionary; the solution implemented here is to simply pre-parse all trailer dictionaries to see if there are any /Encrypt-entries.
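A rough sketch of the idea, with a hypothetical helper name and dictionary API (not the actual pdf.js code):

```js
// Rough sketch (hypothetical helper): check *every* candidate trailer
// dictionary for an /Encrypt entry, rather than only the first usable one.
function trailersContainEncrypt(trailerDicts) {
  for (const trailerDict of trailerDicts) {
    if (trailerDict.has("Encrypt")) {
      return true; // at least one trailer references an /Encrypt dictionary
    }
  }
  return false;
}
```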
In GeckoView, when an event occurs, a callback must be executed with the result of an action,
but the callback can only be used once.
So for each FindInPage event, we must trigger only one matches-count update.
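A minimal sketch of the resulting guard, with hypothetical names (`callback`, `onMatchesCountUpdated`) standing in for the actual GeckoView plumbing:

```js
// Hypothetical sketch: make sure the single-use GeckoView callback is only
// ever invoked once per FindInPage event.
let responded = false;

function onMatchesCountUpdated(current, total) {
  if (responded) {
    return; // the callback has already been consumed for this event
  }
  responded = true;
  callback.onSuccess({ current, total });
}
```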
This was deprecated in PR 15758. Given that it's quite unlikely that any third-party users rely on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove it fairly quickly.
These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there are unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.
Currently we do all searching on the "raw" bytes of the PDF document, for efficiency; however, this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property, which is well supported across browsers; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility
Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
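To illustrate the `RegExp.lastIndex` trick in isolation (a generic example, not the actual parser code): with the `g` flag set, assigning to `lastIndex` makes `exec` start matching at that offset, so we can look for several command-strings from a given position without slicing the data into substrings.

```js
// Generic illustration: find the next of several command-strings, starting
// at `startPos`, without creating any substrings.
const CMD_RE = /\b(?:trailer|startxref|obj)\b/g;

function findNextCommand(data, startPos) {
  CMD_RE.lastIndex = startPos; // begin matching at the given offset
  const match = CMD_RE.exec(data);
  return match ? { cmd: match[0], pos: match.index } : null;
}
```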
Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually find usable data.
Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.
This is done to support upcoming viewer-changes, and in order to prevent third-party users from outright breaking things we'll simply ignore too large values.
It's a follow-up of #14950: some format actions are run when the document is opened,
but we must be sure we have everything ready for that, hence we have to run some
named actions before running the global format.
In playing with the form, I discovered that the blur event wasn't triggered when
JS called `setFocus` (because in such a case the mouse was never down). So I removed
the mouseState thing to just use the correct commitKey when blur is triggered by a
TAB key.
In order to move the annotations in the DOM so that their order corresponds
to the visual order, we need their dimensions/positions, which means that
the parent must have some dimensions.
An annotation editor layer can be destroyed when it's invisible, hence some
annotations can have a null parent. But when printing/saving, or when changing the
font size, color, ... of all added annotations (when selected with ctrl+a), we
still need some parent properties, especially the page dimensions, global
scale factor and global rotation angle.
This patch aims to remove all the references to the parent in the editor instances
except in some cases where an editor should obviously have one.
It fixes #15780.
The main issue is due to the fact that an editor's parent can be null when
we want to serialize it, and that leads to an exception which breaks the whole
saving/printing process.
So this incomplete patch fixes only the saving/printing issue, but not the
underlying problem (i.e. having a null parent), and doesn't bring that much
complexity, so it should help to uplift it to the next Firefox release.
Rather than handling these parameters separately, which is a left-over from back when streaming of textContent was originally added, we can simply pass either data directly to the `TextLayer` and let it handle things accordingly.
Also, improves a few JSDoc comments and `typedef`-imports.
The idea is just to reuse what we got on the first draw.
Now, we only update the scaleX of the different spans and the other values
depend on --scale-factor.
Move some properties into the CSS in order to avoid any updates in JS.
This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.
The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs.
For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1].
---
[1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.
Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, which is why a *linked* test-case was chosen here.
- For text fields
* when printing, we generate a fake font which contains some widths computed thanks to
an OffscreenCanvas and its measureText method.
In order to avoid having to lay out the glyphs ourselves, we just render all of them
in one call in the showText method, using the system sans-serif/monospace fonts
(see the sketch after this list).
* when saving, we continue to create the appearance streams if the fonts contain the chars,
but when a char is missing we just set, in the AcroForm dict, the flag /NeedAppearances
to true and remove the appearance stream. This way, we let the different readers handle
the rendering of the strings.
- For FreeText annotations
* when printing, we use the same trick as for text fields.
* there is no need to save an appearance since Acrobat is able to infer one from the
Content entry.
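Below is a rough sketch of the width-measurement idea for the printing case, using the standard `OffscreenCanvas` API (illustrative only, not the actual pdf.js code):

```js
// Illustrative sketch: measure glyph widths with a system font via an
// OffscreenCanvas 2D context, instead of laying out the glyphs ourselves.
const ctx = new OffscreenCanvas(1, 1).getContext("2d");
ctx.font = "30px sans-serif";

function measureGlyphWidth(char) {
  return ctx.measureText(char).width;
}
```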
*Please note:* This only fixes the "wrong letter" part of bug 1799927.
It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
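To illustrate why very short inputs are prone to collisions, here is a minimal Adler-32 implementation (not necessarily identical, detail for detail, to the `computeAdler32` helper) together with a concrete collision:

```js
// Minimal Adler-32 sketch: only a plain byte-sum and a positionally-weighted
// sum are tracked, so distinct short inputs can easily produce the same hash.
function adler32(bytes) {
  const MOD = 65521;
  let a = 1,
    b = 0;
  for (const byte of bytes) {
    a = (a + byte) % MOD;
    b = (b + a) % MOD;
  }
  return ((b << 16) | a) >>> 0;
}

console.log(adler32(new Uint8Array([0, 2, 0]))); // 458755
console.log(adler32(new Uint8Array([1, 0, 1]))); // 458755 -- collision
```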
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.
One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.
*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.
---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
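For completeness, a small usage sketch of the `stopAtErrors` option mentioned above (assuming the standard `pdfjsLib` entry point; the filename is hypothetical):

```js
// Usage sketch: with `stopAtErrors` enabled, loading aborts on parsing errors
// instead of falling back to an *empty* dictionary for the first /Page.
const loadingTask = pdfjsLib.getDocument({
  url: "hand-edited.pdf", // hypothetical corrupt document
  stopAtErrors: true,
});
```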
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.
This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
When a form isn't changed, we use the appearances we had in the file, but when
/NeedAppearances is true, all the appearances have to be regenerated whatever they are.