66 Commits

Author SHA1 Message Date
Calixte Denizet
0c38c6e103 Improve performance of optional content parsing 2023-10-25 17:50:53 +02:00
Jonas Jenwald
bf9c33e60f Add support for "GoToE" actions with destinations (issue 17056)
This shouldn't be very common in practice, since "GoToE" actions themselves seem quite uncommon; see PR 15537.
2023-10-04 11:14:23 +02:00
Calixte Denizet
a8573d4e1b [Editor] Add the ability to create/update the structure tree when saving a pdf containing newly added annotations (bug 1845087)
When there is no tree, the tags for the new annotions are just put under the root element.
When there is a tree, we insert the new tags at the right place in using the value
of structTreeParentId (added in PR #16916).
2023-09-16 18:34:58 +02:00
Jonas Jenwald
b5b061cdb6 Slightly re-factor the parameter handling in Catalog.parseDestDictionary
While it makes sense to check that the `destDict` parameter is indeed a Dictionary, since that data comes from the PDF document itself, the `resultObj` parameter is an internal PDF.js implementation detail that should always be correct (or tests will fail).
2023-09-08 13:27:31 +02:00
Jonas Jenwald
df9cce39c0 Slightly reduce asynchronicity when parsing Annotations
Over time the amount of "document level" data potentially needed during parsing of Annotations have increased a fair bit, which means that we currently need to ensure that a bunch of data is available for each individual Annotation.
Given that this data is "constant" for a PDF document we can instead create (and cache) it lazily, only when needed, *before* starting to parse the Annotations on a page. This way the parsing of individual Annotations should become slightly less asynchronous, which really cannot hurt.

An additional benefit of these changes is that we can reduce the number of parameters that need to be explicitly passed around in the annotation-code, which helps overall readability in my opinion.

One potential drawback of these changes is that the `AnnotationFactory.create` method no longer handles "everything" on its own, however given how few call-sites there are I don't think that's too much of a problem.
2023-09-08 13:27:27 +02:00
Jonas Jenwald
64e8557fb5 [api-minor] Deprecate the PDFDocumentProxy.getJavaScript method
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.

Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
2023-08-01 09:02:05 +02:00
Jonas Jenwald
1b4a7c5965 Introduce more optional chaining in the src/core/ folder
After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.)

For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.
2023-05-15 12:38:28 +02:00
Calixte Denizet
cfb908c999 Add a cache to avoid to load several times a local font
On my computer, it takes few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performances.
2023-05-10 20:01:21 +02:00
Jonas Jenwald
d950b91c4e Introduce some logical assignment in the src/core/ folder 2023-04-29 13:49:37 +02:00
Jonas Jenwald
5f64621d46 Use String.prototype.replaceAll() where appropriate
This fairly new method allows replacing *multiple* occurrences within a string without having to use regular expressions.

Please refer to:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#browser_compatibility
2023-03-22 15:31:10 +01:00
Jonas Jenwald
23930a249e [api-minor] Let Catalog.getAllPageDicts return an *empty* dictionary when loading the first /Page fails (issue 15590)
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.

*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.

---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
2022-11-03 12:51:48 +01:00
Jonas Jenwald
d470010293 Re-factor the PDF version parsing in the worker-thread
Part of this is very old code, and back when support for parsing the catalog-version was added things became less clear (in my opinion).
Hence this patch tries to improve things, by e.g. validating the header- and catalog-version separately.
2022-10-15 12:06:39 +02:00
Jonas Jenwald
ce66fefbff [api-minor] Add partial support for the "GoToE" action (issue 8844)
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document.

See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909

---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
2022-10-06 10:33:07 +02:00
Jonas Jenwald
60f6272ed9 Use more for...of loops in the code-base
Most, if not all, of this code is old enough to predate the general availability of `for...of` iteration.
2022-10-03 13:08:38 +02:00
Jonas Jenwald
cc4baa2fe9 [api-minor] Add basic support for the SetOCGState action (issue 15372)
Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents.

The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`.

---
[1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.
2022-09-01 17:34:24 +02:00
Jonas Jenwald
216b86a082 [api-minor] Support Named-actions in the outline (issue 15367)
Apparently this is implemented in e.g. Adobe Reader, and the specification does support it, however it cannot be commonly used in real-world PDF documents since it took over ten years for this feature to be requested.
2022-08-30 18:47:45 +02:00
Calixte Denizet
5f0c95e70e [JS] Embedded JS scripts can have some null chars 2022-07-15 16:05:25 +02:00
Jonas Jenwald
9ac4536693 Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)
Please find additional information here:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/at
 - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-at.md
2022-06-09 21:21:19 +02:00
Jonas Jenwald
df5a4fd0a7 Support encoded dest-strings in /GoTo destination dictionaries (issue 14864)
Interestingly enough this appears to be the very first case of *encoded* dest-strings, in /GoTo destination dictionaries, that we've actually come across. What's really fascinating is that it's less than a week after issue 14847, given that these issues are *somewhat* similar.
2022-05-02 10:14:32 +02:00
Jonas Jenwald
71370d012b Support destinations in NameTrees with encoded keys (issue 14847)
Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees using string-keys.
Hence the easiest solution, as far as I'm concerned, was thus to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fallback to fetch all destinations, which should be fine since this is the very first case of encoded keys that we've seen.

Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error, although we've so far not seen such a case, for any non-Array Kids-entries found in a NameTree/NumberTree.

Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)
2022-04-27 11:19:55 +02:00
Jonas Jenwald
5bc7339c1b Add support for the /Catalog Base-URI when resolving URLs (issue 14802)
As far as I can tell, this is actually the very first time that we've seen a PDF document with a Base-URI specified in the /Catalog; please refer to the specification:
https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2097122

To simplify the overall implementation, this new parameter is accessed via the existing `BasePdfManager.docBaseUrl`-getter and will thus override any user-specified `docBaseUrl` API-parameter.
2022-04-19 17:14:52 +02:00
Jonas Jenwald
a919959d83 Slightly simplify the Catalog._readMarkInfo method
We don't need to first check if the Dictionary contains the key, since trying to get a non-existent key simply returns `undefined` and we're already ensuring that the value is a boolean.
Furthermore, we shouldn't need to worry about the `Object.prototype` containing enumerable properties since the checks (in `src/core/worker.js`) done for `Array.prototype` *indirectly* also cover `Object`s. (Keep in mind that an `Array` is just a special kind of `Object` in JavaScript.)
2022-04-05 16:37:51 +02:00
Jonas Jenwald
addb4cb12b Use String.prototype.repeat() in a couple of spots
Rather than using a temporary Array to manually create repeated strings, we can use `String.prototype.repeat()` instead.
The reason that we didn't use this from the start is most likely because some browsers, notably IE, didn't support this; note https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat#browser_compatibility
2022-03-30 15:42:40 +02:00
Jonas Jenwald
c0736647f9 Add general iteration support in the RefSet and RefSetCache classes
This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.
2022-03-18 14:27:34 +01:00
Jonas Jenwald
939e6f0c4c Fix a couple of small typos in JSDoc typedef comments
While this doesn't affect the official API documentation, these cases should nonetheless be fixed.
2022-03-04 12:11:52 +01:00
Jonas Jenwald
99cd24ce3e Remove the isString helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.
2022-02-26 16:33:41 +01:00
Jonas Jenwald
3704283f5b Remove the isBool helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls.
2022-02-23 13:31:03 +01:00
Jonas Jenwald
82f1ee1755 Re-factor the Catalog.viewerPreferences method
This removes the `ViewerPreferencesValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
2022-02-23 13:25:56 +01:00
Jonas Jenwald
05edd91bdb Remove the isNum helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isNum`-calls.

These changes were *mostly* done using regular expression search-and-replace, with two exceptions:
 - In `Font._charToGlyph` we no longer unconditionally update the `width`, since that seems completely unnecessary.
 - In `PDFDocument.documentInfo`, when parsing custom entries, we now do the `typeof`-check once.
2022-02-22 11:55:34 +01:00
Jonas Jenwald
b282814e38 Prefer instanceof Name rather than calling isName() with one argument
Unless you actually need to check that something is both a `Name` and also of the *correct* type, using `instanceof Name` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isName` helper function for where it makes sense.
2022-02-21 12:45:00 +01:00
Jonas Jenwald
4df82ad31e Prefer instanceof Dict rather than calling isDict() with one argument
Unless you actually need to check that something is both a `Dict` and also of the *correct* type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.
2022-02-21 12:44:56 +01:00
Jonas Jenwald
2cb2f633ac Remove the isRef helper function
This helper function is not really needed, since it's just a wrapper around a simple `instanceof` check, and it only adds unnecessary indirection in the code.
2022-02-19 15:33:42 +01:00
Jonas Jenwald
1a31855977 Remove the isStream helper function
At this point all the various Stream-classes extends an abstract base-class, hence this helper function is no longer necessary and only adds unnecessary indirection in the code.
2022-02-17 13:51:36 +01:00
Jonas Jenwald
8836593b9e Add a (global) cache to the getCharUnicodeCategory function
Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing.
Obviously the `Glyph`-instances are being cached *per* font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1].

Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularily large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` *unique* unicode-chars being checked.

*Please note:* In practice I suppose that this won't have a *huge* effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review.

---
[1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.
2022-01-25 09:59:34 +01:00
Jonas Jenwald
b0e774d9c5 Convert Catalog.getAllPageDicts to an async method
The patch in PR 14335 *essentially* re-introduced the old code from before PR 3848, however looking at this code a bit closer it should be possible to simplify it by making the method asynchronous.

While this method is currently only used as a *fallback* in corrupt documents, the way that `MissingDataException`s are handled is less than ideal. Note that if a `MissingDataException` is thrown, we're forced to re-parse the *entire* /Pages tree[1].
With this method now being asynchronous, we're able to handle fetching of References in a *much* easier/nicer way than before without having to throw `MissingDataException`s and re-parse anything.
These changes also let us simplify the call-site slightly, by calling the method *directly* instead of using the `PDFManager`-instance (since again it will no longer throw `MissingDataException`s).

Furthermore, this patch contains the following other changes:
 - Reduce unnecessary duplication in the various `catch` handlers throughout the method, by simply moving the `XRefEntryException` handling into the `addPageError` helper function instead.
 - Move the "circular references"-check to occur slightly earlier, since there's obviously no point in asynchronously fetching data just to then throw an Error *immediately* afterwards.

---
[1] Imagine e.g. a thousand page document, where there's a `MissingDataException` thrown when fetching/parsing page 900.
2021-12-31 22:03:10 +01:00
Jonas Jenwald
1491459dea Improve caching for the Catalog.getPageIndex method (PR 13319 follow-up)
This method is now being used a lot more, compared to when it's added, since it's now used together with scripting as part of the `PDFDocument.fieldObjects` parsing (called during viewer initialization).
For /Page Dictionaries that we've already parsed, the `pageIndex` corresponding to a particular Reference is already known and we're thus able to skip *all* parsing in the `Catalog.getPageIndex` method for those cases.
2021-12-29 20:29:14 +01:00
Jonas Jenwald
b513c64d9d [api-minor] Convert Catalog.getPageDict to an asynchronous method
Besides converting `Catalog.getPageDict` to an `async` method, thus simplifying the code, this patch also allows us to pro-actively fix a existing issue.
Note how we're looking up References in such a way that `MissingDataException`s won't cause trouble, however it's *technically possible* that the entries (i.e. /Count, /Kids, and /Type) in a /Pages Dictionary could actually be indirect objects as well. In the existing code this could lead to *some*, or even all, pages failing to load/render as intended.
In practice that doesn't *appear* to happen in real-world PDF documents, but given all the weird things that PDF software do I'd prefer to fix this pro-actively (rather than waiting for a bug report).
With `Catalog.getPageDict` being `async` this is now really simple to address, however I didn't want to introduce a bunch more *unconditional* asynchronicity in this method if it could be avoided (since that could slow things down). Hence we'll *synchronously* lookup the *raw* data in a /Pages Dictionary, and only fallback to asynchronous data lookup when a Reference was encountered.

In addition to the above, this patch also makes the following notable changes:
 - Let `Catalog.getPageDict` *consistently* reject with the actual error, regardless of what data we're fetching. Previously we'd "swallow" the actual errors except when looking up Dictionary entries, which is inconsistent and thus seem unfortunate. As can be seen from the updated unit-tests this change is API-observable, hence why the patch is tagged `[api-minor]`.

 - Improve the consistency of the Dictionary /Type-checks in both the `Catalog.getPageDict` and `Catalog.getAllPageDicts` methods.
   In `Catalog.getPageDict` there's a fallback code-path where we're *incorrectly* checking the /Page Dictionary for a /Contents-entry, which is wrong since a /Page Dictionary doesn't need to have a /Contents-entry in order to be valid.
   For consistency the `Catalog.getAllPageDicts` method is also updated to handle errors in the /Type-lookup correctly.

 - Reduce the `PagesCountLimit.PAUSE_EAGER_PAGE_INIT` viewer constant, to further improve loading/rendering performance of the *second* page during initialization of very long documents; PR 14359 follow-up.
2021-12-25 15:22:48 +01:00
Jonas Jenwald
fa51fd9428 Slightly reduce asynchronicity in the Catalog.getPageDict method (PR 14338 follow-up)
After the changes in PR 14338, specifically in the `XRef.parse`-method, the /Pages-entry will now always have been fetched/validated when the `Catalog`-instance is created.
Hence we can directly access the /Pages-entry in `Catalog.getPageDict` and thus avoid *one* asynchronous data-lookup per page in the document. (In practice this is unlikely to show up in e.g. benchmarks, but it really cannot hurt.)

Finally, make sure that the `getPageDict`/`getAllPageDicts`-methods track the /Pages-tree reference correctly to prevent circular references in corrupt documents.
2021-12-13 21:18:06 +01:00
Tim van der Meij
a6dd39b645
Merge pull request #14358 from Snuffleupagus/checkLastPage-improvements
Improve `PDFDocument.checkLastPage`/`Catalog.getAllPageDicts` for documents with corrupt XRef tables (PR 14311, 14335 follow-up)
2021-12-11 13:07:54 +01:00
Jonas Jenwald
70ac6b1694 Update Catalog.getAllPageDicts to always propagate the actual Errors (PR 14335 follow-up)
Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
Note that we purposely handle `XRefEntryException` specially, to make it possible to fallback to indexing all XRef objects.
2021-12-10 15:22:36 +01:00
Jonas Jenwald
8a05db230e Further improve caching in Catalog.getPageDict, for disableAutoFetch mode (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of *another* oversight on my part, we're only caching /Page references once we've found the correct page. As long as all pages are loaded *in order* this doesn't really matter (happens by default in the viewer), but when `disableAutoFetch` is used the pages may be fetched in a more random order (this patch reduces the asynchronicity of `Catalog.getPageDict` slightly in that case).
2021-12-09 12:54:49 +01:00
Jonas Jenwald
5f295ba280 Improve caching in Catalog.getPageDict (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of annoying off-by-one errors[1] the caching became less efficient than it could/should be.[2] Note here that the /Pages-tree is zero-indexed, and that e.g. `pageIndex = 5` thus correspond to the *sixth* page of the document.

---
[1] In particular the `currentPageIndex + count < pageIndex` part.

[2] For example, even when loading a relatively small/simple document such as `tracemonkey.pdf` in the viewer, the number of `xref.fetchAsync(currentNode)` calls are reduced from `56` to `44` with this patch.
2021-12-06 11:49:31 +01:00
Jonas Jenwald
40291d1943 Handle errors when fetching the raw /Metadata (issue 14305)
Currently the `Catalog.metadata` getter only handles errors during parsing, however in a *corrupt* PDF document fetching of the raw /Metadata can obviously fail as well.
Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail which it *never* should and this will cause the viewer to not initialize all state as expected.

Fixes one of the documents in issue 14305.
2021-12-04 09:41:42 +01:00
Jonas Jenwald
1fac6371d3 [Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up)
*Please note:* This is similar to the method that existed prior to PR 3848, but the new method will *only* be used as a fallback when parsing of corrupt PDF documents.

The implementation in PR 14311 unfortunately turned out to be *way* too simplistic, as evident by the recently added test-files in issue 14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fallback to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.

Fixes *at least* two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.

---
[1] The whole point of PR 14311 was obviously to *get rid of* infinte loops during document initialization, not to introduce any more of those.
2021-12-02 14:31:04 +01:00
Jonas Jenwald
e045cd4520 Remove the unused skipCount parameter from Catalog.getPageDict (PR 14311 follow-up)
This was added in PR 14311, but given that I completely missed to update the `PDFDocument.getPage` signature accordingly it's completely unused.
Given that things work just as fine as-is, let's simply remove that optional parameter for now; sorry about the churn here!
2021-12-02 11:51:38 +01:00
Jonas Jenwald
63be23f05b Handle errors correctly when data lookup fails during /Pages-tree parsing (issue 14303)
This only applies to severely corrupt documents, where it's possible that the `Parser` throws when we try to access e.g. a /Kids-entry in the /Pages-tree.

Fixes two of the issues listed in issue 14303, namely the `poppler-742-0.pdf...` and `poppler-937-0.pdf...` documents.
2021-12-02 10:54:40 +01:00
Jonas Jenwald
d0c4bbd828 [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*

Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus which causes general parsing to breaking down elsewhere in the worker-thread (and hanging the browser).

Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially counting the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.

Unfortunately these changes will have a number of *somewhat* negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug.
 - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
 - For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents,
 - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.

As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).

Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
2021-11-27 21:57:35 +01:00
Jonas Jenwald
00720d059a [api-minor] Include the /Lang-property in the documentInfo, and use it in the viewer (issue 14110)
*Please note:* This is a tentative patch, since I don't have the necessary a11y-software to actually test it.

To avoid having to add a new API-method just for a single string, I figured that adding the new property to the existing `documentInfo`-data (accessed via `PDFDocumentProxy.getMetadata` in the API) will hopefully be deemed acceptable.
2021-10-16 14:27:47 +02:00
Jonas Jenwald
cd94a44ca1 Remove some duplication in *simple* shadowed getters in src/core/-code
In these cases there's no good reason, in my opinion, to duplicate the `shadow`-lines since that unnecessarily increases the risk of simple typos (see the previous patch).
2021-10-16 12:56:17 +02:00
Calixte Denizet
aecbd7cd89 AcroForm: Add support for ResetForm action
- it aims to fix #12721.
  - Thanks to PR #14023, we've now the fieldObjects in the annotation layer so we can easily map fields names on their id if needed.
  - Reset values in the storage, in the JS sandbox and in the visible html elements.
2021-09-30 22:02:33 +02:00