pdf.js/test/unit
Jonas Jenwald d0c4bbd828 [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*

Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (and hangs the browser).

As with all other data found in PDF documents, we obviously need to validate the /Count entry rather than hope that it is correct. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially count the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
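
The short-cut and its fallback can be sketched roughly as follows. This is a minimal illustration on a toy in-memory tree, not the actual pdf.js implementation: the node shape (`type`, `count`, `kids`) and the helpers `getPage`, `countPages` and `resolveNumPages` are hypothetical names invented for this example.

```javascript
// Toy model of a /Pages-tree (hypothetical, for illustration only):
//   leaf:     { type: "Page" }
//   internal: { type: "Pages", count: <anything>, kids: [...] }

// Push the children of `node` so that they are popped in document order.
function pushKids(stack, node) {
  const kids = Array.isArray(node.kids) ? node.kids : [];
  for (let i = kids.length - 1; i >= 0; i--) {
    stack.push(kids[i]);
  }
}

// Fetch the page at `pageIndex` (zero-based), or throw if it cannot be found.
function getPage(root, pageIndex) {
  const visited = new Set(); // guards against circular references
  const stack = [root];
  let seen = 0; // number of pages known to precede the current node
  while (stack.length > 0) {
    const node = stack.pop();
    if (visited.has(node)) {
      throw new Error("Circular reference in the page tree.");
    }
    visited.add(node);
    if (node.type === "Page") {
      if (seen === pageIndex) {
        return node;
      }
      seen++;
      continue;
    }
    // Skip a whole subtree when its own count says the target lies beyond it.
    // Assumption of this sketch: intermediate counts are trusted here.
    if (
      Number.isInteger(node.count) &&
      node.count >= 0 &&
      seen + node.count <= pageIndex
    ) {
      seen += node.count;
      continue;
    }
    pushKids(stack, node);
  }
  throw new Error(`Page index ${pageIndex} not found.`);
}

// Slow path: ignore every count and walk the entire tree, counting leaves.
function countPages(root) {
  const visited = new Set();
  const stack = [root];
  let total = 0;
  while (stack.length > 0) {
    const node = stack.pop();
    if (visited.has(node)) {
      throw new Error("Circular reference in the page tree.");
    }
    visited.add(node);
    if (node.type === "Page") {
      total++;
    } else {
      pushKids(stack, node);
    }
  }
  return total;
}

// Use the declared count only if the last page it implies can be fetched.
function resolveNumPages(root) {
  const declared = root.count;
  if (Number.isInteger(declared) && declared > 0) {
    try {
      getPage(root, declared - 1);
      return declared; // Short-cut: the last page exists, accept the count.
    } catch {
      // Fall through to the full traversal below.
    }
  }
  return countPages(root);
}

const page = () => ({ type: "Page" });
const bogusTree = {
  type: "Pages",
  count: 1000, // bogus: only three pages actually exist
  kids: [{ type: "Pages", count: 2, kids: [page(), page()] }, page()],
};
console.log(resolveNumPages(bogusTree)); // 3
```

Note that, at least in this sketch, a count that is too *small* still passes the check, since the page it points to can be fetched; only counts that point past the end of the tree, or that are not positive integers, trigger the full traversal.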

Unfortunately these changes will have a number of *somewhat* negative side-effects; please see the (possibly incomplete) list below. However, I cannot see a better way to address this bug.
 - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF document.
 - For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents.
 - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.

As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).

Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
2021-11-27 21:57:35 +01:00
.eslintrc Enable the ESLint no-var rule globally 2021-03-13 16:12:53 +01:00
annotation_spec.js [api-minor] Render pushbuttons on their own canvas (bug 1737260) 2021-11-12 15:37:33 +01:00
annotation_storage_spec.js Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148) 2021-09-18 15:08:22 +02:00
api_spec.js [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) 2021-11-27 21:57:35 +01:00
base_viewer_spec.js Use the new iterator in the PDFPageViewBuffer unit-tests 2021-11-15 14:06:17 +01:00
bidi_spec.js Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656) 2021-11-03 20:31:57 +01:00
cff_parser_spec.js Fix typo in cff_parser_spec.js 2021-08-06 19:30:36 +09:00
clitests_helper.js [api-minor] Highlight search results correctly for normalized text (PR 9448) 2021-01-12 18:08:08 +01:00
clitests.json Add a couple of basic unit-tests for PDFPageViewBuffer 2021-11-05 19:43:20 +01:00
cmap_spec.js Convert done callbacks to async/await in test/unit/cmap_spec.js 2021-04-14 22:24:28 +02:00
colorspace_spec.js Remove obsolete done callbacks from the unit tests 2021-04-10 20:29:39 +02:00
core_utils_spec.js XFA -- Load fonts permanently from the pdf 2021-04-15 17:57:42 +02:00
crypto_spec.js Correctly pad strings when saving an encrypted pdf (bug 1726789) 2021-09-02 10:37:21 +02:00
custom_spec.js Account for formatting changes in Prettier version 2.3.0 2021-05-16 11:44:05 +02:00
default_appearance_spec.js [api-minor] Change the format of the fontName-property, in defaultAppearanceData, on Annotation-instances (PR 12831 follow-up) 2021-04-01 16:47:30 +02:00
display_svg_spec.js Convert done callbacks to async/await in test/unit/display_svg_spec.js 2021-04-14 21:59:13 +02:00
display_utils_spec.js Remove obsolete done callbacks from the unit tests 2021-04-10 20:29:39 +02:00
document_spec.js A couple of small scripting/XFA-related tweaks in the worker-code 2021-04-17 10:34:22 +02:00
encodings_spec.js Update Prettier to version 2.0 2020-04-14 12:28:14 +02:00
evaluator_spec.js Support corrupt documents with *empty* Name-entries (issue 13610) 2021-06-22 16:55:44 +02:00
fetch_stream_spec.js Convert done callbacks to async/await in test/unit/fetch_stream_spec.js 2021-04-13 21:51:27 +02:00
function_spec.js Convert var to const/let in the test/unit folder 2020-10-25 15:40:51 +01:00
jasmine-boot.js Add a couple of basic unit-tests for PDFPageViewBuffer 2021-11-05 19:43:20 +01:00
message_handler_spec.js Convert done callbacks to async/await in test/unit/message_handler_spec.js 2021-04-14 21:59:13 +02:00
metadata_spec.js Move the Metadata parsing to the worker-thread 2021-02-17 13:12:01 +01:00
murmurhash3_spec.js Add a MurmurHash3_64.update unit-test for TypedArrays which share the same underlying ArrayBuffer (PR 12534 follow-up) 2020-10-28 12:42:04 +01:00
network_spec.js Convert done callbacks to async/await in test/unit/network_spec.js 2021-04-13 21:51:26 +02:00
network_utils_spec.js Update Prettier to version 2.0 2020-04-14 12:28:14 +02:00
node_stream_spec.js Convert done callbacks to async/await in test/unit/node_stream_spec.js 2021-04-13 21:51:26 +02:00
parser_spec.js Let Lexer.getObj return a dummy-Cmd for commands that start with a non-visible ASCII character (issue 13999) 2021-09-11 19:54:13 +02:00
pdf_find_controller_spec.js Merge pull request #13424 from calixteman/chunks2 2021-10-18 06:14:15 -07:00
pdf_find_utils_spec.js Run gulp lint --fix, to account for changes in Prettier version 2.1.x 2020-09-06 12:23:59 +02:00
pdf_history_spec.js Update Prettier to version 2.0 2020-04-14 12:28:14 +02:00
primitives_spec.js Always prefer abbreviated keys, over full ones, when doing any dictionary lookups (issue 14256) 2021-11-10 11:56:18 +01:00
scripting_spec.js Remove obsolete done callbacks from the unit tests 2021-04-10 20:29:39 +02:00
stream_spec.js Move the PredictorStream from src/core/stream.js and into its own file 2021-04-28 10:16:51 +02:00
struct_tree_spec.js Include the /Lang-property, when it exists, in the StructTree-data (issue 14261) 2021-11-14 12:37:41 +01:00
test_utils.js [api-minor] Replace PDFDocumentProxy.getStats with a synchronous PDFDocumentProxy.stats getter 2021-11-20 12:20:55 +01:00
testreporter.js Replace a few new Date().getTime() instances with Date.now() 2021-02-11 23:00:42 +01:00
type1_parser_spec.js Move some constants and helper functions from src/core/fonts.js and into their own file 2021-05-02 21:00:29 +02:00
ui_utils_spec.js Remove the moveToEndOfArray helper function, since it's unused 2021-11-06 10:19:17 +01:00
unicode_spec.js Remove obsolete done callbacks from the unit tests 2021-04-10 20:29:39 +02:00
unit_test.html Import the TestReporter, in the unit and font tests 2020-10-27 11:30:15 +01:00
util_spec.js Remove non-displayable chars from outline title (#14267) 2021-11-13 16:56:08 +01:00
writer_spec.js Don't save anything in XFA entry if no XFA! (bug 1732344) 2021-09-23 19:51:23 +02:00
xfa_formcalc_spec.js XFA - Add a lexer/parser for FormCalc language (#12936) 2021-02-17 20:28:06 +01:00
xfa_parser_spec.js Support rich content in markup annotation 2021-10-31 13:44:51 +01:00
xfa_serialize_data_spec.js XFA - Encode tag names in UTF-8 when saving (fix #14249) 2021-11-07 21:41:37 +01:00
xfa_tohtml_spec.js XFA - Get each page asynchronously in order to avoid blocking the event loop (#14014) 2021-11-06 13:25:03 +01:00
xml_spec.js Handle PI with no value in xml parser 2021-05-18 10:22:18 +02:00