When parsing corrupt documents without any trailer-dictionary, fall back to the "top"-dictionary (issue 14269)

There's obviously no guarantee that this will work in general, if the document is sufficiently corrupt, but it should hopefully be better than just throwing `InvalidPDFException` as currently happens.
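
As a rough illustration of the lookup order this patch introduces, here is a minimal standalone sketch. `pickDictionary` is a hypothetical helper that does not exist in pdf.js, and the `InvalidPDFException` class below is only a local stand-in for the real one; the actual change lives inside the `XRef` class, as shown in the diff.

```javascript
// Local stand-in for the pdf.js exception of the same name (assumption:
// only the name and message matter for this sketch).
class InvalidPDFException extends Error {
  constructor(msg) {
    super(msg);
    this.name = "InvalidPDFException";
  }
}

// Hypothetical helper, not part of pdf.js: returns the dictionary that
// parsing should continue with, using the same order as the patched code.
function pickDictionary(trailerDict, topDict) {
  if (trailerDict) {
    return trailerDict;
  }
  // No trailer dictionary found, fall back to the "top" dictionary (if any).
  if (topDict) {
    return topDict;
  }
  // Neither dictionary is available, so the document cannot be recovered.
  throw new InvalidPDFException("Invalid PDF structure.");
}
```

With this ordering a document that has a trailer dictionary behaves exactly as before; the "top" dictionary is only consulted when the trailer is missing.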

Please note that, as is often the case with corrupt documents, it's somewhat difficult to know if we're rendering the document "correctly" with this patch[1]. In this case even Adobe Reader cannot open the document, which is always a good sign that it's *really* corrupt. However, we're at least able to render *something* with this patch.

---
[1] Whatever "correct" even means when dealing with corrupt PDF documents, where different PDF viewers often won't agree completely.
Jonas Jenwald 2021-11-13 13:06:21 +01:00
parent 28fb3975eb
commit afcc99a86d
3 changed files with 15 additions and 0 deletions

@@ -590,6 +590,10 @@ class XRef {
if (trailerDict) {
return trailerDict;
}
// No trailer dictionary found, taking the "top"-dictionary (if exists).
if (this.topDict) {
return this.topDict;
}
// nothing helps
throw new InvalidPDFException("Invalid PDF structure.");
}
@@ -680,6 +684,8 @@ class XRef {
throw e;
}
info("(while reading XRef): " + e);
this.startXRefQueue.shift();
}
if (recoveryMode) {

@@ -0,0 +1 @@
https://github.com/mozilla/pdf.js/files/7529789/test.pdf

@@ -91,6 +91,14 @@
"rounds": 1,
"type": "eq"
},
{ "id": "issue14269",
"file": "pdfs/issue14269.pdf",
"md5": "f34abf77a418f54e13fbcd03b063432e",
"rounds": 1,
"link": true,
"lastPage": 1,
"type": "eq"
},
{ "id": "issue11549",
"file": "pdfs/issue11549_reduced.pdf",
"md5": "a1ea636f413e02e10dbdf379ab4a99ae",