pdf.js/test/pdfs/.gitignore

*.pdf
*.error
!boundingBox_invalid.pdf
!tracemonkey.pdf
!TrueType_without_cmap.pdf
!franz.pdf
!franz_2.pdf
!fraction-highlight.pdf
!german-umlaut-r.pdf
!issue13269.pdf
!xref_command_missing.pdf
!issue1155r.pdf
!issue2017r.pdf
!bug1727053.pdf
!issue11913.pdf
!issue2391-1.pdf
!issue2391-2.pdf
!issue14046.pdf
!issue7891_bc1.pdf
!issue3214.pdf
!issue4665.pdf
!issue4684.pdf
!issue8092.pdf
!issue5256.pdf
!issue5801.pdf
!issue5946.pdf
!issue5972.pdf
!issue5874.pdf
!issue5808.pdf
!issue6179_reduced.pdf
!issue6204.pdf
!issue6342.pdf
!issue6652.pdf
!issue6782.pdf
!issue6901.pdf
!issue6961.pdf
!issue6962.pdf
!issue7020.pdf
!issue7101.pdf
!issue7115.pdf
!issue7180.pdf
!issue7769.pdf
!issue7200.pdf
!issue7229.pdf
!issue7403.pdf
!issue7406.pdf
!issue7426.pdf
!issue7439.pdf
!issue7847_radial.pdf
!issue7446.pdf
!issue7492.pdf
!issue7544.pdf
!issue7507.pdf
!issue6931_reduced.pdf
!doc_actions.pdf
!issue7580.pdf
!issue7598.pdf
!issue12750.pdf
!issue7665.pdf
!issue7696.pdf
!issue7835.pdf
!issue11922_reduced.pdf
!issue7855.pdf
!issue11144_reduced.pdf
!issue7872.pdf
!issue7901.pdf
!issue8061.pdf
!bug1721218_reduced.pdf
!issue8088.pdf
!issue8125.pdf
!issue8229.pdf
!issue8276_reduced.pdf
!issue8372.pdf
!issue9713.pdf
!xfa_filled_imm1344e.pdf
!issue8424.pdf
!issue8480.pdf
!bug1650302_reduced.pdf
!issue8570.pdf
!issue8697.pdf
!issue8702.pdf
!structure_simple.pdf
!issue12823.pdf
!issue8707.pdf
!issue8798r.pdf
!issue8823.pdf
!issue9084.pdf
!issue12963.pdf
!issue9105_reduced.pdf
!issue9252.pdf
!issue9262_reduced.pdf
!issue9291.pdf
!issue9418.pdf
!issue9458.pdf
!issue9655_reduced.pdf
!issue9915_reduced.pdf
!bug854315.pdf
!issue9940.pdf
!issue10388_reduced.pdf
!issue10438_reduced.pdf
!issue10529.pdf
!issue10542_reduced.pdf
!issue10665_reduced.pdf
!issue11016_reduced.pdf
!issue11045.pdf
!bug1057544.pdf
!issue11150_reduced.pdf
!issue6127.pdf
!issue7891_bc0.pdf
!issue11242_reduced.pdf
!issue11279.pdf
!issue11362.pdf
!issue13325_reduced.pdf
!issue11578_reduced.pdf
!issue11651.pdf
!issue11878.pdf
!issue13916.pdf
!issue14023.pdf
!issue14438.pdf
!bad-PageLabels.pdf
!decodeACSuccessive.pdf
!issue13003.pdf
!filled-background.pdf
!ArabicCIDTrueType.pdf
!ThuluthFeatures.pdf
!arial_unicode_ab_cidfont.pdf
!arial_unicode_en_cidfont.pdf
!asciihexdecode.pdf
!bug766086.pdf
!bug793632.pdf
!bug1020858.pdf
!prefilled_f1040.pdf
!bug1050040.pdf
!bug1200096.pdf
!bug1068432.pdf
!issue12295.pdf
!bug1146106.pdf
!issue13447.pdf
!bug1245391_reduced.pdf
!bug1252420.pdf
!bug1513120_reduced.pdf
!bug1538111.pdf
!bug1552113.pdf
!issue6132.pdf
!issue9949.pdf
!bug1308536.pdf
!bug1337429.pdf
!bug1606566.pdf
!issue5564_reduced.pdf
!canvas.pdf
!bug1132849.pdf
!issue6894.pdf
!issue5804.pdf
!issue11131_reduced.pdf
!Pages-tree-refs.pdf
!ShowText-ShadingPattern.pdf
!complex_ttf_font.pdf
!issue3694_reduced.pdf
!extgstate.pdf
!issue4706.pdf
!rotation.pdf
!simpletype3font.pdf
!sizes.pdf
!javauninstall-7r.pdf
!file_url_link.pdf
!multiple-filters-length-zero.pdf
!non-embedded-NuptialScript.pdf
!issue3205r.pdf
!issue3207r.pdf
!issue3263r.pdf
!issue3879r.pdf
!issue5686.pdf
!issue3928.pdf
!issue8565.pdf
!clippath.pdf
!issue8795_reduced.pdf
!bug1755507.pdf
!close-path-bug.pdf
!issue6019.pdf
!issue6621.pdf
!issue6286.pdf
!issue13107_reduced.pdf
!issue1055r.pdf
!issue11713.pdf
!issue1293r.pdf
!issue11931.pdf
!issue1655r.pdf
!issue6541.pdf
!issue10640.pdf
!issue2948.pdf
!issue6231_1.pdf
!issue10402.pdf
!issue7074_reduced.pdf
!issue6413.pdf
!issue4630.pdf
!issue4909.pdf
!scorecard_reduced.pdf
!issue5084.pdf
!issue8960_reduced.pdf
!issue5202.pdf
!images_1bit_grayscale.pdf
!issue5280.pdf
!issue12399_reduced.pdf
!annotation-ink-without-appearance.pdf
!issue5677.pdf
!issue5954.pdf
!issue6612.pdf
!alphatrans.pdf
!pattern_text_embedded_font.pdf
!devicen.pdf
!cmykjpeg.pdf
!issue840.pdf
!160F-2019.pdf
!issue4402_reduced.pdf
!issue845r.pdf
!issue3405r.pdf
!issue14130.pdf
!issue7339_reduced.pdf
!issue3438.pdf
!issue11403_reduced.pdf
!ContentStreamNoCycleType3insideType3.pdf
!ContentStreamCycleType3insideType3.pdf
!issue2074.pdf
!scan-bad.pdf
!issue13561_reduced.pdf
!bug847420.pdf
!bug860632.pdf
!bug894572.pdf
!bug911034.pdf
!bug1108301.pdf
!issue10301.pdf
!bug1157493.pdf
!issue4260_reduced.pdf
!bug1250079.pdf
!bug1473809.pdf
!issue12120_reduced.pdf
!pdfjsbad1586.pdf
!standard_fonts.pdf
!freeculture.pdf
!issue6006.pdf
!pdfkit_compressed.pdf
!TAMReview.pdf
!pr4922.pdf
!pr6531_1.pdf
!pr6531_2.pdf
!pr7352.pdf
!bug900822.pdf
!bug1392647.pdf
!issue918.pdf
!bug920426.pdf
!issue1905.pdf
!issue2833.pdf
!issue2931.pdf
!issue3323.pdf
!issue4304.pdf
!issue9017_reduced.pdf
!issue4379.pdf
!issue4550.pdf
!issue13316_reduced.pdf
!issue4575.pdf
!bug1011159.pdf
!issue5734.pdf
!issue4875.pdf
!issue11740_reduced.pdf
!issue12705.pdf
!issue4881.pdf
!issue5994.pdf
!issue6151.pdf
!rotated.pdf
!issue1249.pdf
!issue1171.pdf
!smaskdim.pdf
!endchar.pdf
!type4psfunc.pdf
!issue1350.pdf
!S2.pdf
!glyph_accent.pdf
!personwithdog.pdf
!find_all.pdf
!helloworld-bad.pdf
!zerowidthline.pdf
!issue13242.pdf
!js-colors.pdf
!annotation-line-without-appearance-empty-Rect.pdf
!issue12841_reduced.pdf
!bug868745.pdf
!mmtype1.pdf
!issue4436r.pdf
!issue5704.pdf
!issue5751.pdf
!bug893730.pdf
!bug864847.pdf
!issue1002.pdf
!issue925.pdf
!issue2840.pdf
!issue4061.pdf
!issue4668.pdf
!issue13226.pdf
!PDFJS-7562-reduced.pdf
!issue11768_reduced.pdf
!issue5039.pdf
!issue14117.pdf
!issue5070.pdf
!issue5238.pdf
!issue5244.pdf
!issue5291.pdf
!issue4398.pdf
!issue5421.pdf
!issue5470.pdf
!issue5501.pdf
!issue5599.pdf
!issue5747.pdf
!issue6099.pdf
!issue6336.pdf
!issue6387.pdf
!issue6410.pdf
!issue11124.pdf
!issue8586.pdf
!jbig2_symbol_offset.pdf
!gradientfill.pdf
!bug903856.pdf
!bug850854.pdf
!issue12810.pdf
!bug866395.pdf
!issue12010_reduced.pdf
!issue11718_reduced.pdf
!bug1027533.pdf
!bug1028735.pdf
!bug1046314.pdf
!bug1065245.pdf
!issue6769.pdf
!bug1151216.pdf
!issue8111.pdf
!bug1175962.pdf
!bug1020226.pdf
!issue9534_reduced.pdf
!attachment.pdf
!basicapi.pdf
!issue2884_reduced.pdf
!mixedfonts.pdf
!shading_extend.pdf
!noembed-identity.pdf
!noembed-identity-2.pdf
!noembed-jis7.pdf
!issue12504.pdf
!noembed-eucjp.pdf
!bug1627427_reduced.pdf
!noembed-sjis.pdf
!vertical.pdf
!issue13343.pdf
!ZapfDingbats.pdf
!bug878026.pdf
!issue1045.pdf
!issue5010.pdf
!issue10339_reduced.pdf
!issue4934.pdf
!issue4650.pdf
!issue6721_reduced.pdf
!issue3025.pdf
!french_diacritics.pdf
!issue2099-1.pdf
!issue3371.pdf
!issue2956.pdf
!issue2537r.pdf
!issue269_1.pdf
!bug946506.pdf
!issue3885.pdf
!issue11697_reduced.pdf
!bug859204.pdf
!annotation-tx.pdf
!annotation-tx2.pdf
!annotation-tx3.pdf
!coons-allflags-withfunction.pdf
!tensor-allflags-withfunction.pdf
!issue10084_reduced.pdf
!issue4246.pdf
!issue11915.pdf
!js-authors.pdf
!issue4461.pdf
!issue4573.pdf
!issue4722.pdf
!issue4800.pdf
!issue9243.pdf
!issue13147.pdf
!issue11477_reduced.pdf
!text_clip_cff_cid.pdf
!issue4801.pdf
!issue5334.pdf
!annotation-caret-ink.pdf
!bug1186827.pdf
!issue12706.pdf
!issue215.pdf
!issue5044.pdf
!issue1512r.pdf
!issue2128r.pdf
!bug1703683_page2_reduced.pdf
!issue5540.pdf
!issue5549.pdf
!visibility_expressions.pdf
!issue5475.pdf
!issue10519_reduced.pdf
!annotation-border-styles.pdf
!IdentityToUnicodeMap_charCodeOf.pdf
!PDFJS-9279-reduced.pdf
!issue5481.pdf
!resetform.pdf
!issue5567.pdf
!issue5701.pdf
!issue6769_no_matrix.pdf
!issue12007_reduced.pdf
!issue5896.pdf
!issue6010_1.pdf
!issue6010_2.pdf
!issue6068.pdf
!issue6081.pdf
!issue6069.pdf
!issue6106.pdf
!issue6296.pdf
!bug852992_reduced.pdf
!issue13271.pdf
!issue6298.pdf
!issue6889.pdf
!issue11473.pdf
!bug1001080.pdf
!bug1671312_reduced.pdf
!bug1671312_ArialNarrow.pdf
!issue6108.pdf
!issue6113.pdf
!openoffice.pdf
!js-buttons.pdf
!issue7014.pdf
!issue8187.pdf
!annotation-link-text-popup.pdf
!issue9278.pdf
!annotation-text-without-popup.pdf
!annotation-underline.pdf
!issue13193.pdf
!annotation-underline-without-appearance.pdf
!issue269_2.pdf
!issue13372.pdf
!annotation-strikeout.pdf
!annotation-strikeout-without-appearance.pdf
!annotation-squiggly.pdf
!issue14256.pdf
!annotation-squiggly-without-appearance.pdf
!annotation-highlight.pdf
!annotation-highlight-without-appearance.pdf
!issue12418_reduced.pdf
!annotation-freetext.pdf
!annotation-line.pdf
!evaljs.pdf
!issue12798_page1_reduced.pdf
!annotation-line-without-appearance.pdf
!bug1669099.pdf
!annotation-square-circle.pdf
!annotation-square-circle-without-appearance.pdf
!annotation-stamp.pdf
!issue14048.pdf
!issue11656.pdf
!annotation-fileattachment.pdf
!annotation-text-widget.pdf
!annotation-choice-widget.pdf
!issue10900.pdf
!annotation-button-widget.pdf
!annotation-polyline-polygon.pdf
!annotation-polyline-polygon-without-appearance.pdf
!zero_descent.pdf
!operator-in-TJ-array.pdf
!issue7878.pdf
!font_ascent_descent.pdf
!listbox_actions.pdf
!issue11442_reduced.pdf
!issue11549_reduced.pdf
!issue8097_reduced.pdf
!bug1743245.pdf
!quadpoints.pdf
!transparent.pdf
!issue13931.pdf
!xobject-image.pdf
!issue6605.pdf
!ccitt_EndOfBlock_false.pdf
!issue9972-1.pdf
!issue9972-2.pdf
!issue9972-3.pdf
!tiling-pattern-box.pdf
!tiling-pattern-large-steps.pdf
!issue13201.pdf
!issue14462_reduced.pdf
!issue11555.pdf
!issue12337.pdf
!pr12564.pdf
!pr12828.pdf
!secHandler.pdf
!issue14297.pdf
!rc_annotation.pdf
!issue14267.pdf
!PDFBOX-4352-0.pdf
!REDHAT-1531897-0.pdf
!xfa_issue14315.pdf
!poppler-67295-0.pdf
!poppler-85140-0.pdf
!poppler-395-0-fuzzed.pdf
!GHOSTSCRIPT-698804-1-fuzzed.pdf
!poppler-91414-0-53.pdf
!poppler-91414-0-54.pdf
!poppler-742-0-fuzzed.pdf
!poppler-937-0-fuzzed.pdf
!PDFBOX-3148-2-fuzzed.pdf
!poppler-90-0-fuzzed.pdf
!issue14415.pdf
!issue14307.pdf
!issue14497.pdf
!issue14502.pdf