Always prefer the PDF.js JPEG decoder for very large images, in order to reduce peak memory usage (issue 11694)

When JPEG images are decoded by the browser, on the main thread, there are a handful of short-lived copies of the image data; see c3f4690bde/src/display/api.js (L2364-L2408)
That code thus becomes quite problematic for very large JPEG images, since it significantly increases peak memory usage during decoding. The referenced issue contains a couple of JPEG images whose dimensions are `10006 x 7088` (i.e. ~71 mega-pixels), which causes the *peak* memory usage to increase by close to `1 GB` (i.e. one gigabyte) in my testing.
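For context, here is a rough back-of-the-envelope sketch (not taken from the commit) of why a handful of short-lived copies of a ~71 mega-pixel image adds up to roughly one gigabyte; the 4 bytes per pixel (RGBA) and the number of simultaneous copies are assumptions for illustration only:

```js
// Back-of-the-envelope estimate (illustrative, not part of the commit):
// assume 4 bytes per pixel (RGBA) for one fully decoded, uncompressed
// copy of the image from the referenced issue.
const width = 10006;
const height = 7088;
const bytesPerCopy = width * height * 4; // 283,690,112 bytes, i.e. ~271 MiB

console.log(`${(bytesPerCopy / 1024 ** 2).toFixed(0)} MiB per decoded copy`);

// A handful of such short-lived copies alive at the same time, e.g. three
// to four of them, already accounts for close to 1 GiB of extra peak memory.
console.log(`${((3.5 * bytesPerCopy) / 1024 ** 3).toFixed(2)} GiB (illustrative)`);
```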

By letting the PDF.js JPEG decoder, rather than the browser, handle very large images, the *peak* memory usage is considerably reduced and the allocated memory also seems to be reclaimed faster.

*Please note:* This will lead to movement in some existing `eq` tests.
Jonas Jenwald 2020-03-18 11:29:16 +01:00
parent 4893b14a52
commit 62a9c26cda


@@ -134,6 +134,17 @@ const JpegStream = (function JpegStreamClosure() {
stream.pos += 2; // Skip marker length.
stream.pos += 1; // Skip precision.
const scanLines = stream.getUint16();
const samplesPerLine = stream.getUint16();
// Letting the browser handle the JPEG decoding, on the main-thread,
// will cause a *large* increase in peak memory usage since there's
// a handful of short-lived copies of the image data. For very big
// JPEG images, always let the PDF.js image decoder handle them to
// reduce overall memory usage during decoding (see issue 11694).
if (scanLines * samplesPerLine > 1e6) {
validDimensions = false;
break;
}
// The "normal" case, where the image data and dictionary agrees.
if (scanLines === dictHeight) {