From 90472e5130378cd443c05c5a8cbaaeff5b82b843 Mon Sep 17 00:00:00 2001
From: Jonas Jenwald
Date: Fri, 10 Dec 2021 17:14:58 +0100
Subject: [PATCH] Avoid overloading the worker-thread during eager page
 initialization in the viewer (PR 11263 follow-up)

This patch is essentially *another* continuation of PR 11263, which tried to
improve loading/initialization performance of *very* large/long documents.

For most documents, unless they're *very* long, we'll eagerly initialize all
of the pages in the viewer. For shorter documents, having all pages
loaded/initialized early provides overall better performance/UX in the
viewer; however, there are cases where it can instead *hurt* performance.
For documents with a couple of thousand pages[1], the parsing and
pre-rendering of the *second* page of the document can be delayed (quite a
bit). The reason is that we trigger `PDFDocumentProxy.getPage` for *all
pages* early during the viewer initialization, which causes the
worker-thread to be swamped with handling (potentially) thousands of
`getPage`-calls, leaving very little time for other parsing (such as e.g. of
operatorLists).

To address this situation, this patch proposes temporarily "pausing" the
eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to
give the worker-thread a chance to handle other requests.[2] Obviously this
may *slightly* delay the "pagesloaded" event in longer documents, but
considering that it's already the result of asynchronous parsing, that'll
hopefully not be seen as a blocker for these changes.[3]

---
[1] A particularly problematic example is
https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which
is a document with 2236 pages and a /Pages-tree that's only *one* level
deep.

[2] Please note that I initially considered simply chaining the
`PDFDocumentProxy.getPage`-calls, however that would have slowed things down
for all documents, which didn't seem appropriate.

[3] This patch will *hopefully* also make it possible to re-visit PR 11312,
since it seems that changing `Catalog.getPageDict` to an `async` method
wasn't the problem in itself. Rather it appears that it leads to slightly
different timings, thus exacerbating the already existing issues with the
worker-thread being overloaded by `getPage`-calls. Having recently worked
with that method, there are a couple of (very old) issues that I'd also like
to address, and having `Catalog.getPageDict` be `async` would simplify
things a great deal.
---
 web/base_viewer.js | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/web/base_viewer.js b/web/base_viewer.js
index 931958ced..8fac5d0ce 100644
--- a/web/base_viewer.js
+++ b/web/base_viewer.js
@@ -57,6 +57,7 @@ const DEFAULT_CACHE_SIZE = 10;
 const PagesCountLimit = {
   FORCE_SCROLL_MODE_PAGE: 15000,
   FORCE_LAZY_PAGE_INIT: 7500,
+  PAUSE_EAGER_PAGE_INIT: 500,
 };
 
 /**
@@ -625,7 +626,7 @@ class BaseViewer {
     // Fetch all the pages since the viewport is needed before printing
     // starts to create the correct size canvas. Wait until one page is
     // rendered so we don't tie up too many resources early on.
-    this._onePageRenderedOrForceFetch().then(() => {
+    this._onePageRenderedOrForceFetch().then(async () => {
       if (this.findController) {
         this.findController.setDocument(pdfDocument); // Enable searching.
       }
@@ -650,7 +651,7 @@
         return;
       }
       for (let pageNum = 2; pageNum <= pagesCount; ++pageNum) {
-        pdfDocument.getPage(pageNum).then(
+        const promise = pdfDocument.getPage(pageNum).then(
           pdfPage => {
             const pageView = this._pages[pageNum - 1];
             if (!pageView.pdfPage) {
@@ -671,6 +672,10 @@
             }
           }
         );
+
+        if (pageNum % PagesCountLimit.PAUSE_EAGER_PAGE_INIT === 0) {
+          await promise;
+        }
       }
     });
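
For reference, the "pausing" pattern applied by the diff can be sketched in
isolation as follows. This is only a rough illustration, not actual viewer
code: the `fetchAllPagesEagerly` function and the `onPage` callback are
hypothetical names introduced here, while `PDFDocumentProxy.numPages` and
`PDFDocumentProxy.getPage` are the existing API and the threshold of 500
matches the value added in the patch.

  // Rough sketch: eagerly request every page, but await the pending
  // `getPage` promise once per PAUSE_EAGER_PAGE_INIT pages, so that the
  // worker-thread periodically gets time to handle other requests
  // (e.g. operatorList parsing).
  const PAUSE_EAGER_PAGE_INIT = 500;

  async function fetchAllPagesEagerly(pdfDocument, onPage) {
    const pagesCount = pdfDocument.numPages;

    // The first page is assumed to already be loaded, hence starting at 2.
    for (let pageNum = 2; pageNum <= pagesCount; ++pageNum) {
      // Intentionally *not* awaited unconditionally, since chaining every
      // call would slow things down for all documents.
      const promise = pdfDocument.getPage(pageNum).then(onPage, reason => {
        console.error(`Unable to get page ${pageNum}`, reason);
      });

      if (pageNum % PAUSE_EAGER_PAGE_INIT === 0) {
        await promise; // Briefly "pause", letting the worker-thread catch up.
      }
    }
  }

In the actual patch the same logic lives inside `BaseViewer._setDocument`
and uses `PagesCountLimit.PAUSE_EAGER_PAGE_INIT` rather than a local
constant.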