Avoid overloading the worker-thread during eager page initialization in the viewer (PR 11263 follow-up)

This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.

For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents having all pages loaded/initialized early provides overall better performance/UX in the viewer, however there's cases where it can instead *hurt* performance.
For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands of `getPage`-calls and leaving very little time for other parsing (such as e.g. of operatorLists).

To address this situation, this patch thus proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a change to handle other requests.[2]

Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3]

---
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep.

[2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that'd slowed things down for all documents which didn't seem appropriate.

[3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls.
Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having `Catalog.getPageDict` be `async` would simplify things a great deal.
This commit is contained in:
Jonas Jenwald 2021-12-10 17:14:58 +01:00
parent 97dc048e56
commit 90472e5130

View File

@ -57,6 +57,7 @@ const DEFAULT_CACHE_SIZE = 10;
const PagesCountLimit = {
FORCE_SCROLL_MODE_PAGE: 15000,
FORCE_LAZY_PAGE_INIT: 7500,
PAUSE_EAGER_PAGE_INIT: 500,
};
/**
@ -625,7 +626,7 @@ class BaseViewer {
// Fetch all the pages since the viewport is needed before printing
// starts to create the correct size canvas. Wait until one page is
// rendered so we don't tie up too many resources early on.
this._onePageRenderedOrForceFetch().then(() => {
this._onePageRenderedOrForceFetch().then(async () => {
if (this.findController) {
this.findController.setDocument(pdfDocument); // Enable searching.
}
@ -650,7 +651,7 @@ class BaseViewer {
return;
}
for (let pageNum = 2; pageNum <= pagesCount; ++pageNum) {
pdfDocument.getPage(pageNum).then(
const promise = pdfDocument.getPage(pageNum).then(
pdfPage => {
const pageView = this._pages[pageNum - 1];
if (!pageView.pdfPage) {
@ -671,6 +672,10 @@ class BaseViewer {
}
}
);
if (pageNum % PagesCountLimit.PAUSE_EAGER_PAGE_INIT === 0) {
await promise;
}
}
});