Wait for the completion of writing the generated SVG file before
processing the next page. This is to enable the garbage collector to
garbage-collect the (potentially large) SVG string before trying to
allocate memory again for the next page.
Note that since the PDF-to-SVG conversion is now sequential instead of
parallel, the time to generate all pages increases.
Test case:
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
- Node.js crashes due to OOM after processing 20 pages.
After this patch:
- Node.js is able to convert all 203 PDFs to SVG without crashing.
Test case:
Using the PDF file from https://github.com/mozilla/pdf.js/issues/8534
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
Node.js crashes due to OOM after processing 10 pages.
After this patch:
Node.js crashes due to OOM after processing 19 pages.
Since this call occurs *before* the `PDFViewer.setDocument` call, it won't actually cause any scale change.
Furthermore, moving it should not be necessary, since the `scale` is already used as the fallback case in `PDFViewerApplication.setInitialView` (provided it's non-zero, which isn't even the case in the default viewer).
Hence this patch should cause no functional changes at all, since it simply removes a piece of unnecessary code.
Currently, these properties are reset in what appears to be somewhat arbitrary locations (within the `load` and `open` methods respectively). The explanation is probably that both of these properties predates the existence of any centralized clean-up code in the viewer.
Hence I think that it makes sense to move the resetting of these properties to the `close` method, since that improves the overview of what's actually cleaned-up/reset when changing documents in the viewer.
The `ObjectLoader` currently takes an Object as input, despite actually working with `Dict`s internally. This means that at the (two) existing call-sites, we're passing in the "private" `Dict.map` property directly.
Doing this seems like an anti-pattern, and we could (and even should) simply provide the actual `Dict` when creating an `ObjectLoader` instance.
Accessing properties stored in the `Dict` is now done using the intended methods instead, in particular `getRaw` which (as the name suggests) doesn't do any de-referencing, thus maintaining the current functionality of the code.
The only functional change in this patch is that `ObjectLoader.load` will now ignore empty nodes, such that `ObjectLoader._walk` only needs to deal with nodes that are known to contain data. (This lets us skip, among other checks, meaningless `addChildren` function calls.)
This method is currently called from `PDFViewer._scrollUpdate` on *every* scroll event in the viewer.
However, I cannot see why this code is now necessary (assuming that it once was), since text-selection and searching still works *exactly* the same way with this patch as with the current `master`.
When `PDFPageView.updatePosition` is called, the page can be in either of these states:
1. The page hasn't been rendered, in which case the `textLayer` property doesn't exist yet.
2. The page is currently rendering, meaning that the `textLayer` property exists. Given that the `textContent` won't be fetched until the page has been successfully rendered, `TextLayerBuilder.render` will return immediately and effectively be a no-op (since there's nothing to render yet).
3. The has been been rendered, and the `textLayer` is currently rendering.
4. The page, and its `textLayer`, has been completely rendered. In this case, `TextLayerBuilder.render` will return immediately and effectively be a no-op.
Here, only the *third* case seem to require any further analysis:
When scrolling occurs while the `textLayer` is rendering, `TextLayerBuilder.render` will via a helper method call `TextLayerRenderTask.cancel` (in src/display/text_layer.js) to stop processing.
However, due to the run-to-completion nature of JavaScript, once `TextLayerRenderTask._render` has been invoked `appendText` will always run.[1]
So even though we cancel rendering of pending `textLayer`s during scrolling, via the repeated `TextLayerBuilder.render` calls from within the `PDFPageView.updatePosition` method, that does *not* prevent us from running the code inside of `TextLayerRenderTask._render` over and over for the *same* page; which all seems *very* inefficient to me.[2]
All this will thus have the effect of delaying the *actual* rendering of a `textLayer` ever so slightly while scrolling in the viewer. However, it does so at the expense of potentially hundreds of unnecessary `appendText` calls.[3]
Hence it seems to me that it's less resource intensive overall to simply let rendering of the `textLayer` complete, once it has started. Obviously, we still abort all rendering of a page, and its `textLayer`, when it's being destroyed (e.g. by being evicted from the page cache).
In case that there's any worry that the patch could affect e.g. highlighting of search results, please note that the existing code in `TextLayerBuilder.render` already calls `updateMatches` when the `TextLayerTask` resolves successfully.
*I'm sorry that this became quite long, but to try and summarize:*
`PDFPageView.updatePosition` doesn't actually do anything in *most* cases. In the one case where it matters, it seems that it's actually doing more harm than good; which is why I'm proposing that we just remove it.
---
[1] Although we may be able to skip the `render` call, provided that it happens *after* a `timeout` (as is the case in the default viewer).
[2] With current work being done to support streaming of `TextContent`, we'd also need to add just as many duplicate API calls to `PDFPageView.updatePosition`.
[3] The number of duplicate `appendText` calls is directly proportional not only to the scroll speed, but also to the number of pages in the document.
Since the localization service is now asynchronous, depending on the load the browser is under, there's a small risk that the lookup of the 'page_scale_percent' string could be delayed slightly.
If the scale would change a couple of times in *very* quick succession, there's perhaps a *theoretical* possibility that the Zoom dropdown would display an incorrect value.
Consider the following, somewhat contrived, theoretical example of two zoom commands being executed *right* after one another:
```javascript
PDFViewerApplication.pdfViewer.currentScale = 1.23;
PDFViewerApplication.pdfViewer.currentScaleValue = 'page-width';
```
Only the `currentScale` call will currently trigger a l10n lookup in `selectScaleOption`. However, as far as I understand, there's no *guarantee* that the l10n string is resolved *before* `selectScaleOption` is called again as a result of the `currentScaleValue` call.
This thus has the possibility of putting the Zoom dropdown into an inconsistent state, since it's currently updated synchronously for one code-path and asynchronously for another.
To avoid these issues, I'm proposing that we *always* update the Zoom dropdown asynchronously, such that we can guarantee that the ordering is correct.
After PR 8394, at least in Firefox, the Zoom dropdown now frequently displays an old custom scale instead of the correct one. To see this behaviour, the following STR works for me:
1. Open https://mozilla.github.io/pdf.js/web/viewer.html.
2. Zoom in, by clicking on the corresponding button in the toolbar.
3. Run `PDFViewerApplication.pdfViewer.currentScaleValue` in the console.
4. Compare what's displayed in the Zoom dropdown with what's printed in the console. (If no difference can be observed, try repeating steps 2 through 4 a couple of times.)
I really don't understand why this happens, but it seems that waiting until a custom scale has been set *before* selecting it fixes things in Firefox (and works fine in e.g. Chrome as well).
Note that this patch thus makes this particular piece of the code consistent with the state prior to PR 8394.
Adds functionality to accept Queueing Strategy in
sendWithStream method. Using Queueing Strategy we
can control the data that is enqueued into the sink,
and hence regulated the flow of chunks from worker
to main thread.
Adds capability in pull and cancel methods.
Adds ready and desiredSize property in streamSink.
Adds unit test for ReadableStream and sendWithStream.
The click handler used in Presentation Mode didn't check if the first/last page was already reached, which after PR 7529 now causes an unnecessary console error.
Hence we should simply use the already existing `_goToPreviousPage`/`_goToNextPage` methods instead, since they do the necessary bounds checking.
Fixes 8498.
Moreover, rename `FindStates` to `FindState` since enumeration names are
usually not in plural, for readability and consistency with the ones in
`src/shared/util.js`.