This rule is in the process of being rolled out in mozilla-central, and it helps avoid unnecessary `async` functions together with `new Promise(...)`.
Please see https://eslint.org/docs/rules/no-async-promise-executor for additional information.
Besides avoiding errors during loading, this also ensures that the document will be correctly scrolled/zoomed into view once the viewer becomes visible.
This "new" behaviour was always intended, see PR 2613, however various re-factoring over the years seem to have broken this (and I'm probably at least somewhat responsible for that).
By transfering, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having *one* copy of the same data.
Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory.
Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality.
Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side.
---
[1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance.
[2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility
Note how the sent values have inconsistent types, with a boolean in one case and an object in the other (normal) case.
Furthermore, explicitly sending a `supportTypedArray: true` property seems superfluous at least to me.
This check was added in PR 2445, however it's no longer necessary since all data[1] is now loaded on the main-thread (and then transferred to the worker-thread).
Furthermore, by default the Fetch API is now (usually) used rather than `XMLHttpRequest`.
All in all, while these checks *were* necessary at one point that's no longer the case and they can thus be removed.
---
[1] This includes both the actual PDF data, as well as the CMap data.
It recently occurred to me that the CMap data should be an excellent candidate for transfering.
This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads.
Unfortunately it's not possible to actually transfer data when *returning* data through `sendWithPromise`, and another solution had to be used.
Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it *really* easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)
Please note that the patch *purposely* only changes the API to Worker communication, and not the API *itself* since changing the interface of `CMapReaderFactory` would be a breaking change.
Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.
Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).
There's only three call-sites and one of them doesn't even need the complete functionality of `resolveCall`, hence it seems reasonable to just inline this code.
An additional benefit of this is that the `Function.prototype.apply()` instance can also be converted into "normal" function calls, which should be a tiny bit more efficient.
The patch also replaces a number of unnecessary arrow functions, in relevant parts of the `MessageHandler` code, with "normal" functions instead.
Finally, all `Promise.resolve().then(...)` calls are replaced with `new Promise(...)` instead since the latter is a tiny bit more efficient. This also explains the test failures on the Linux bot, with a prior version of the patch, since the `Promise.resolve().then(...)` format essentially creates two Promises thus causing additional delay.
At this point in time it's easy to convert the `MessageHandler.on` call-sites to use arrow functions, and thus let the JavaScript engine handle scopes for us, rather than having to manually keep references to the relevant scopes in `MessageHandler`.[1]
An additional benefit of this is that a couple of `Function.prototype.call()` instances can now be converted into "normal" function calls, which should be a tiny bit more efficient.
All in all, I don't see any compelling reason why it'd be necessary to keep supporting custom `scope`s in the `MessageHandler` implementation.
---
[1] In the event that a custom scope is ever needed, simply using `bind` on the handler function when calling `MessageHandler.on` ought to work as well.
Since `wrapReason` and `makeReasonSerializable` are essentially functionally equivalent it doesn't seem necessary to keep both of them around, especially when `makeReasonSerializable` only has a *single* call-site.
Given that the `stream` property is an internal implementation detail, changing its type shouldn't be a problem. By using Numbers instead, we can avoid unnecessary String allocations when creating/processing Streams.
With PR 11069 we're now using Streams for OperatorList parsing (in addition to just TextContent parsing), which brings the nice benefit of being able to easily abort parsing on the worker-thread thus saving resources.
However, since we're now creating many more `ReadableStream` there appears to be a tiny bit more overhead because of it (giving ~1% slower runtime of `browsertest` on the bots). In this case we're just going to have to accept such a small regression, since the benefits of using Streams clearly outweighs it.
What we *can* do here, is to try and make the Streams part of the `MessageHandler` implementation slightly more efficient by e.g. removing unnecessary function calls (which has been helpful in other parts of the code-base). To that end, this patch makes the following changes:
- Actually support `transfers` in `MessageHandler.sendWithStream`, since the parameter was being ignored.
- Inline the `sendStreamRequest`/`sendStreamResponse` helper functions at their respective call-sites. Obviously this causes some amount of code duplication, however I still think this change seems reasonable since for each call-site:
- It avoids making one unnecessary function call.
- It avoids allocating one temporary object.
- It avoids sending, and thus structure clone, various undefined object properties.
- Inline objects in the `MessageHandler.{send, sendWithPromise}` methods.
- Finally, directly call `comObj.postMessage` in various methods when `transfers` are *not* present, rather than calling `MessageHandler.postMessage`, to further reduce the amount of function calls.
One of the motivations for using `setAttribute` in the first place was to support more efficient DOM updates in the `expandTextDivs` method, since performance of the `enhanceTextSelection` mode can be somewhat bad when there's a lot of `textDivs` on the page.
With recent `TextLayer` changes/optimizations it's no longer necessary to store a complete `style`-string for every `textDiv`, and we can thus re-visit the `setAttribute` usage.
Note that with the current code, in `appendText`, there's only *one* string per `textDiv` which avoids a bunch of temporary strings. While the changes in this patch means that there's now *three* strings per `textDiv` instead, the total length of these strings are now quite a bit shorter (42 characters to be exact).
*This should obviously have been done in PR 11097, but for some reason I completely overlooked it; sorry about that.*
There's no good reason to update the font unless you're actually going to measure the width of the textContent. This can reduce unnecessary font switching a fair bit, even for documents which are somewhat simple/short (in e.g. the `tracemonkey.pdf` file this cuts the amount of font switches almost in half).
For performance reasons single-char text divs aren't being scaled, as outlined in a comment in `appendText`. Hence it doesn't seem necessary, or even a good idea, to unconditionally measuring the width of the text in `_layoutText`.
This major version bump required two changes:
- The global line in the mobile viewer example should be removed because
the `.eslintrc` file already defines these globals and with the new
`eslint` version we otherwise get an error saying "'pdfjsLib' is already
defined as a built-in global variable".
- The ECMA version for the examples must be set to 6 since we're using
modules, otherwise we get an error saying "sourceType 'module' is not
supported when ecmaVersion < 2015". It turns out that the previous
version of `eslint` already used ECMA version 6 silently even though
we set 5, see https://github.com/eslint/eslint/issues/9687#issuecomment-432413384,
so in terms of our code nothing really changes.
These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).
*Please note:* The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`.
By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources.
With this patch worker-thread parsing will *only* be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this:
- The API currently expects the *entire* OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen.
- Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously *not* want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance.
- This patch is already *somewhat* risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well).
Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above.
Co-Authored-By: Yury Delendik <ydelendik@mozilla.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact.
Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`.
Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.