Currently we have two separate image-caches on the worker-thread:
- A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
- A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.
This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.
---
[1] For lack of a better word, since naming things is hard...
This effectively implements some of the changes from https://phabricator.services.mozilla.com/D170496, but using our existing "direction aware" CSS-variable to limit the amount of code changes needed.
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.
This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.
Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
With the changes in PR 16153 we're no longer setting a `<base href>` in the Firefox PDF Viewer, hence it shouldn't be necessary to keep setting a `baseUrl` in the `PDFLinkService`-class.
Given that the original document URL is now kept, the browser itself will handle relative URLs and we can thus slightly reduce the amount of string parsing required when handling various links in the viewer.
Currently if you e.g. enable the `useOnlyCssZoom` option rendering may no longer finish as intended. To reproduce:
- Enable the `useOnlyCssZoom` option.
- Load https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf (in the development viewer).
- When rendering starts, *immediately* change the zoom-level.
In this case the document will never finish rendering, since the `postponeDrawing`-functionality will (here incorrectly) abort rendering and with CSS-only zooming rendering is only expected to happen once per page.
To fix this we'll simply ignore any `drawingDelay` when CSS-only zooming is used (regardless if it's triggered via the option or the zoom-level being very large).
Currently the `zoomLayer` isn't rotated correctly in all cases. To reproduce:
- Load https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf
- Let the document render.
- Rotate the document *four* times, such that the original rotation is restored.
The easiest solution, as far as I can tell, is that we always set the `transform` just as we did (for years) prior to the changes in PR 15812.
Originally we used helper functions for checking if something was a Dictionary or Stream, and then having an initial `typeof` check probably made sense.
However, given that we're using `instanceof` nowadays the additional check longer seems necessary.
Currently we're *virtually* duplicating the same code, for validating quotation marks, twice in this helper function.
The size decrease is quite small (107 bytes) and this makes the code slightly harder to reader, hence I completely understand if this patch is rejected.