With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)
I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.
Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
Note that, in the old code, there was a code-path which could prevent this from happening thus affecting future cleanup.
Furthermore, ensure that we'll always attempt to cleanup when handling the 'PageError' message, similar to the code in e.g. the `PDFPageProxy._renderPageChunk` method.
The `receivingOperatorList` property is currently tracked *twice* in the rendering code, both directly and inversely through the `intentState.operatorList.lastChunk` boolean. This type of double bookkeeping is never a good idea, since it's just too easy for the properties to accidentally fall out of sync.
In this case there's even a `cleanup`-related bug caused by this, which means that `PDFPageProxy._tryCleanup` will never be able to discard any data if there's an error on the worker-thread (as handled through the 'PageError' message).
Hence the simplest solution seems, at least to me, to update `PDFPageProxy._tryCleanup` to replace the `intentState.receivingOperatorList` check with a `!intentState.operatorList.lastChunk` check and completely remove the former property.
Given that `cleanupAfterRender` is already set for large images, when handling 'obj' messages, this patch *should* thus be safe in general (since otherwise there ought be existing bugs related to cleanup and printing).
Calling `someArray = []` will create a new Array, which seems completely unnecessary when it's sufficient to just call `someArray.length = 0` to achieve the same effect.
Even though I cannot imagine these particular cases having any noticeable performance impact, similar changes were made in `core/` code years ago since it's apparently more efficient memory wise.
This includes the information in the core and display layers. The
date parsing logic from the document properties is rewritten according
to the specification and now includes unit tests.
Moreover, missing unit tests for the color of a popup annotation have
been added.
Finally the styling of the popup is changed slightly to make the text a
bit smaller (it's currently quite large in comparison to other viewers)
and to make the drop shadow a bit more subtle. The former is done to be
able to easily include the modification date in the popup similar to how
other viewers do this.
While PR 10714 did address the `disableRange=true` case, it also managed to "break" the `disableStream=true` case instead since the indeterminate loadingBar is now displayed when it shouldn't; sorry about that!
The solution is simple enough though, don't attempt to fallback to `_fullRequestReader.onProgress` when handling "incomplete" loading information.
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences
Furthermore, note that this patch *only* adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736).
The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated?
There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the *entire* viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales.
Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc.
Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)
*Please note:* This patch purposely ignores `src/display/network.js`, since its support for progressive reading depends on the non-standard `moz-chunked-arraybuffer` responseType which is currently in the process of being removed.
The file `test/pdfs/annotation-caret-ink.pdf` is already available in
the repository as a reference test for this since I supplied it for
another patch that implemented ink annotations.
This mirrors the canvas implementation where we ignore these operators.
This avoids console spam regarding unimplemented operators we're not
interested in.
For the Tracemonkey paper, we're now down to one warning about tiling
patterns which is in fact a valid one.
In particular, this should reduce intermediate string creation by using
template strings and reduce variable lookup times by removing unneeded
variables and caching `this.current` in more places.
With PR 10675 having fixed the completely broken `disableRange=true` setting in the Firefox version of PDF.js, I couldn't help but noticing that loading progress is never reported properly in that case.
Currently loading progress is only reported for the `rangeProgress` chrome-event, which obviously isn't dispatched with `disableRange=true` set. However, the `progressiveRead` chrome-event includes loading progress as well, but this information isn't being used in any way.
Furthermore, the `PDFDataRangeTransport.onDataProgress` method wasn't able to handle "complete" loading information, and neither was `PDFDataTransportStream._onProgress` since that method would only ever attempt to report it through a RangeReader (which won't exist when `disableRange=true` is set).
Note how at https://mozilla.github.io/pdf.js/api/ it's being described as API docs, however `src/core/annotation.js` is not part of the public API.
Furthermore, given that the code residing in the `src/core/` folder is run in a worker-thread, it's not even accessible on the main-thread (since `postMessage` is being used to transfer the data).
Hence the different API methods simply returns a "proxy" to the underlying data, but not actually the same objects and data structures as in the worker-thread itself; thus it doesn't make a whole lot of sense to expose this in API docs as far as I'm concerned.
Finally, the patch fixes a small JSDoc related typo in `src/display/api.js` when referring to the `TextStyle` typedef.
The `moz-chunked-arraybuffer` responseType is a non-standard property, which has been subsumed by the Fetch API, and it's in the process of being removed from Firefox; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1120171 and https://bugzilla.mozilla.org/show_bug.cgi?id=1411865
*Please note:* Rather than waiting for both `Fetch` *and* `ReadableStream` to be available in e.g. a Firefox ESR version (which is probably going to be 68 at the earliest), let's just decide that PDF.js release `2.1.266` will be the last one with `moz-chunked-arraybuffer` support and land this patch (since nothing should outright break without it anyway).
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.
The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp
Fixes#6496Fixes#5698Fixes#1434Fixes#2825
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.
Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
- In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
- (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*
In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
This will further help reduce the amount of image data that's currently being held alive, by explicitly removing the `src` attribute.
Please note that this is mostly relevant for browsers which do not support `URL.createObjectURL`, or where `disableCreateObjectURL` was manually set by the user, since `blob:` URLs will be revoked (see the previous patch).
However, using `about:memory` (in Firefox) it does seem that this may also be generally helpful, given that calling `URL.revokeObjectURL` won't invalidate the image data itself (as far as I can tell).
Natively supported JPEG images are sent as-is, using a `blob:` or possibly a `data` URL, to the main-thread for loading/decoding.
However there's currently no attempt at releasing these resources, which are held alive by `blob:` URLs, which seems unfortunately given that images can be arbitrarily large.
As mentioned in https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL the lifetime of these URLs are tied to the document, hence they are not being removed when a page is cleaned-up/destroyed (e.g. when being removed from the `PDFPageViewBuffer` in the viewer).
This is easy to test with the help of `about:memory` (in Firefox), which clearly shows the number of `blob:` URLs becomming arbitrarily large *without* this patch. With this patch however the `blob:` URLs are immediately release upon clean-up as expected, and the memory consumption should thus be considerably reduced for long documents with (simple) JPEG images.
Note how `PDFDocumentProxy.destroy` is a nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here.
The main changes in this patch are:
- Stop providing a `PDFDocumentLoadingTask` instance *separately* when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance.
- Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused.
- Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying *once*, rather than twice, for each document.
Given that the function is (purposely) independent of the verbosity level and that its message is worded to only apply on the main-thread, there's no reason to duplicate this across the built `pdf.js`/`pdf.worker.js` files.
Currently for every single parsed/rendered page there's no less than *four* `Date.now()` calls being made on the worker-side. This seems totally unnecessary, since the result of these calls are, by default, not used for anything *unless* the verbosity level is set to `INFO`.
The default size of these canvases seem to be `300 x 150` (two orders of magnitude larger than the ones in PR 10597), which probably is sufficient enough to matter since there's one such canvas for each textLayer that's rendered in the viewer.
Also fixes the incorrect rejection reason, i.e. one using a string rather than an `Error`, in the `TextLayerRenderTask.cancel` method.
While this particular canvas may be small, there can still be an arbitrarily large number of them (one per page rendered), which can/will eventually add up memory wise. This can be easily avoided by using the `cachedCanvases` abstraction instead, which will ensure that the `isFontSubpixelAAEnabled` canvas is removed together with other temporary canvases in `CanvasGraphics.endDrawing`.
This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*
With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].
Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].
The solution implemented in this patch does *not* in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load.
*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:
[Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888)
Issue 10175
Issue 10232
---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.
[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.
[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
- The only existing call-site, of this method, is never passing more than *one* font at a time anyway.
- As far as I can remember, this functionality has never actually been used (caveat: I didn't check the git history).
- This allows simplification of the method, especially by making use of the fact that it's now asynchronous.
- It should be just as easy to call `BaseFontLoader.bind` from within a loop, rather than having the loop in the method itself.
Currently all fonts are using the `_queueLoadingCallback` method to determine when they have been loaded[1]. However in most cases this is just adding unnecessary overhead, especially with `BaseFontLoader.bind` now being asynchronous, given how fonts are loaded:
- For fonts loaded using the Font Loading API, it's already possible to easily tell when a font has been loaded simply by checking the `loaded` promise on the FontFace object itself.
- For browsers, e.g. Firefox, which support synchronous font loading it's already assumed that fonts are immediately available.
Hence the `_queueLoadingCallback` method is moved into the `GenericFontLoader`, such that it's only utilized for fonts which are loaded using CSS.
---
[1] In the "fonts loaded using CSS" case, this is already a hack anyway as outlined in the comments.
This polyfill is currently used in only *one* file, i.e. `src/display/api.js`, and only when trying to build a *fallback* `workerSrc` path.
Given that the global `workerSrc` should *always* be set[1] when using the PDF.js library[2], and that the fallback `workerSrc` should only be regarded as a best-effort solution anyway, there isn't a particularily strong reason to keep the compatibility code in my opinion.
---
[1] Other supported options include setting the global `workerPort`, or passing in a `PDFWorker` instance as part of the `getDocument` call.
[2] Which is clearly mentioned in the JSDocs in `src/display/worker_options.js`.
This piggybacks of the existing `cancel` functionality, to ensure that any pending operations are closed *and* that any temporary canvases are actually being removed.
Also simplifies `finishPaintTask` in `PDFPageView.draw` slightly, by converting it to an async function.
This will allow the Metadata to be successfully extracted from the PDF file in issue 10395.
Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).
The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data.
(The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)
Given that the issue, as filed, is incomplete since no PDF file was provided for debugging, this patch is really the best that we can do here. *Please note:* This patch will *not* enable the Metadata to be successfully parsed, but it should at least prevent the errors.
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
14126 strings being created in this method. With this commit applied,
this is reduced to 2018 strings.
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
12936 strings being created in this method. With this commit applied,
this is reduced to 3610 strings.
If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature *first* to avoid needlessly unwieldy call-sites.
To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning *and* with a working fallback[1] for the old method signature.
---
[1] This is limited to `GENERIC` builds, which should be sufficient.
Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1].
Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2].
---
[1] This information is, currently in PDF.js, being included through the `getJavaScript` API method.
[2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).
This changes all occurrences of `var` to `let`/`const` in this code, and updates the signature of the constructor to use object destructuring for better readability (and self documentation).
Also, `useRequestAnimationFrame` is changed to a parameter and the `typeof window` check is now done *once* rather than at every `_scheduleNext` call.