The border `width` will instead fallback to the default value of `1`, rather than ignoring it altoghether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113
Usually when the worker is terminated it will also be completely destroyed/removed, which means that any global caches (such as the ones in `src/core/primitive.js`) should be automatically cleared in the process.
However, for certain ways of loading the `pdf.worker.js` file, e.g. passing in a re-usable worker to `getDocument`, using the `workerPort` functionality, or even disabling workers completely (even though this is never a good idea), the worker file may be kept in memory and these caches will not be cleared as expected.
Calling `someArray = []` will create a new Array, which seems completely unnecessary when it's sufficient to just call `someArray.length = 0` to achieve the same effect.
Even though I cannot imagine these particular cases having any noticeable performance impact, similar changes were made in `core/` code years ago since it's apparently more efficient memory wise.
The purpose of these caches is to reduce peak memory usage, by only ever having *a single* instance of a particular object.
However, as-is these caches are never cleared and they will thus remain until the worker is destroyed. This could very well have a negative effect on total memory usage, particularly for large/long documents, hence it seems to make sense to clear out these caches together with various other ones.
This is similar to the existing caching used to reduced the number of `Cmd` and `Name` objects.
With the `tracemonkey.pdf` file, this patch changes the number of `Ref` objects as follows (in the default viewer):
| | Loading the first page | Loading *all* the pages |
|----------|------------------------|-------------------------|
| `master` | 332 | 3265 |
| `patch` | 163 | 996 |
The specification states that `CreationDate` is only available for
markup annotations instead of for all annotation types.
Moreover, popup annotations are not markup annotations according to the
specification, so the creation date inheritance from the parent
annotation is also removed there (note that only the modification date
is used in e.g., the viewer).
This includes the information in the core and display layers. The
date parsing logic from the document properties is rewritten according
to the specification and now includes unit tests.
Moreover, missing unit tests for the color of a popup annotation have
been added.
Finally the styling of the popup is changed slightly to make the text a
bit smaller (it's currently quite large in comparison to other viewers)
and to make the drop shadow a bit more subtle. The former is done to be
able to easily include the modification date in the popup similar to how
other viewers do this.
Currently `handleColorN` will fallback to add a completely unparsed/unvalidated operator when no valid pattern was found. This is unfortunate, since it could very easily lead to a couple of different errors:
- `DataCloneError`s when attempting to send the data to the main-thread, e.g. when `args` is `Dict`/`Stream`.
- Errors in `getShadingPatternFromIR` on the main-thread, unless `args` just happens to have the expected format.
- Errors when actually attempting to render the pattern on the main-thread, since the `args` will most likely not have the expected format.
Hence it probably makes sense to error in `PartialEvaluator.handleColorN`, and having invalid patterns fail gracefully via the existing `ignoreErrors` code-paths instead.
It appears that this has been broken ever since PR 9089, which also introduced this code, since the `QueueOptimizer`/`NullOptimizer` choice was made based on the still undefined `this.intent` property.
Furthermore, fixing this also uncovered the fact that the `NullOptimizer.reset` method was missing.
First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to).
Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit *very* small) to the parsing of path operators in PDF documents just for a handful of *corrupt* ones.
This way we can avoid manually building a "document id" in multiple places in `evaluator.js`, and it also let's us avoid passing in an otherwise unnecessary `PDFManager` instance when creating a `PartialEvaluator`.
While PR 10714 did address the `disableRange=true` case, it also managed to "break" the `disableStream=true` case instead since the indeterminate loadingBar is now displayed when it shouldn't; sorry about that!
The solution is simple enough though, don't attempt to fallback to `_fullRequestReader.onProgress` when handling "incomplete" loading information.
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences
Furthermore, note that this patch *only* adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736).
The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated?
There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the *entire* viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales.
Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc.
Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)
*Please note:* This patch purposely ignores `src/display/network.js`, since its support for progressive reading depends on the non-standard `moz-chunked-arraybuffer` responseType which is currently in the process of being removed.
The file `test/pdfs/annotation-caret-ink.pdf` is already available in
the repository as a reference test for this since I supplied it for
another patch that implemented ink annotations.
This mirrors the canvas implementation where we ignore these operators.
This avoids console spam regarding unimplemented operators we're not
interested in.
For the Tracemonkey paper, we're now down to one warning about tiling
patterns which is in fact a valid one.
In particular, this should reduce intermediate string creation by using
template strings and reduce variable lookup times by removing unneeded
variables and caching `this.current` in more places.
With PR 10675 having fixed the completely broken `disableRange=true` setting in the Firefox version of PDF.js, I couldn't help but noticing that loading progress is never reported properly in that case.
Currently loading progress is only reported for the `rangeProgress` chrome-event, which obviously isn't dispatched with `disableRange=true` set. However, the `progressiveRead` chrome-event includes loading progress as well, but this information isn't being used in any way.
Furthermore, the `PDFDataRangeTransport.onDataProgress` method wasn't able to handle "complete" loading information, and neither was `PDFDataTransportStream._onProgress` since that method would only ever attempt to report it through a RangeReader (which won't exist when `disableRange=true` is set).
Note how at https://mozilla.github.io/pdf.js/api/ it's being described as API docs, however `src/core/annotation.js` is not part of the public API.
Furthermore, given that the code residing in the `src/core/` folder is run in a worker-thread, it's not even accessible on the main-thread (since `postMessage` is being used to transfer the data).
Hence the different API methods simply returns a "proxy" to the underlying data, but not actually the same objects and data structures as in the worker-thread itself; thus it doesn't make a whole lot of sense to expose this in API docs as far as I'm concerned.
Finally, the patch fixes a small JSDoc related typo in `src/display/api.js` when referring to the `TextStyle` typedef.
The `moz-chunked-arraybuffer` responseType is a non-standard property, which has been subsumed by the Fetch API, and it's in the process of being removed from Firefox; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1120171 and https://bugzilla.mozilla.org/show_bug.cgi?id=1411865
*Please note:* Rather than waiting for both `Fetch` *and* `ReadableStream` to be available in e.g. a Firefox ESR version (which is probably going to be 68 at the earliest), let's just decide that PDF.js release `2.1.266` will be the last one with `moz-chunked-arraybuffer` support and land this patch (since nothing should outright break without it anyway).
Note how `XRef.fetchUncompressed`, which is used *a lot* for most PDF documents, is calling the `makeSubStream` method without providing a `length` argument.
In practice this results in the `makeSubStream` method, on the `ChunkedStream` instance, calling the `ensureRange` method with `NaN` as the end position, thus resulting in no data being requested despite it possibly being necessary.
This may be quite bad, since in this particular case it will lead to a new `ChunkedStream` being created *and* also a new `Parser`/`Lexer` instance. Given that it's quite possible that even the very first `Parser.getObj` call could throw `MissingDataException`, this could thus lead to wasted time/resources (since re-parsing is necessary once the data finally arrives).
You obviously need to be very careful to not have `ChunkedStream.makeSubStream` accidentally requesting the *entire* file, hence its `this.end` property is of no use here, but it should be possible to at least check that the `start` of the data is present before any potentially expensive parsing occurs.
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.
The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp
Fixes#6496Fixes#5698Fixes#1434Fixes#2825
Without this some fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.
Note that `PartialEvaluator.preEvaluateFont` will return an empty string when no hash was computed. This will complete short-circuit the `fontAlias` comparison in `PartialEvaluator.loadFont`, since fonts which are totally different will then match if their `hash`es are empty.
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.
Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
- In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
- (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*
In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
This function is currently called with the `OperatorList` instance as its argument, hence I cannot think of any good reason for not just moving it into the `OperatorList` properly. (This will also help with other planned changes regarding the `ImageCache` functionality.)
By transfering `ArrayBuffer`s you can avoid having two copies of the same data, i.e. one copy on each of the worker/main-thread, for data that's used only *once* on the worker-thread.
Note how the code in [`PDFImage.createMask`](80135378ca/src/core/image.js (L284-L285)) goes to great lengths to actually enable tranfering of the image data. However in [`PartialEvaluator.buildPaintImageXObject`](80135378ca/src/core/evaluator.js (L336)) the `cached` property is always set to `true`, which disqualifies the image data from being transfered; see [`getTransfers`](80135378ca/src/core/operator_list.js (L552-L554)).
For most ImageMask data this patch won't matter, since images found in the `/Resources -> /XObject` dictionary will always be indexed by name. However for *inline* images which contains ImageMask data, where only "small" images are cached (in both `parser.js` and `evaluator.js`), the current code will result in some unnecessary memory usage.
This will further help reduce the amount of image data that's currently being held alive, by explicitly removing the `src` attribute.
Please note that this is mostly relevant for browsers which do not support `URL.createObjectURL`, or where `disableCreateObjectURL` was manually set by the user, since `blob:` URLs will be revoked (see the previous patch).
However, using `about:memory` (in Firefox) it does seem that this may also be generally helpful, given that calling `URL.revokeObjectURL` won't invalidate the image data itself (as far as I can tell).
Natively supported JPEG images are sent as-is, using a `blob:` or possibly a `data` URL, to the main-thread for loading/decoding.
However there's currently no attempt at releasing these resources, which are held alive by `blob:` URLs, which seems unfortunately given that images can be arbitrarily large.
As mentioned in https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL the lifetime of these URLs are tied to the document, hence they are not being removed when a page is cleaned-up/destroyed (e.g. when being removed from the `PDFPageViewBuffer` in the viewer).
This is easy to test with the help of `about:memory` (in Firefox), which clearly shows the number of `blob:` URLs becomming arbitrarily large *without* this patch. With this patch however the `blob:` URLs are immediately release upon clean-up as expected, and the memory consumption should thus be considerably reduced for long documents with (simple) JPEG images.
Note how `PDFDocumentProxy.destroy` is a nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here.
The main changes in this patch are:
- Stop providing a `PDFDocumentLoadingTask` instance *separately* when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance.
- Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused.
- Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying *once*, rather than twice, for each document.
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simple extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.
The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
Notable changes:
- Remove the `return this;` from the `MurmurHash3_64.update` method, since it's completely unused and doesn't make a lot of sense.
- Remove the loop(s) from the `MurmurHash3_64.hexdigest` method, since creating a temporary array and then looping over it is wasteful given how simple this can be written with modern JavaScript.
Given that the function is (purposely) independent of the verbosity level and that its message is worded to only apply on the main-thread, there's no reason to duplicate this across the built `pdf.js`/`pdf.worker.js` files.
Currently for every single parsed/rendered page there's no less than *four* `Date.now()` calls being made on the worker-side. This seems totally unnecessary, since the result of these calls are, by default, not used for anything *unless* the verbosity level is set to `INFO`.
The default size of these canvases seem to be `300 x 150` (two orders of magnitude larger than the ones in PR 10597), which probably is sufficient enough to matter since there's one such canvas for each textLayer that's rendered in the viewer.
Also fixes the incorrect rejection reason, i.e. one using a string rather than an `Error`, in the `TextLayerRenderTask.cancel` method.
While this particular canvas may be small, there can still be an arbitrarily large number of them (one per page rendered), which can/will eventually add up memory wise. This can be easily avoided by using the `cachedCanvases` abstraction instead, which will ensure that the `isFontSubpixelAAEnabled` canvas is removed together with other temporary canvases in `CanvasGraphics.endDrawing`.
The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated.
Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used *only* on the worker-thread.
Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus *cannot* possibly ever be used on the main-thread.
This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*
With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].
Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].
The solution implemented in this patch does *not* in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load.
*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:
[Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888)
Issue 10175
Issue 10232
---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.
[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.
[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
- The only existing call-site, of this method, is never passing more than *one* font at a time anyway.
- As far as I can remember, this functionality has never actually been used (caveat: I didn't check the git history).
- This allows simplification of the method, especially by making use of the fact that it's now asynchronous.
- It should be just as easy to call `BaseFontLoader.bind` from within a loop, rather than having the loop in the method itself.
Currently all fonts are using the `_queueLoadingCallback` method to determine when they have been loaded[1]. However in most cases this is just adding unnecessary overhead, especially with `BaseFontLoader.bind` now being asynchronous, given how fonts are loaded:
- For fonts loaded using the Font Loading API, it's already possible to easily tell when a font has been loaded simply by checking the `loaded` promise on the FontFace object itself.
- For browsers, e.g. Firefox, which support synchronous font loading it's already assumed that fonts are immediately available.
Hence the `_queueLoadingCallback` method is moved into the `GenericFontLoader`, such that it's only utilized for fonts which are loaded using CSS.
---
[1] In the "fonts loaded using CSS" case, this is already a hack anyway as outlined in the comments.
pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero). This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.
Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
Unsurprisingly IE11 doesn't support this, so a polyfill is needed since otherwise the sidebar can no longer be opened.
Also, simplifies the existing `classList.toggle` polyfill.
This polyfill is currently used in only *one* file, i.e. `src/display/api.js`, and only when trying to build a *fallback* `workerSrc` path.
Given that the global `workerSrc` should *always* be set[1] when using the PDF.js library[2], and that the fallback `workerSrc` should only be regarded as a best-effort solution anyway, there isn't a particularily strong reason to keep the compatibility code in my opinion.
---
[1] Other supported options include setting the global `workerPort`, or passing in a `PDFWorker` instance as part of the `getDocument` call.
[2] Which is clearly mentioned in the JSDocs in `src/display/worker_options.js`.
This piggybacks of the existing `cancel` functionality, to ensure that any pending operations are closed *and* that any temporary canvases are actually being removed.
Also simplifies `finishPaintTask` in `PDFPageView.draw` slightly, by converting it to an async function.
All objects in the PDF document follow this pattern:
```
0000000001 0 obj
<<
% Some content here...
>>
endobj
0000000002 0 obj
<<
% More content here...
endobj
```
Based on the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1521413, this patch simply removes the `ReadableStream` polyfill completely from MOZCENTRAL builds.
With this patch, the size of the `gulp mozcentral` build target is thus further reduced (building on PR 10470):
| | `build/mozcentral`
|-------|-------------------
|master | 3 339 666
|patch | 3 209 572
With https://bugzilla.mozilla.org/show_bug.cgi?id=1505122 landing in Firefox 65, the native `ReadableStream` implementation is now enabled by default in Firefox.
Obviously it would be nice to simply stop bundling the polyfill in MOZCENTRAL builds altogether, however given that it's still possible to disable[1] `ReadableStream` this is probably not a good idea just yet.
Nonetheless, now that native support is available, it seems unnecessary (and wasteful) to keep bundling the polyfill twice[2] in MOZCENTRAL builds. Hence this patch, which contains a suggest approach for packing the polyfill in a *separate* file which is then *only* loaded if/when needed.
With this patch, the size of the `gulp mozcentral` build target is thus reduced accordingly:
| | `build/mozcentral`
|-------|-------------------
|master | 3 461 089
|patch | 3 340 268
Besides the PDF.js files taking up less space in Firefox this way, the additional benefit is that there's (by default) less code that needs to be loaded and parsed when the PDF Viewer is used which also cannot hurt.
---
[1] In `about:config`, by toggling the `javascript.options.streams` preference.
[2] Once in the `build/pdf.js` file, and once in the `build/pdf.worker.js` file.
In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.
This will allow the Metadata to be successfully extracted from the PDF file in issue 10395.
Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).
The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data.
(The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)
Given that the issue, as filed, is incomplete since no PDF file was provided for debugging, this patch is really the best that we can do here. *Please note:* This patch will *not* enable the Metadata to be successfully parsed, but it should at least prevent the errors.
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
14126 strings being created in this method. With this commit applied,
this is reduced to 2018 strings.
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
12936 strings being created in this method. With this commit applied,
this is reduced to 3610 strings.
The `toString` method always creates two string objects (for the 'R'
character and for the `num` concatenation) and in the worst case
creates three string objects (one more for the `gen` concatenation).
For the Tracemonkey paper alone, this resulted in 12000 string
objects when scrolling from the top to the bottom of the document.
Since this is a hot function, it's worth minimizing the number of string
objects, especially for large documents, to reduce peak memory usage.
This commit refactors the `toString` method to always create only one
string object.