This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210
With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed.
This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories:
- Cases where a value is only clamped to a certain range (the easiest ones to update).
- Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the *default* value and only update it when necessary by using `if`/`else if` statements.
- Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
This patch makes the follow changes:
- Remove no longer necessary inline `// eslint-disable-...` comments.
- Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
- Concatenate strings which now fit on just one line.
- Fix comments that are now too long.
- Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments.
*Please note:* It may be a good idea to look through these cases individually, and possibly re-write some of the them (especially the `String` ones) to reduce the need for all of these ignore commands.
Since bundlers, such as Webpack, cannot be told to leave `require` statements alone we are thus forced to jump through hoops in order to prevent these warnings in third-party deployments of the PDF.js library; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).
*Please note:* This is based on the assumption that code running in Node.js won't ever be affected by e.g. Content Security Policies that prevent use of `eval`. If that ever occurs, we should revert to a normal `require` statement and simply document the Webpack warnings instead.
This patch reduces some duplication, by moving *all* fake worker loader code into the `setupFakeWorkerGlobal` function. Furthermore, the functions are simplified further by using `async`/`await` where appropriate.
There's no particularily good reason, as far as I can tell, to not support a custom worker path in Node.js environments (even if workers aren't supported). This patch thus make the Node.js fake worker loader code-path consistent with the fallback code-path used with *browser* fake worker loader.
Finally, this patch also deprecates[1] the `fallbackWorkerSrc` functionality, except in Node.js, since the user should *always* provide correct worker options since the fallback is nothing more than a best-effort solution.
---
[1] Although it probably shouldn't be removed until the next major version.
For performance reasons, and to avoid hanging the browser UI, the PDF.js library should *always* be used with web workers enabled.
At this point in time all of the supported browsers should have proper worker support, and Node.js is thus the only environment where workers aren't supported. Hence it no longer seems relevant/necessary to provide, by default, fake worker loaders for various JS builders/bundlers/frameworks in the PDF.js code itself.[1]
In order to simplify things, the fake worker loader code is thus simplified to now *only* support Node.js usage respectively "normal" browser usage out-of-the-box.[2]
*Please note:* The officially intended way of using the PDF.js library is with workers enabled, which can be done by setting `GlobalWorkerOptions.workerSrc`, `GlobalWorkerOptions.workerPort`, or manually providing a `PDFWorker` instance when calling `getDocument`.
---
[1] Note that it's still possible to *manually* disable workers, simply my manually loading the built `pdf.worker.js` file into the (current) global scope, however this's mostly intended for testing/debugging purposes.
[2] Unfortunately some bundlers such as Webpack, when used with third-party deployments of the PDF.js library, will start to print `Critical dependency: ...` warnings when run against the built `pdf.js` file from this patch. The reason is that despite the `require` calls being protected by *runtime* `isNodeJS` checks, it's not possible to simply tell Webpack to just ignore the `require`; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).
Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility
By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead.
The only change here which *could* possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded).
However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it *could* just as well have been a globally registered polyfill from an application embedding the PDF.js library.
For Popup annotation trigger elements consisting of an arbitrary polyline, you need to ensure that the 'stroke-width' is always non-zero since otherwise it's impossible to actually open/close the popup.
Unfortunately I don't believe that any of the test-suites can be used to test this, hence why no tests are included in the patch.
For certain canvas-related errors (and probably others), the browser rendering exceptions may be propagated "as-is" to the PDF.js code. In this case, the exceptions are of the somewhat cryptic `NS_ERROR_FAILURE` type.
Unfortunately these aren't actual `Error`s, which thus ends up unintentionally triggering the `assert` in `PDFPageProxy._abortOperatorList`; sorry about that!
I happened to look at this code and the way that the link target is set seems unecessarily convoluted, since we're using `Object.values` and `Array.prototype.includes` for *every* link being parsed.
Given that the number of link targets are so few, the easist solution honestly seem to be to just use a `switch` statement to do the link target mapping.
The code in question is *only* relevant in non-`PRODUCTION` mode, i.e. the *development* version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added.
I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.
This argument is a left-over from older API code, where we unconditionally initialized `StatTimer` instances for every page. For quite some time that's only been done when `pdfBug` is set, hence it seems unnecessary to keep this functionality.
Even though the currect situation only results in six unnecessary function calls per page, it nonetheless seems completely unnecessary to call dummy functions when `pdfBug` is *not* set (i.e. the default behaviour).
As can be seen in the API, there's a number of document loading Exception handlers which are both really simple and highly similar. Hence these are changed such that all the relevant Exceptions are sent via *one* message instead.
Furthermore, the patch also avoids unnecessarily re-creating `UnknownErrorException`s at the worker side and removes an unnecessary `bind` call.
Sometimes we also used `@return`, but `@returns` is what the JSDoc
documentation recommends. Even though `@return` works as an alias, it's
good to use the recommended syntax and to be consistent within the
project.
Sometimes we also used `@return` or `@returns`, but `@type` is what
the JSDoc documentation recommends. This also improves the documentation
because before this commit the types were not shown and now they are.
All of these methods have been marked as `deprecated` in *three* releases now, and I'd thus like to (slowly) move towards complete removal.
However rather than just removing the methods right away, which would cause somewhat cryptic failures, this patch tries to implement a hopefully reasonable middle ground by throwing `Error`s with (essentially) the same information as the previous warnings.
While the previous `deprecated` messages could perhaps be seen as optional, with these changes API consumers will now be forced to actually migrate their code.
By utilizing a base "class", things become significantly simpler. Unfortunately the new `BaseException` cannot be a proper ES6 class and just extend `Error`, since the SystemJS dependency doesn't seem to play well with that.
Note also that we (generally) need to keep the `name` property on the actual `...Exception` object, rather than on its prototype, since the property will otherwise be dropped during the structured cloning used with `postMessage`.
By default, i.e. with workers enabled, it's *purposely* not possible to send `Dict`s and `Stream`s from the worker-thread. This is achieved by defining a `function` on every `Dict` instance, since that ensures that [the structured clone algoritm](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm) will throw an Error on `postMessage`.
However, with workers *disabled* we fall-back to the `LoopbackPort` implementation which just ignores any `function`s, thus incorrectly allowing sending of data which *should* be unclonable.
With this patch we're finally able to abort worker-thread parsing of the `OperatorList`, rather than *only* aborting the main-thread rendering itself, when the `RenderTask.cancel` method is being called.
This will help improve perceived performance in the default viewer, especially when reading longer and more complex documents, since pages that've been scrolled out-of-view (and thus evicted from the cache) will no longer compete for parsing resources on the worker-thread.
*Please note:* With the implementation in this patch we're *not* aborting worker-thread parsing immediately on `RenderTask.cancel`, since that would lead to *worse* performance in many cases. For example: When zoom/rotation occurs in the viewer, while parsing/rendering is still ongoing, a `cancel` call will usually be (almost) immediately folled by a new `PDFPageProxy.render` call. In that case you obviously don't want to abort parsing on the worker-thread, since that would risk throwing away a partially parsed `OperatorList` and thus force unnecessary re-parsing which will regress perceived performance (especially for more complex documents).
When choosing a reasonable delay, before cancelling `getOperatorList` on the worker-thread when `RenderTask.cancel` is called, two different positions need to be considered:
1. The delay needs to be short enough, since a timeout in the multiple seconds range would essentially make this entire functionality meaningless (by always allowing most/all pages enough time to finish parsing).
2. The delay cannot be *too* short, since that would actually *reduce* performance in the zoom/rotation case outlined above. Furthermore, the time between `RenderTask.cancel` and `PDFPageProxy.render` calls will obviously be affected by both general computer performance and current CPU load.
It's certainly possible that the timeout may require some further tweaks, however the value settled on in this patch was easily *one order* of magnitude larger than the delta between cancel/render in my tests.
There's no good reason for calling this helper function without a `url` parameter, and this way we can prevent that from happening.
Note how the `PDFOutlineViewer` call-site was already doing the right thing here, and only the `LinkAnnotationElement` call-site needed a small adjustment to make it work.
With the changes made in PR 11069, it's no longer necessary to include the `pageIndex`/`intent` parameters when sending 'GetOperatorList' data. In the previous implementation these properties were used to associate the `OperatorList` with the correct `RenderTask`, however now that `ReadableStream`s are used that's handled automatically and it's thus dead code at this point.
By transfering, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having *one* copy of the same data.
Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory.
Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality.
Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side.
---
[1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance.
[2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility
Note how the sent values have inconsistent types, with a boolean in one case and an object in the other (normal) case.
Furthermore, explicitly sending a `supportTypedArray: true` property seems superfluous at least to me.
It recently occurred to me that the CMap data should be an excellent candidate for transfering.
This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads.
Unfortunately it's not possible to actually transfer data when *returning* data through `sendWithPromise`, and another solution had to be used.
Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it *really* easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)
Please note that the patch *purposely* only changes the API to Worker communication, and not the API *itself* since changing the interface of `CMapReaderFactory` would be a breaking change.
Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.
At this point in time it's easy to convert the `MessageHandler.on` call-sites to use arrow functions, and thus let the JavaScript engine handle scopes for us, rather than having to manually keep references to the relevant scopes in `MessageHandler`.[1]
An additional benefit of this is that a couple of `Function.prototype.call()` instances can now be converted into "normal" function calls, which should be a tiny bit more efficient.
All in all, I don't see any compelling reason why it'd be necessary to keep supporting custom `scope`s in the `MessageHandler` implementation.
---
[1] In the event that a custom scope is ever needed, simply using `bind` on the handler function when calling `MessageHandler.on` ought to work as well.
One of the motivations for using `setAttribute` in the first place was to support more efficient DOM updates in the `expandTextDivs` method, since performance of the `enhanceTextSelection` mode can be somewhat bad when there's a lot of `textDivs` on the page.
With recent `TextLayer` changes/optimizations it's no longer necessary to store a complete `style`-string for every `textDiv`, and we can thus re-visit the `setAttribute` usage.
Note that with the current code, in `appendText`, there's only *one* string per `textDiv` which avoids a bunch of temporary strings. While the changes in this patch means that there's now *three* strings per `textDiv` instead, the total length of these strings are now quite a bit shorter (42 characters to be exact).
*This should obviously have been done in PR 11097, but for some reason I completely overlooked it; sorry about that.*
There's no good reason to update the font unless you're actually going to measure the width of the textContent. This can reduce unnecessary font switching a fair bit, even for documents which are somewhat simple/short (in e.g. the `tracemonkey.pdf` file this cuts the amount of font switches almost in half).
For performance reasons single-char text divs aren't being scaled, as outlined in a comment in `appendText`. Hence it doesn't seem necessary, or even a good idea, to unconditionally measuring the width of the text in `_layoutText`.
*Please note:* The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`.
By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources.
With this patch worker-thread parsing will *only* be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this:
- The API currently expects the *entire* OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen.
- Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously *not* want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance.
- This patch is already *somewhat* risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well).
Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above.
Co-Authored-By: Yury Delendik <ydelendik@mozilla.com>
Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>
Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact.
Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`.
Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.
Given that browsers will reject padding values smaller than zero (which may be caused by limited numerical precision during calculations in the `expand` code), it makes no sense to include those when expanding the `textDiv`s.
With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now *always* returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-)
I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`.
Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.
There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended).
In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.
Note that, in the old code, there was a code-path which could prevent this from happening thus affecting future cleanup.
Furthermore, ensure that we'll always attempt to cleanup when handling the 'PageError' message, similar to the code in e.g. the `PDFPageProxy._renderPageChunk` method.
The `receivingOperatorList` property is currently tracked *twice* in the rendering code, both directly and inversely through the `intentState.operatorList.lastChunk` boolean. This type of double bookkeeping is never a good idea, since it's just too easy for the properties to accidentally fall out of sync.
In this case there's even a `cleanup`-related bug caused by this, which means that `PDFPageProxy._tryCleanup` will never be able to discard any data if there's an error on the worker-thread (as handled through the 'PageError' message).
Hence the simplest solution seems, at least to me, to update `PDFPageProxy._tryCleanup` to replace the `intentState.receivingOperatorList` check with a `!intentState.operatorList.lastChunk` check and completely remove the former property.
Given that `cleanupAfterRender` is already set for large images, when handling 'obj' messages, this patch *should* thus be safe in general (since otherwise there ought be existing bugs related to cleanup and printing).
Calling `someArray = []` will create a new Array, which seems completely unnecessary when it's sufficient to just call `someArray.length = 0` to achieve the same effect.
Even though I cannot imagine these particular cases having any noticeable performance impact, similar changes were made in `core/` code years ago since it's apparently more efficient memory wise.
This includes the information in the core and display layers. The
date parsing logic from the document properties is rewritten according
to the specification and now includes unit tests.
Moreover, missing unit tests for the color of a popup annotation have
been added.
Finally the styling of the popup is changed slightly to make the text a
bit smaller (it's currently quite large in comparison to other viewers)
and to make the drop shadow a bit more subtle. The former is done to be
able to easily include the modification date in the popup similar to how
other viewers do this.
While PR 10714 did address the `disableRange=true` case, it also managed to "break" the `disableStream=true` case instead since the indeterminate loadingBar is now displayed when it shouldn't; sorry about that!
The solution is simple enough though, don't attempt to fallback to `_fullRequestReader.onProgress` when handling "incomplete" loading information.
Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences
Furthermore, note that this patch *only* adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736).
The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated?
There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the *entire* viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales.
Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc.
Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)
*Please note:* This patch purposely ignores `src/display/network.js`, since its support for progressive reading depends on the non-standard `moz-chunked-arraybuffer` responseType which is currently in the process of being removed.
The file `test/pdfs/annotation-caret-ink.pdf` is already available in
the repository as a reference test for this since I supplied it for
another patch that implemented ink annotations.
This mirrors the canvas implementation where we ignore these operators.
This avoids console spam regarding unimplemented operators we're not
interested in.
For the Tracemonkey paper, we're now down to one warning about tiling
patterns which is in fact a valid one.
In particular, this should reduce intermediate string creation by using
template strings and reduce variable lookup times by removing unneeded
variables and caching `this.current` in more places.
With PR 10675 having fixed the completely broken `disableRange=true` setting in the Firefox version of PDF.js, I couldn't help but noticing that loading progress is never reported properly in that case.
Currently loading progress is only reported for the `rangeProgress` chrome-event, which obviously isn't dispatched with `disableRange=true` set. However, the `progressiveRead` chrome-event includes loading progress as well, but this information isn't being used in any way.
Furthermore, the `PDFDataRangeTransport.onDataProgress` method wasn't able to handle "complete" loading information, and neither was `PDFDataTransportStream._onProgress` since that method would only ever attempt to report it through a RangeReader (which won't exist when `disableRange=true` is set).
Note how at https://mozilla.github.io/pdf.js/api/ it's being described as API docs, however `src/core/annotation.js` is not part of the public API.
Furthermore, given that the code residing in the `src/core/` folder is run in a worker-thread, it's not even accessible on the main-thread (since `postMessage` is being used to transfer the data).
Hence the different API methods simply returns a "proxy" to the underlying data, but not actually the same objects and data structures as in the worker-thread itself; thus it doesn't make a whole lot of sense to expose this in API docs as far as I'm concerned.
Finally, the patch fixes a small JSDoc related typo in `src/display/api.js` when referring to the `TextStyle` typedef.
The `moz-chunked-arraybuffer` responseType is a non-standard property, which has been subsumed by the Fetch API, and it's in the process of being removed from Firefox; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1120171 and https://bugzilla.mozilla.org/show_bug.cgi?id=1411865
*Please note:* Rather than waiting for both `Fetch` *and* `ReadableStream` to be available in e.g. a Firefox ESR version (which is probably going to be 68 at the earliest), let's just decide that PDF.js release `2.1.266` will be the last one with `moz-chunked-arraybuffer` support and land this patch (since nothing should outright break without it anyway).
This transform resulted in an incorrectly positioned object when the
bounding box's upper-left corner did not start at (0,0), because
the translation was not reverted. This patch adds the missing transform.
The test file (tiling-pattern-box.pdf) is based on the PDF from #2825.
All but the first cube (including the PDF data) have been removed.
To trigger the bug that is fixed by this commit, I changed the BBox of
the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without
this patch, the dashed vertical line that intersects the corners at A
and E would disappear.
The new test file (tiling-pattern-large-steps.pdf) was manually created,
to have the following characteristics:
- Large xstep and ystep (90000)
- Page width is 4000 (which is larger than MAX_PATTERN_SIZE)
- Visually, the page consists of a red rectangle with a black border,
surrounded by a 50 unit white padding.
- Before patch: blurry; After patch: sharp
Fixes#6496Fixes#5698Fixes#1434Fixes#2825
Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.
Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
- In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
- (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*
In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
This will further help reduce the amount of image data that's currently being held alive, by explicitly removing the `src` attribute.
Please note that this is mostly relevant for browsers which do not support `URL.createObjectURL`, or where `disableCreateObjectURL` was manually set by the user, since `blob:` URLs will be revoked (see the previous patch).
However, using `about:memory` (in Firefox) it does seem that this may also be generally helpful, given that calling `URL.revokeObjectURL` won't invalidate the image data itself (as far as I can tell).
Natively supported JPEG images are sent as-is, using a `blob:` or possibly a `data` URL, to the main-thread for loading/decoding.
However there's currently no attempt at releasing these resources, which are held alive by `blob:` URLs, which seems unfortunately given that images can be arbitrarily large.
As mentioned in https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL the lifetime of these URLs are tied to the document, hence they are not being removed when a page is cleaned-up/destroyed (e.g. when being removed from the `PDFPageViewBuffer` in the viewer).
This is easy to test with the help of `about:memory` (in Firefox), which clearly shows the number of `blob:` URLs becomming arbitrarily large *without* this patch. With this patch however the `blob:` URLs are immediately release upon clean-up as expected, and the memory consumption should thus be considerably reduced for long documents with (simple) JPEG images.
Note how `PDFDocumentProxy.destroy` is a nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here.
The main changes in this patch are:
- Stop providing a `PDFDocumentLoadingTask` instance *separately* when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance.
- Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused.
- Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying *once*, rather than twice, for each document.
Given that the function is (purposely) independent of the verbosity level and that its message is worded to only apply on the main-thread, there's no reason to duplicate this across the built `pdf.js`/`pdf.worker.js` files.
Currently for every single parsed/rendered page there's no less than *four* `Date.now()` calls being made on the worker-side. This seems totally unnecessary, since the result of these calls are, by default, not used for anything *unless* the verbosity level is set to `INFO`.
The default size of these canvases seem to be `300 x 150` (two orders of magnitude larger than the ones in PR 10597), which probably is sufficient enough to matter since there's one such canvas for each textLayer that's rendered in the viewer.
Also fixes the incorrect rejection reason, i.e. one using a string rather than an `Error`, in the `TextLayerRenderTask.cancel` method.
While this particular canvas may be small, there can still be an arbitrarily large number of them (one per page rendered), which can/will eventually add up memory wise. This can be easily avoided by using the `cachedCanvases` abstraction instead, which will ensure that the `isFontSubpixelAAEnabled` canvas is removed together with other temporary canvases in `CanvasGraphics.endDrawing`.
This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*
With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines.
This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2].
Hence this patch, which implements a *general* fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3].
The solution implemented in this patch does *not* in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load.
*Please note:* This patch doesn't fix any of the underlying PDF.js font conversion bugs that's responsible for creating corrupt font files, however it does *improve* rendering in a number of cases; refer to this possibly incomplete list:
[Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888)
Issue 10175
Issue 10232
---
[1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly.
[2] Glyphs fell back to some default font, which while not accurate was more useful than the current state.
[3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.
- The only existing call-site, of this method, is never passing more than *one* font at a time anyway.
- As far as I can remember, this functionality has never actually been used (caveat: I didn't check the git history).
- This allows simplification of the method, especially by making use of the fact that it's now asynchronous.
- It should be just as easy to call `BaseFontLoader.bind` from within a loop, rather than having the loop in the method itself.
Currently all fonts are using the `_queueLoadingCallback` method to determine when they have been loaded[1]. However in most cases this is just adding unnecessary overhead, especially with `BaseFontLoader.bind` now being asynchronous, given how fonts are loaded:
- For fonts loaded using the Font Loading API, it's already possible to easily tell when a font has been loaded simply by checking the `loaded` promise on the FontFace object itself.
- For browsers, e.g. Firefox, which support synchronous font loading it's already assumed that fonts are immediately available.
Hence the `_queueLoadingCallback` method is moved into the `GenericFontLoader`, such that it's only utilized for fonts which are loaded using CSS.
---
[1] In the "fonts loaded using CSS" case, this is already a hack anyway as outlined in the comments.
This polyfill is currently used in only *one* file, i.e. `src/display/api.js`, and only when trying to build a *fallback* `workerSrc` path.
Given that the global `workerSrc` should *always* be set[1] when using the PDF.js library[2], and that the fallback `workerSrc` should only be regarded as a best-effort solution anyway, there isn't a particularily strong reason to keep the compatibility code in my opinion.
---
[1] Other supported options include setting the global `workerPort`, or passing in a `PDFWorker` instance as part of the `getDocument` call.
[2] Which is clearly mentioned in the JSDocs in `src/display/worker_options.js`.
This piggybacks of the existing `cancel` functionality, to ensure that any pending operations are closed *and* that any temporary canvases are actually being removed.
Also simplifies `finishPaintTask` in `PDFPageView.draw` slightly, by converting it to an async function.
This will allow the Metadata to be successfully extracted from the PDF file in issue 10395.
Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).
The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data.
(The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)
Given that the issue, as filed, is incomplete since no PDF file was provided for debugging, this patch is really the best that we can do here. *Please note:* This patch will *not* enable the Metadata to be successfully parsed, but it should at least prevent the errors.
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
14126 strings being created in this method. With this commit applied,
this is reduced to 2018 strings.
This method creates quite a few intermediate strings on each call and
it's called often, even for smaller documents like the Tracemonkey
document. Scrolling from top to bottom in that document resulted in
12936 strings being created in this method. With this commit applied,
this is reduced to 3610 strings.
If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature *first* to avoid needlessly unwieldy call-sites.
To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning *and* with a working fallback[1] for the old method signature.
---
[1] This is limited to `GENERIC` builds, which should be sufficient.
Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1].
Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2].
---
[1] This information is, currently in PDF.js, being included through the `getJavaScript` API method.
[2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).
This changes all occurrences of `var` to `let`/`const` in this code, and updates the signature of the constructor to use object destructuring for better readability (and self documentation).
Also, `useRequestAnimationFrame` is changed to a parameter and the `typeof window` check is now done *once* rather than at every `_scheduleNext` call.
This changes all occurrences of `var` to `let`/`const` in this code, and updates the signatures of a couple of methods to use object destructuring.
Finally, when creating `InternalRenderTask` instances *only* the necessary parameter are now provided, since passing through the `RenderParameters` as-is seems completely unnecessary.
First of all, note how there's currently *two* methods for checking if a certain object exists, which seems completely unwarranted.
Furthermore, the rarely used `getData` method was removed and its only callsite changed to use a combination of `PDFObjects.{has, get}` instead.
Finally, the methods were rearranged slightly, to bring the most important ones (for an API user) to the top of the class.
Note how nowhere in the code `canvasInRendering.get()` is ever called, and that this structure is really only used to store references to `<canvas>` DOM elements.
The reason for this being a `WeakMap` is probably because at the time we weren't using `core-js` polyfills yet, and since there already existed a manually implemented `WeakMap` polyfill it was probably simpler to use that.
Please note that, given the lack of a runnable example, I'm not totally sure if this first of all is enough to *completely* address the issue as filed and second of all if we actually want this new behaviour.
*Please note:* I'm totally fine with this patch being rejected, and the issue closed as WONTFIX; however these changes should address the issue if that's desired.
From a conceptual point of view, reporting loading progress doesn't really make a lot of sense for PDF files opened by passing raw binary data directly to `getDocument` (since obviously *all* data was loaded).
This is compared to PDF files loaded via e.g. `XMLHttpRequest` or the Fetch API, where the entire PDF file isn't available from the start and knowing the loading progress makes total sense.
However I can certainly see why the current API could be considered inconsistent, which isn't great, since a registered `onProgress` callback will never be called for certain `getDocument` calls.
The simplest solution to this inconsistency thus seem to be to ensure that `onProgress` is always called when handling the `DataLoaded` message, since that will *always* be dispatched[1] from the worker-thread.
---
[1] Note that this isn't guaranteed to happen, since setting `disableAutoFetch = true` often prevents the *entire* file from ever loading. However, this isn't relevant for the issue at hand, and is a well-known consequence of using `disableAutoFetch = true`; note how the default viewer even has a specialized code-path for hiding the loadingBar.
*This should have been part of PR 10139.*
In the event that a user has attempted to manually load the worker file on the main-thread, but somehow failed to do that correctly, there's a possibility that `getMainThreadWorkerMessageHandler` could throw. Considering how/where that helper function is being called, an error could still prevent `PDFDocumentLoadingTask` from completing (regardless if it's being resolved/rejected).
With the way that the `getWorkerSrc()` helper function is implemented now, there's no longer a particularly strong reason for keeping the global `pdfjsFilePath` variable around.
With this patch the fallback `workerSrc` will thus, assuming is wasn't already set, be set to the "pdfjsFilePath" which simplifies the `getWorkerSrc()` function and reduces the amount of global state.
Finally, the global `workerSrc` variable was renamed to prevent shadowing.
This should, hopefully, cover all the possible ways[1] in which "fake workers" are loaded. Given the different code-paths, adding unit-tests might not be that simple.
Note that in order to make this work, the various `fakeWorkerFilesLoader` functions were converted to return `Promises`.
---
[1] Unfortunately there's lots of them, for various build targets and configurations.
This attempts to reduced the level of indirection, and the amount of code, when dispatching `fileattachmentannotation` events, by removing the `PDFLinkService.onFileAttachmentAnnotation` method and just accessing `PDFLinkService.eventBus` directly in the `FileAttachmentAnnotationElement` constructor.
Given that other properties, such as `externalLinkTarget`/`externalLinkRel`, are already being accessed directly this pattern seems fine here as well.
The `started` timestamp is completely usused, and the `end` timestamp is currently[1] being used essentially like a boolean value.
Hence this code can be simplified to use an actual boolean value instead, which avoids potentially hundreds (or even thousands) of unnecessary `Date.now()` calls.
---
[1] Looking briefly at the history of this code, I cannot tell if the timestamps themselves were ever used for anything (except for tracking "boolean" state).
The `Font.loading` property is only ever used *once* in the code, whereas `Font.missingFile` is more widely used. Furthermore the name `loading` feels, at least to me, slight less clear than `missingFile`. Finally, note that these two properties are the inverse of each other.
It's no longer necessary since https://bugzilla.mozilla.org/show_bug.cgi?id=1305963 is fixed quite some time ago.
While we're here, mark the `cachedGetSinglePixelWidth` member as being
private and use ES6 syntax in the `getSinglePixelWidth` method.
For proof-of-concept, this patch converts a couple of `Promise` returning methods to use `async` instead.
Please note that the `generic` build, based on this patch, has been successfully testing in IE11 (i.e. the viewer loads and nothing is obviously broken).
Being able to use modern JavaScript features like `async`/`await` is a huge plus, but there's one (obvious) side-effect: The size of the built files will increase slightly (unless `SKIP_BABEL == true`). That's unavoidable, but seems like a small price to pay in the grand scheme of things.
Finally, note that the `chromium` build target was changed to no longer skip Babel translation, since the Chrome extension still supports version `49` of the browser (where native `async` support isn't available).
With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs.
(Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)
This moves/exposes the `URL` polyfill similarily to the existing `ReadableStream` polyfill, rather than exposing it globally, to avoid interfering with any "outside" code.
Both the `URL` and `ReadableStream` polyfills are now exposed on the `pdfjsLib` object, such that they are accessible to the viewer components.
Furthermore, the `no-restricted-globals` ESLint rule is also enabled to prevent accidental usage of the native `URL`/`ReadableStream` implementations directly in the `src/` and `web/` folders; see also https://eslint.org/docs/rules/no-restricted-globals
Addresses the remaining TODO in https://github.com/mozilla/pdf.js/projects/6
Currently if `RenderTask.cancel` is called *immediately* after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled.
However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it *after* rendering was cancelled (it was likely just a simple oversight in PR 8519).
Fixes 9456.
This wasn't included in PR 9245, since all the API options were still global at that time.
Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.
Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering.
*Please note:* Including the 14 standard fonts in PDF.js probably wouldn't be *that* difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.)
However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.
As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.
*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
Not only is the `Util.loadScript` helper function unused on the Worker side, even trying to use it there would throw an Error (since `document` isn't defined/available in Workers).
Hence this helper function is moved, and its code modernized slightly by having it return a Promise rather than needing a callback function.
Finally, to reduced code duplication, the "new" loadScript function is exported and used in the viewer.
The various classes have `this._errored` and `this._reason` properties, where the first one is a boolean indicating if an error was encountered and the second one contains the actual `Error` (or `null` initially).
In practice this means that errors are basically tracked *twice*, rather than just once. This kind of double-bookkeeping is generally a bad idea, since it's quite easy for the properties to (accidentally) get into an inconsistent state whenever the relevant code is modified.
Rather than using a separate boolean, we can just as well check the "error" property directly (since `null` is falsy).
---
Somewhat unrelated to this patch, but `src/display/node_stream.js` is currently *not* handling errors in a consistent or even correct way; compared with `src/display/network.js` and `src/display/fetch_stream.js`.
Obviously using the `createResponseStatusError` utility function, from `src/display/network_utils.js`, might not make much sense in a Node.js environment. However at the *very* least it seems that `MissingPDFException`, or `UnknownErrorException` when one cannot tell that the PDF file is "missing", should be manually thrown.
As is, the API (i.e. `getDocument`) is not returning the *expected* errors when loading fails in Node.js environments (as evident from the `pending` API unit-test).
This special handling was added in PR 8567, but was made redundant in PR 8721 which stopped sending everything but the kitchen sink to the Worker side.
Since `PDFPageProxy` already provide getters for all the data returned by `GetPage` (in the Worker), there isn't any compelling reason for accessing the `pageInfo` directly on `PDFPageProxy`.
The patch also changes the `GetPage` handler, in `src/core/worker.js`, to use modern JavaScript features.
Since `PDFDocumentProxy` already provide getters for all the data returned by `GetDoc` (in the Worker), there isn't any compelling reason for accessing the `pdfInfo` directly on `PDFDocumentProxy`.
After PR 8617 the `PDFManagerReady` message handler function, in `src/display/api.js`, is now a no-op. Hence it seems completely unnecessary to keep sending this message from `src/core/worker.js`.
With native typed array support now being mandatory in PDF.js, since version 2.0, this probably isn't a huge problem even though the current code seems wrong (it was changed in PR 6571).
Note how in the `!(data instanceof Uint8Array)` case we're currently attempting to send `handler.send('test', 'main', false);` to the main-thread, which doesn't really make any sense since the signature of the method reads `send(actionName, data, transfers) {`.
Hence the data that's *actually* being sent here is `'main'`, with `false` as the transferList, which just seems weird. On the main-thread, this means that we're in this case checking `data && data.supportTypedArray`, where `data` contains the string `'main'` rather than being falsy. Since a string doesn't have a `supportTypedArray` property, that check still fails as expected but it doesn't seem great nonetheless.
Since all the built-in PDF.js image decoders now return their data as `Uint8ClampedArray`, for consistency `JpegDecode` on the main-thread should be doing the same thing; follow-up to PR 8778.
The signature of the `PDFWorker.fromPort` method, in addition to the `PDFWorker` constructor, was changed in PR 9480.
Hence it's probably a good idea to add a bit more validation to `PDFWorker.fromPort`, to ensure that it won't fail silently for an API consumer that updates to version 2.0 of the PDF.js library.
With version 2.0, native support for typed arrays is now a requirement for using the PDF.js library; see PR 9094 where the old polyfills were removed.
Hence the `isTypedArraysPresent` check, when setting up fake workers, no longer serves any purpose here and can thus be removed.
There's no good reason, as far as I can tell, to duplicate the functionality of the `LoopbackPort` in the unit-tests. The only difference between the implementations is that `LoopbackPort` mimics the (native) structured cloning, however that shouldn't matter here since the tests are only sending "simple" data (strings respectively arrays with numbers).
Furthermore the patch also changes `LoopbackPort` to default to using "structured cloning" and deferred invocation of the listeners, since native typed array support is now a requirement for using the PDF.js library.
The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.
The `fontScale` property was added in PR 1531, see commit b312719d7e in particular, apparently for the sole purpose of supporting the "acroforms" example.
However, the `fontScale` property was never used anywhere else in the code-base, and after the modernization of the "acroforms" example in PR 8030 it's been completely unused.
Finally, note that there's also a (more suitably named) `scale` property on `PageViewport` instances, which contains the exact same information as the property being removed here.
I made some mistakes when trying to make the content_disposition.js
compatible with non-modern browsers (IE/Edge).
Notably, text decoding was usually skipped because of the inverted
logical check at the top of `textdecode`.
I verified that this new version works as expected, as follows:
1. Visit 55c71eb44e/test/
and get test-content-disposition.js
also get test-content-disposition.node.js if using Node.js,
or get test-content-disposition.html if you use a browser.
2. Modify `test-content-disposition.node.js` (or the HTML file) and
change `../extension/content-disposition.js` to `PDFJS-content_disposition.js`
3. Copy the `getFilenameFromContentDispositionHeader` function from
`content_disposition.js` (i.e. the file without the trailing exports)
and save it as `PDFJS-content_disposition.js`.
4. Run the tests (`node test-content-disposition.node.js` or by opening
`test-content-disposition.html` in a browser).
5. Confirm that there are no failures: "Finished all tests (0 failures)"
The code has a best-efforts fallback for Microsoft Edge, which lacks the
TextDecoder API. The fallback only supports the common UTF-8 encoding.
To simulate this in a test, modify `PDFJS-content_disposition.js` and
deliberately throw an error before `new TextDecoder`. There will be two
failures because we don't want to include too much code to support text
decoding for non-UTF-8 encodings in Edge
```
test-content-disposition.js:265 Assertion failed: Input: attachment; filename*=ISO-8859-1''%c3%a4
Expected: "ä"
Actual : "ä"
test-content-disposition.js:268 Assertion failed: Input: attachment; filename*=ISO-8859-1''%e2%82%ac
Expected: "€"
Actual : "€"
```
Please note that while the current code works, both in the viewer and the unit-tests, it can leave the `WorkerTransport._passwordCapability` Promise in a pending state.
In the `PasswordRequest` handler, in src/display/api.js, we're returning the Promise from a `capability` object (rather than just a "plain" Promise). While an error thrown anywhere within this handler was fortunately enough to propagate it to the Worker side, it won't cause the Promise (in `WorkerTransport._passwordCapability`) to actually be rejected.
Finally note that while we're now catching errors in the `PasswordRequest` handler, those errors are still propagated to the Worker side via the (now) rejected Promise and the existing `return this._passwordCapability.promise;` line.
This prevents warnings about uncaught Promises, with messages such as "Error: Worker was destroyed during onPassword callback", when running the unit-tests both in browsers *and* in Node.js/Travis.
This should provide a better out-of-the-box experience when using PDF.js in a Node.js environment, since it's missing native support for both `@font-face` and `Image`.
Please note that this change *only* affects the default values, hence it's still possible for an API consumer to override those values when calling `getDocument`.
Also, prevents "ReferenceError: document is not defined" errors, when running the unit-tests in Node.js/Travis.
*This is a final piece of clean-up of code that I recently wrote, after which I'm done :-)*
When the `getMainThreadWorkerMessageHandler` function was added, in PR 9385, it did so by basically introducing a `web/app.js` dependency in `src/display/api.js` through the `window.pdfjsNonProductionPdfWorker` property[1]. Even though this is limited to non-`PRODUCTION` mode, i.e. `gulp server`, it still seems unfortunate to have that sort of viewer dependency in the API code itself.
With the new, much nicer and shorter, names introduced in PR 9565 we can remove this non-`PRODUCTION` hack and just use `window.pdfjsWorker` in both the viewer and the API regardless of the build mode.
---
[1] It didn't seem correct to piggy-back on the `window.pdfjsDistBuildPdfWorker` property in non-`PRODUCTION` mode.
The `getPageSizeInches` method was implemented on `PDFDocumentProxy`, which seems conceptually wrong since the size property isn't global to the document but rather specific to each page. Hence the method is moved into `PDFPageProxy`, as `get pageSizeInches` instead to address this.
Despite the fact that new API functionality was implemented, no unit-tests were added. To prevent issues later on, we should *always* ensure that new functionality has at least some test-coverage; something that this patch also takes care of.
The new `PDFDocumentProperties._parsePageSize` method seemed unnecessary convoluted. Furthermore, in the "no data provided"-case it even returned incorrect data (an array, rather than the expected object).
Finally, the fallback strings didn't actually agree with the `en-US` locale. This inconsistency doesn't look too great, and it's thus addressed here as well.
With options being moved from the global `PDFJS` object and into `getDocument`, a side-effect is that we're now passing in a fair number of useless parameters to the various transport/network streams.
Even though this doesn't *currently* cause any problems, it nonetheless seem like a good idea to explicitly provide the parameters that are actually necessary.
One additional complication with removing this option from the global `PDFJS` object, is that the viewer currently needs to check `disableAutoFetch` in a couple of places. To address this I'm thus proposing adding a getter in `PDFDocumentProxy`, to allow checking the *actually* used values for a particular `getDocument` invocation.
Unfortunately, as far as I can tell, we still need the ability to adjust certain API options depending on the browser environment in PDF.js version `2.0`. However, we should be able to separate this from the general compatibility code in the `src/shared/compatibility.js` file.
Compared to most other options currently/previously residing on the global `PDFJS` object, some of the Worker specific ones (e.g. `workerPort`/`workerSrc`) probably cannot be moved into options provided directly when initializing e.g. `PDFWorker`.
The reason is that in some cases, e.g. the Webpack examples, we try to provide Worker auto-configuration and I cannot see a good solution for that use-case if we completely remove the globally available Worker configuration.
However inline with previous patches for PDF.js version `2.0`, it does seem like a worthwhile goal to move away from storing options directly on the global `PDFJS` object, since that is a pattern we should avoid going forward. Especially since one of the (eventual) goals is to attempt to *completely* remove the global `PDFJS` object, and rely solely on exporting/importing the needed functionality.
By introducing the `GlobalWorkerOptions` we thus have larger flexibility in the future, if/when the global `PDFJS` object will be removed.
In order to simplify things, the undocumented `enableStats` option was removed and `pdfBug` is now instead used to enabled general debugging *and* page request/rendering stats.
Considering that in the default viewer the `stats` was only used when debugging was also enabled, this simplification (code wise) definitely seem worthwhile to me.
This removes the `PDFJS.externalLinkTarget`/`PDFJS.externalLinkRel` dependency from the viewer components, but please note that as a *temporary* solution the default viewer still uses it.
This removes the `PDFJS.imageResourcesPath` dependency from the viewer components and the test-suite, but please note that as a *temporary* solution the default viewer still uses it.
Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)
First of all, note how in both `fetch_stream.js` and `node_stream.js` we always overwrite the `this._contentLength` property even when the response headers doesn't actually contain any (valid) length information. This could thus result in the `length` parameter, as passed to the network stream, being completely ignored despite having no better information available.
Secondly, in `node_stream.js` the `this._isRangeSupported` property wasn't always updated correctly based on the response headers.
This works by making `PartialEvaluator.buildPaintImageXObject` wait for the success/failure of `loadJpegStream` on the API side *before* parsing continues.
Please note that in practice, it should be quite rare for the browser to fail loading/decoding of a JPEG image. In the general case, it should thus not be completely surprising if even `src/core/jpg.js` will fail to decode the image.
Since `loadJpegStream` is only used at a *single* spot in the code-base, and given that it's very heavily tailored to the calling code (since it relies on the data structure of `PDFObjects`), this patch simply inlines the code in `src/display/api.js` instead.
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.
Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).
Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.
---
[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).
Please note that this build target, and the resulting `build/pdf.combined.js` file, is equivalent to setting the `PDFJS.disableWorker` option to `true` which is a performance footgun.
It turns out that PR 9245 unfortunately broke benchmarking completely, sorry about that!
The bug is that we were attempting to reset the current instance of `StatTimer`, instead of creating a new one as was previously done. By resetting the current instance, the `StatTimer` data fetched in `test/driver.js` is now wiped out since it points to the *same* underlying object.
This re-use of a `StatTimer` instance was asked for during review, and unfortunately I didn't test this thoroughly enough before submitting the final version of the PR.[1]
---
[1] Note that while I did test the benchmarking scripts with that PR *before* initially submitting it, I did however forget to do that after addressing the review comments which might explain why this problem went unnoticed.
This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly.
The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations.
Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.
The `fetch` API is only supported for http(s), even in Chrome extensions.
Because of this limitation, we should use the XMLHttpRequest API when the
requested URL is not a http(s) URL.
Fixes#9361
These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`.
However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there's warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).
Since multiple empty lines is virtually unused in the code-base, and the few cases that do exist look like "typos", let's enforce greater consistency here; please see https://eslint.org/docs/rules/no-multiple-empty-lines.
It is only used in a few places to handle prefixing style properties if
necessary. However, we used it only for `transform`, `transformOrigin`
and `borderRadius`, which according to Can I Use are supported natively
(unprefixed) in the browsers that PDF.js 2.0 supports. Therefore, we can
remove this class, which should help performance too since this avoids
extra function calls in parts of the code that are called often.
This was an oversight in PR 9095, which unfortunately breaks rendering in some PDF files (e.g. the one from issue 6737).
It thus appears that we don't have any test-coverage for this code-path, and given the relative complexity of the PDF files affected by this bug I wasn't able to easily create a reduced test-case.
*Please note:* The linked test-case included in this patch is currently *not* rendered correctly (that'd be the PR 6606), but it at least gives us some test-coverage here.
I recall being confused as to the purpose of the `encrypted` property all the way back when working on PR 4750.
Looking at the history, this property was added in PR 1698 when password support was added to the API/viewer. However, its only purpose seem to have been to facilitate the addition of a `isEncrypted` function in the API. That function never, as far as I can tell, saw any use and was unceremoniously removed in PR 4144.
Since we want to avoid sending all non-essential data early during initial document loading (e.g. PR 4750), it seems correct to get rid of the `encrypted` property. Especially since it hasn't even been exposed in the API for over three years, with no complaints that I'm aware of.
Finally note that the `encrypt` property on the `XRef` instance isn't tied to the code that's being removed here. Given that we're calling `PDFDocument.parse` during `createDocumentHandler` in the worker which, via `PDFDocument.setup`, calls `XRef.parse` where the `Encrypt` data (if it exists) is always parsed.
Please note that while this could be considered a regression in user-facing behaviour, I'm not convinced that it's really a regression as such since prior to PR 8912 the Metadata would fail to parse (with an XML error) and thus be ignored when setting the viewer title.
With the refactored Metadata parsing we're now able to parse this, which uncovered issues with a subset of broken Ghostscript Metadata that uses HTML character names.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1424938
Unless the debugging tools (i.e. `PDFBug`) are enabled, or the `browsertest` is running, the `PDFPageProxy.stats` aren't actually used for anything.
Rather than initializing unnecessary `StatTimer` instances, we can simply re-use *one* dummy class (with static methods) for every page. Note that by using a dummy `StatTimer` in this way, rather than letting `PDFPageProxy.stats` be undefined, we don't need to guard *every* single stats collection callsite.
Since it wouldn't make much sense to attempt to use `PDFPageProxy.stats` when stat collection is disabled, it was instead changed to a "private" property (i.e. `PDFPageProxy._stats`) and a getter was added for accessing `PDFPageProxy.stats`. This getter will now return `null` when stat collection is disabled, making that case easy to handle.
For benchmarking purposes, the test-suite used to re-create the `StatTimer` after loading/rendering each page. However, modifying properties on various API code from the outside in this way seems very error-prone, and is an anti-pattern that we really should avoid at all cost. Hence the `PDFPageProxy.cleanup` method was modified to accept an optional parameter, which will take care of resetting `this.stats` when necessary, and `test/driver.js` was updated accordingly.
Finally, a tiny bit more validation was added on the viewer side, to ensure that all the code we're attempting to access is defined when handling `PDFPageProxy` stats.
*This patch is the result of me starting to look into moving parameters from `PDFJS` into `getDocument` and other API methods.*
When familiarizing myself with the code, the signatures of the various network streams seemed to be unnecessarily cumbersome since `disableRange` is currently handled separately from other parameters.
I'm assuming that the explanation for this is probably "for historical reasons", as is often the case. Hence I'd like to clean this up *before* we start the larger, and more invasive, `PDFJS` parameter re-factoring.
Nothing uses this option anymore, so setting it is a no-op now. We can
safely remove it.
Use `SKIP_BABEL` (instead of `PDFJS_NEXT`) now if you want to skip Babel
translation for a build.
I don't have a good example at hand right know, but I recall seeing custom deployments of PDF.js that bundle a *specific* version of the `build/pdf.js` file and then set `PDFJS.workerSrc` to point to https://mozilla.github.io/pdf.js/build/pdf.worker.js.
That practice seems really bad since, besides (obviously) causing unnecessary server load, it will very quickly result in a version mismatch between the `pdf.js` and `pdf.worker.js` files in those PDF.js deployments.
Such a version mismatch could easily lead to either breaking errors, or even worse slightly inconsistent behaviour for an API call (if the API -> Worker interface changes, which does happen from time to time).
To avoid the problems described above, I'm thus proposing that we enforce that the versions of the `pdf.js` and `pdf.worker.js` files must always match.
The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.
This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.
The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.
Currently `PDFFunction` is implemented (basically) like a class with only `static` methods. Since it's used directly in a number of different `src/core/` files, attempting to pass in `isEvalSupported` would result in code that's *very* messy, not to mention difficult to maintain (since *every* single `PDFFunction` method call would need to include a `isEvalSupported` argument).
Rather than having to wait for a possible re-factoring of `PDFFunction` that would avoid the above problems by design, it probably makes sense to at least set `isEvalSupported` globally for `PDFFunction`.
Please note that there's one caveat with this solution: If `PDFJS.getDocument` is used to open multiple files simultaneously, with *different* `PDFJS.isEvalSupported` values set before each call, then the last one will always win.
However, that seems like enough of an edge-case that we shouldn't have to worry about it. Besides, since we'll also test that `eval` is actually supported, it should be fine.
Fixes 5573.
This patch provides a new unit tested factory for creating SVG
containers and elements. This code is duplicated twice in the
codebase, but with upcoming changes this would need to be duplicated
even more. Moreover, consolidating this code in one factory allows
us to replace it easily for e.g., supporting Node.js. Therefore, move
this to a central place and update/ES6-ify the related code.
Finally, we replace `setAttributeNS` with `setAttribute` because no
namespace is provided.
Rather than displaying links that does *nothing* when clicked, it probably makes more sense to simply not render them instead. Especially since it turns out that, at least at this point in time, this is *very* easy to both implement and test.
Fixes 3897.
It seems that the status check, for non-HTTP loads, causes the default viewer to *refuse* to open local PDF files.
***STR:***
1. Make sure that fetch support is enabled in the browser. In Firefox Nightly, set `dom.streams.enabled = true` and `javascript.options.streams = true` in `about:config`.
2. Open https://mozilla.github.io/pdf.js/web/viewer.html.
3. Click on the "Open file" button, and open a new PDF file.
***ER:***
A new PDF file should open in the viewer.
***AR:***
The PDF file fails to open, with an error message of the following format:
`Message: Unexpected server response (200) while retrieving PDF "blob:https://mozilla.github.io/a4fc455f-bc05-45b5-b6aa-2ecff3cb45ce".`
The property and the setter for text rise were already present, but they
were never used or called. This patch completes the implementation by
calling the setter when the operator is encountered and by using the
text rise value when rendering text.
This bug is similar to the canvas bug of #6721.
I found this bug when I tried to run pdf2svg on a SVG file, and the generated
SVG could not be viewed in Chrome due to a SVG/XML parsing error:
"PCDATA invalid Char value 3"
Reduced test case:
- https://github.com/mozilla/pdf.js/files/1229507/pcdatainvalidchar.pdf
- expected: "hardware performance"
- Actual SVG source: "hardware\x03performance"
(where "\x03" is a non-printable character, and invalid XML).
In terms of rendering, this bug is similar to #6721, where an unexpected glyph
appeared in the canvas renderer. This was fixed by #7023, which skips over
missing glyphs. This commit follows a similar logic.
The test case from #6721 can be used here too:
- https://github.com/mozilla/pdf.js/files/52205/issue6721_reduced.pdf
expected: "Issue 6721"
actual (before this patch): "Issue ààà6721"
This replaces `assert` calls with `throw new FormatError()`/`throw new Error()`.
In a few places, throwing an `Error` (which is what `assert` meant) isn't correct since the enclosing function is supposed to return a `Promise`, hence some cases were changed to `Promise.reject(...)` and similarily for `createPromiseCapability` instances.
Use the environment's zlib implementation if available to get
reasonably-sized SVG files when an XObject image is converted to PNG.
The generated PNG is not optimal because we do not use a PNG predictor.
Futher, when our SVG backend is run in a browser, the generated PNG
images will still be unnecessarily large (though the use of blob:-URLs
when available should reduce the impact on memory usage). If we want to
optimize PNG images in browsers too, we can either try to use a DEFLATE
library such as pako, or re-use our XObject image painting logic in
src/display/canvas.js. This potential improvement is not implemented by
this commit
Tested with:
- Node.js 8.1.3 (uses zlib)
- Node.js 0.11.12 (uses zlib)
- Node.js 0.10.48 (falls back to inferior existing implementation).
- Chrome 59.0.3071.86
- Firefox 54.0
Tests:
Unit test on Node.js:
```
$ gulp lib
$ JASMINE_CONFIG_PATH=test/unit/clitests.json node ./node_modules/.bin/jasmine --filter=SVG
```
Unit test in browser: Run `gulp server` and open
http://localhost:8888/test/unit/unit_test.html?spec=SVGGraphics
To verify that the patch works as desired,
```
$ node examples/node/pdf2svg.js test/pdfs/xobject-image.pdf
$ du -b svgdump/xobject-image-1.svg
# ^ Calculates the file size. Confirm that the size is small
# (784 instead of 80664 bytes).
```
Move the DEFLATE logic in convertImgDataToPng to a separate function.
A later commit will introduce a more efficient deflate algorithm,
and fall back to the existing, naive algorithm if needed.
`__pdfjsdev_webpack__` was used to skip evaluating part of an AST,
in order to not mangle some `require` symbols.
This commit removes `__pdfjsdev_webpack__`, and:
- Uses `__non_webpack_require__` when one wants the output to
contain `require` instead of `__webpack_require__`.
- Adds options to the webpack config to prevent "polyfills" for
some Node.js-specific APIs to be added.
- Use `// eslint-disable-next-line no-undef` instead of `/* globals ... */`
for variables that are not meant to be used globally.
In general, we may not know the stroke properties when path construction
happens. Since we must know the properties when we apply the stroke, we
should set the properties at that point. Note that we already do that
for the color and opacity, but not yet for the other properties.
In the PDF from issue 8527, the clip operator (W) shows up before a path
is defined. The current SVG backend however expects a path to exist
before generating a `<svg:clipPath>` element.
In the example, the path was defined after the clip, followed by a
endPath operator (n).
So this commit fixes the bug by moving the path generation logic from
clip to endPath.
Our canvas backend appears to use similar logic:
`CanvasGraphics_endPath` calls `consumePath`, which in turn draws the
clip and resets the `pendingClip` state. The canvas backend calls
`consumePath` from multiple other places, so we probably need to check
whether doing so is also necessary for the SVG backend.
I scanned our corpus of PDF files in test/pdfs, and found that in every
instance (except for one), the "W" PDF operator (clip) is immediately
followed by "n" (endPath). The new test from this commit (clippath.pdf)
starts with "W", followed by a path definition and then "n".
# Commands used to find some of the clipping commands:
grep -ra '^W$' -C7 | less -S
grep -ra '^W ' -C7 | less -S
grep -ra ' W$' -C7 | less -S
test/pdfs/issue6413.pdf is the only file where "W" (a tline 55) is not
followed by "n". In fact, the "W" is the last operation of a series of
XObject painting operations, and removing it does not have any effect
on the rendered PDF (confirmed by looking at the output of PDF.js's
canvas backend, and ImageMagick's convert command).
This patch adds Streams API support in getTextContent
so that we can stream data in chunks instead of fetching
whole data from worker thread to main thread. This patch
supports Streams API without changing the core functionality
of getTextContent.
Enqueue textContent directly at getTextContent in partialEvaluator.
Adds desiredSize and ready property in streamSink.
http://eslint.org/docs/rules/comma-danglehttp://eslint.org/docs/rules/object-curly-spacing
Given that we currently have quite inconsistent object formatting, fixing this in *one* big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead.
Please note: This patch was created automatically, using the ESLint `--fix` command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.
```diff
diff --git a/src/display/canvas.js b/src/display/canvas.js
index 5739f6f2..4216b2d2 100644
--- a/src/display/canvas.js
+++ b/src/display/canvas.js
@@ -2071,7 +2071,7 @@ var CanvasGraphics = (function CanvasGraphicsClosure() {
var map = [];
for (var i = 0, ii = positions.length; i < ii; i += 2) {
map.push({ transform: [scaleX, 0, 0, scaleY, positions[i],
- positions[i + 1]], x: 0, y: 0, w: width, h: height, });
+ positions[i + 1]], x: 0, y: 0, w: width, h: height, });
}
this.paintInlineImageXObjectGroup(imgData, map);
},
diff --git a/src/display/svg.js b/src/display/svg.js
index 9eb05dfa..2aa21482 100644
--- a/src/display/svg.js
+++ b/src/display/svg.js
@@ -458,7 +458,11 @@ SVGGraphics = (function SVGGraphicsClosure() {
for (var x = 0; x < fnArrayLen; x++) {
var fnId = fnArray[x];
- opList.push({ 'fnId': fnId, 'fn': REVOPS[fnId], 'args': argsArray[x], });
+ opList.push({
+ 'fnId': fnId,
+ 'fn': REVOPS[fnId],
+ 'args': argsArray[x],
+ });
}
return opListToTree(opList);
},
```
Note that by using `let` instead of `var` in `PartialEvaluator.setGState` and `TranslatedFont.loadType3Data`, we can get rid of further `bind` usages since `let` is block-scoped.
Also, the fact that `bind` wasn't used in the `Font` case inside of `setGState` is actually a bug which has been present ever since PR 5205, where a closure was replaced by a standard loop.[1]
---
[1] I'm not aware of any bugs caused by this, but that is probably more a happy accident than anything else, since e.g. just removing the `bind` from the `SMask` case without using block-scoped variables causes test failures.
Please see http://eslint.org/docs/rules/object-shorthand.
For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into to the patch.
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
This patch implements support for line annotations. Other viewers only
show the popup annotation when hovering over the line, which may have
any orientation. To make this possible, we render an invisible line (SVG
element) over the line on the canvas that acts as the trigger for the
popup annotation. This invisible line has the same starting coordinates,
ending coordinates and width of the line on the canvas.
Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.
NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`.
Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).
- When the minified version is used the resolver of the worker can not find it properly and throws 404 error.
- The problem was that:
- It was getting the current name of the file.
- It was replacing **.js** by **.worker.js**
- When it was loading the unminified version it was working fine because:
- *pdf.js - .js + .worker.js* = **pdf.worker.js**
- When it was loading the minified version it didtn't work because:
- *pdf.min.js - .js + .worker.js* = **pdf.min.worker.js**
- **pdf.min.worker.js** doesn't exist the real file name is **pdf.worker.min.js**
The display layer is responsible for creating the HTML elements for the
annotations from the core layer. If we need to ignore border styling for
the containers of certain elements, the display layer should do so and
not the core layer. I noticed this during the implementation of line
annotations, for which we actually need the original border width in the
display layer, even though we ignore it for the container. If we set the
border style to zero in the core layer, this becomes impossible.
To prevent this, this patch moves the container border removal code from
the core layer to the display layer. This makes the core layer output
the unchanged annotation data and lets the display layer remove any
border styling if necessary.
I happened to notice that the error handling wasn't that great, which I missed previously since there were no unit-tests for failure to load built-in CMap files.
Hence this patch, which improves the error handling *and* adds tests.