It's no longer necessary since https://bugzilla.mozilla.org/show_bug.cgi?id=1305963 is fixed quite some time ago.
While we're here, mark the `cachedGetSinglePixelWidth` member as being
private and use ES6 syntax in the `getSinglePixelWidth` method.
For proof-of-concept, this patch converts a couple of `Promise` returning methods to use `async` instead.
Please note that the `generic` build, based on this patch, has been successfully testing in IE11 (i.e. the viewer loads and nothing is obviously broken).
Being able to use modern JavaScript features like `async`/`await` is a huge plus, but there's one (obvious) side-effect: The size of the built files will increase slightly (unless `SKIP_BABEL == true`). That's unavoidable, but seems like a small price to pay in the grand scheme of things.
Finally, note that the `chromium` build target was changed to no longer skip Babel translation, since the Chrome extension still supports version `49` of the browser (where native `async` support isn't available).
With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs.
(Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)
This moves/exposes the `URL` polyfill similarily to the existing `ReadableStream` polyfill, rather than exposing it globally, to avoid interfering with any "outside" code.
Both the `URL` and `ReadableStream` polyfills are now exposed on the `pdfjsLib` object, such that they are accessible to the viewer components.
Furthermore, the `no-restricted-globals` ESLint rule is also enabled to prevent accidental usage of the native `URL`/`ReadableStream` implementations directly in the `src/` and `web/` folders; see also https://eslint.org/docs/rules/no-restricted-globals
Addresses the remaining TODO in https://github.com/mozilla/pdf.js/projects/6
Currently if `RenderTask.cancel` is called *immediately* after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled.
However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it *after* rendering was cancelled (it was likely just a simple oversight in PR 8519).
Fixes 9456.
This wasn't included in PR 9245, since all the API options were still global at that time.
Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.
Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering.
*Please note:* Including the 14 standard fonts in PDF.js probably wouldn't be *that* difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.)
However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.
As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.
*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
Not only is the `Util.loadScript` helper function unused on the Worker side, even trying to use it there would throw an Error (since `document` isn't defined/available in Workers).
Hence this helper function is moved, and its code modernized slightly by having it return a Promise rather than needing a callback function.
Finally, to reduced code duplication, the "new" loadScript function is exported and used in the viewer.
The various classes have `this._errored` and `this._reason` properties, where the first one is a boolean indicating if an error was encountered and the second one contains the actual `Error` (or `null` initially).
In practice this means that errors are basically tracked *twice*, rather than just once. This kind of double-bookkeeping is generally a bad idea, since it's quite easy for the properties to (accidentally) get into an inconsistent state whenever the relevant code is modified.
Rather than using a separate boolean, we can just as well check the "error" property directly (since `null` is falsy).
---
Somewhat unrelated to this patch, but `src/display/node_stream.js` is currently *not* handling errors in a consistent or even correct way; compared with `src/display/network.js` and `src/display/fetch_stream.js`.
Obviously using the `createResponseStatusError` utility function, from `src/display/network_utils.js`, might not make much sense in a Node.js environment. However at the *very* least it seems that `MissingPDFException`, or `UnknownErrorException` when one cannot tell that the PDF file is "missing", should be manually thrown.
As is, the API (i.e. `getDocument`) is not returning the *expected* errors when loading fails in Node.js environments (as evident from the `pending` API unit-test).
This special handling was added in PR 8567, but was made redundant in PR 8721 which stopped sending everything but the kitchen sink to the Worker side.
Since `PDFPageProxy` already provide getters for all the data returned by `GetPage` (in the Worker), there isn't any compelling reason for accessing the `pageInfo` directly on `PDFPageProxy`.
The patch also changes the `GetPage` handler, in `src/core/worker.js`, to use modern JavaScript features.
Since `PDFDocumentProxy` already provide getters for all the data returned by `GetDoc` (in the Worker), there isn't any compelling reason for accessing the `pdfInfo` directly on `PDFDocumentProxy`.
After PR 8617 the `PDFManagerReady` message handler function, in `src/display/api.js`, is now a no-op. Hence it seems completely unnecessary to keep sending this message from `src/core/worker.js`.
With native typed array support now being mandatory in PDF.js, since version 2.0, this probably isn't a huge problem even though the current code seems wrong (it was changed in PR 6571).
Note how in the `!(data instanceof Uint8Array)` case we're currently attempting to send `handler.send('test', 'main', false);` to the main-thread, which doesn't really make any sense since the signature of the method reads `send(actionName, data, transfers) {`.
Hence the data that's *actually* being sent here is `'main'`, with `false` as the transferList, which just seems weird. On the main-thread, this means that we're in this case checking `data && data.supportTypedArray`, where `data` contains the string `'main'` rather than being falsy. Since a string doesn't have a `supportTypedArray` property, that check still fails as expected but it doesn't seem great nonetheless.
Since all the built-in PDF.js image decoders now return their data as `Uint8ClampedArray`, for consistency `JpegDecode` on the main-thread should be doing the same thing; follow-up to PR 8778.
The signature of the `PDFWorker.fromPort` method, in addition to the `PDFWorker` constructor, was changed in PR 9480.
Hence it's probably a good idea to add a bit more validation to `PDFWorker.fromPort`, to ensure that it won't fail silently for an API consumer that updates to version 2.0 of the PDF.js library.
With version 2.0, native support for typed arrays is now a requirement for using the PDF.js library; see PR 9094 where the old polyfills were removed.
Hence the `isTypedArraysPresent` check, when setting up fake workers, no longer serves any purpose here and can thus be removed.
There's no good reason, as far as I can tell, to duplicate the functionality of the `LoopbackPort` in the unit-tests. The only difference between the implementations is that `LoopbackPort` mimics the (native) structured cloning, however that shouldn't matter here since the tests are only sending "simple" data (strings respectively arrays with numbers).
Furthermore the patch also changes `LoopbackPort` to default to using "structured cloning" and deferred invocation of the listeners, since native typed array support is now a requirement for using the PDF.js library.
The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.
The `fontScale` property was added in PR 1531, see commit b312719d7e in particular, apparently for the sole purpose of supporting the "acroforms" example.
However, the `fontScale` property was never used anywhere else in the code-base, and after the modernization of the "acroforms" example in PR 8030 it's been completely unused.
Finally, note that there's also a (more suitably named) `scale` property on `PageViewport` instances, which contains the exact same information as the property being removed here.
I made some mistakes when trying to make the content_disposition.js
compatible with non-modern browsers (IE/Edge).
Notably, text decoding was usually skipped because of the inverted
logical check at the top of `textdecode`.
I verified that this new version works as expected, as follows:
1. Visit 55c71eb44e/test/
and get test-content-disposition.js
also get test-content-disposition.node.js if using Node.js,
or get test-content-disposition.html if you use a browser.
2. Modify `test-content-disposition.node.js` (or the HTML file) and
change `../extension/content-disposition.js` to `PDFJS-content_disposition.js`
3. Copy the `getFilenameFromContentDispositionHeader` function from
`content_disposition.js` (i.e. the file without the trailing exports)
and save it as `PDFJS-content_disposition.js`.
4. Run the tests (`node test-content-disposition.node.js` or by opening
`test-content-disposition.html` in a browser).
5. Confirm that there are no failures: "Finished all tests (0 failures)"
The code has a best-efforts fallback for Microsoft Edge, which lacks the
TextDecoder API. The fallback only supports the common UTF-8 encoding.
To simulate this in a test, modify `PDFJS-content_disposition.js` and
deliberately throw an error before `new TextDecoder`. There will be two
failures because we don't want to include too much code to support text
decoding for non-UTF-8 encodings in Edge
```
test-content-disposition.js:265 Assertion failed: Input: attachment; filename*=ISO-8859-1''%c3%a4
Expected: "ä"
Actual : "ä"
test-content-disposition.js:268 Assertion failed: Input: attachment; filename*=ISO-8859-1''%e2%82%ac
Expected: "€"
Actual : "€"
```
Please note that while the current code works, both in the viewer and the unit-tests, it can leave the `WorkerTransport._passwordCapability` Promise in a pending state.
In the `PasswordRequest` handler, in src/display/api.js, we're returning the Promise from a `capability` object (rather than just a "plain" Promise). While an error thrown anywhere within this handler was fortunately enough to propagate it to the Worker side, it won't cause the Promise (in `WorkerTransport._passwordCapability`) to actually be rejected.
Finally note that while we're now catching errors in the `PasswordRequest` handler, those errors are still propagated to the Worker side via the (now) rejected Promise and the existing `return this._passwordCapability.promise;` line.
This prevents warnings about uncaught Promises, with messages such as "Error: Worker was destroyed during onPassword callback", when running the unit-tests both in browsers *and* in Node.js/Travis.
This should provide a better out-of-the-box experience when using PDF.js in a Node.js environment, since it's missing native support for both `@font-face` and `Image`.
Please note that this change *only* affects the default values, hence it's still possible for an API consumer to override those values when calling `getDocument`.
Also, prevents "ReferenceError: document is not defined" errors, when running the unit-tests in Node.js/Travis.
*This is a final piece of clean-up of code that I recently wrote, after which I'm done :-)*
When the `getMainThreadWorkerMessageHandler` function was added, in PR 9385, it did so by basically introducing a `web/app.js` dependency in `src/display/api.js` through the `window.pdfjsNonProductionPdfWorker` property[1]. Even though this is limited to non-`PRODUCTION` mode, i.e. `gulp server`, it still seems unfortunate to have that sort of viewer dependency in the API code itself.
With the new, much nicer and shorter, names introduced in PR 9565 we can remove this non-`PRODUCTION` hack and just use `window.pdfjsWorker` in both the viewer and the API regardless of the build mode.
---
[1] It didn't seem correct to piggy-back on the `window.pdfjsDistBuildPdfWorker` property in non-`PRODUCTION` mode.
The `getPageSizeInches` method was implemented on `PDFDocumentProxy`, which seems conceptually wrong since the size property isn't global to the document but rather specific to each page. Hence the method is moved into `PDFPageProxy`, as `get pageSizeInches` instead to address this.
Despite the fact that new API functionality was implemented, no unit-tests were added. To prevent issues later on, we should *always* ensure that new functionality has at least some test-coverage; something that this patch also takes care of.
The new `PDFDocumentProperties._parsePageSize` method seemed unnecessary convoluted. Furthermore, in the "no data provided"-case it even returned incorrect data (an array, rather than the expected object).
Finally, the fallback strings didn't actually agree with the `en-US` locale. This inconsistency doesn't look too great, and it's thus addressed here as well.
With options being moved from the global `PDFJS` object and into `getDocument`, a side-effect is that we're now passing in a fair number of useless parameters to the various transport/network streams.
Even though this doesn't *currently* cause any problems, it nonetheless seem like a good idea to explicitly provide the parameters that are actually necessary.
One additional complication with removing this option from the global `PDFJS` object, is that the viewer currently needs to check `disableAutoFetch` in a couple of places. To address this I'm thus proposing adding a getter in `PDFDocumentProxy`, to allow checking the *actually* used values for a particular `getDocument` invocation.
Unfortunately, as far as I can tell, we still need the ability to adjust certain API options depending on the browser environment in PDF.js version `2.0`. However, we should be able to separate this from the general compatibility code in the `src/shared/compatibility.js` file.
Compared to most other options currently/previously residing on the global `PDFJS` object, some of the Worker specific ones (e.g. `workerPort`/`workerSrc`) probably cannot be moved into options provided directly when initializing e.g. `PDFWorker`.
The reason is that in some cases, e.g. the Webpack examples, we try to provide Worker auto-configuration and I cannot see a good solution for that use-case if we completely remove the globally available Worker configuration.
However inline with previous patches for PDF.js version `2.0`, it does seem like a worthwhile goal to move away from storing options directly on the global `PDFJS` object, since that is a pattern we should avoid going forward. Especially since one of the (eventual) goals is to attempt to *completely* remove the global `PDFJS` object, and rely solely on exporting/importing the needed functionality.
By introducing the `GlobalWorkerOptions` we thus have larger flexibility in the future, if/when the global `PDFJS` object will be removed.
In order to simplify things, the undocumented `enableStats` option was removed and `pdfBug` is now instead used to enabled general debugging *and* page request/rendering stats.
Considering that in the default viewer the `stats` was only used when debugging was also enabled, this simplification (code wise) definitely seem worthwhile to me.
This removes the `PDFJS.externalLinkTarget`/`PDFJS.externalLinkRel` dependency from the viewer components, but please note that as a *temporary* solution the default viewer still uses it.
This removes the `PDFJS.imageResourcesPath` dependency from the viewer components and the test-suite, but please note that as a *temporary* solution the default viewer still uses it.
Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)
First of all, note how in both `fetch_stream.js` and `node_stream.js` we always overwrite the `this._contentLength` property even when the response headers doesn't actually contain any (valid) length information. This could thus result in the `length` parameter, as passed to the network stream, being completely ignored despite having no better information available.
Secondly, in `node_stream.js` the `this._isRangeSupported` property wasn't always updated correctly based on the response headers.
This works by making `PartialEvaluator.buildPaintImageXObject` wait for the success/failure of `loadJpegStream` on the API side *before* parsing continues.
Please note that in practice, it should be quite rare for the browser to fail loading/decoding of a JPEG image. In the general case, it should thus not be completely surprising if even `src/core/jpg.js` will fail to decode the image.
Since `loadJpegStream` is only used at a *single* spot in the code-base, and given that it's very heavily tailored to the calling code (since it relies on the data structure of `PDFObjects`), this patch simply inlines the code in `src/display/api.js` instead.
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.
Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).
Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.
---
[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).
Please note that this build target, and the resulting `build/pdf.combined.js` file, is equivalent to setting the `PDFJS.disableWorker` option to `true` which is a performance footgun.
It turns out that PR 9245 unfortunately broke benchmarking completely, sorry about that!
The bug is that we were attempting to reset the current instance of `StatTimer`, instead of creating a new one as was previously done. By resetting the current instance, the `StatTimer` data fetched in `test/driver.js` is now wiped out since it points to the *same* underlying object.
This re-use of a `StatTimer` instance was asked for during review, and unfortunately I didn't test this thoroughly enough before submitting the final version of the PR.[1]
---
[1] Note that while I did test the benchmarking scripts with that PR *before* initially submitting it, I did however forget to do that after addressing the review comments which might explain why this problem went unnoticed.
This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly.
The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations.
Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.
The `fetch` API is only supported for http(s), even in Chrome extensions.
Because of this limitation, we should use the XMLHttpRequest API when the
requested URL is not a http(s) URL.
Fixes#9361
These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`.
However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there's warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).
Since multiple empty lines is virtually unused in the code-base, and the few cases that do exist look like "typos", let's enforce greater consistency here; please see https://eslint.org/docs/rules/no-multiple-empty-lines.
It is only used in a few places to handle prefixing style properties if
necessary. However, we used it only for `transform`, `transformOrigin`
and `borderRadius`, which according to Can I Use are supported natively
(unprefixed) in the browsers that PDF.js 2.0 supports. Therefore, we can
remove this class, which should help performance too since this avoids
extra function calls in parts of the code that are called often.
This was an oversight in PR 9095, which unfortunately breaks rendering in some PDF files (e.g. the one from issue 6737).
It thus appears that we don't have any test-coverage for this code-path, and given the relative complexity of the PDF files affected by this bug I wasn't able to easily create a reduced test-case.
*Please note:* The linked test-case included in this patch is currently *not* rendered correctly (that'd be the PR 6606), but it at least gives us some test-coverage here.
I recall being confused as to the purpose of the `encrypted` property all the way back when working on PR 4750.
Looking at the history, this property was added in PR 1698 when password support was added to the API/viewer. However, its only purpose seem to have been to facilitate the addition of a `isEncrypted` function in the API. That function never, as far as I can tell, saw any use and was unceremoniously removed in PR 4144.
Since we want to avoid sending all non-essential data early during initial document loading (e.g. PR 4750), it seems correct to get rid of the `encrypted` property. Especially since it hasn't even been exposed in the API for over three years, with no complaints that I'm aware of.
Finally note that the `encrypt` property on the `XRef` instance isn't tied to the code that's being removed here. Given that we're calling `PDFDocument.parse` during `createDocumentHandler` in the worker which, via `PDFDocument.setup`, calls `XRef.parse` where the `Encrypt` data (if it exists) is always parsed.
Please note that while this could be considered a regression in user-facing behaviour, I'm not convinced that it's really a regression as such since prior to PR 8912 the Metadata would fail to parse (with an XML error) and thus be ignored when setting the viewer title.
With the refactored Metadata parsing we're now able to parse this, which uncovered issues with a subset of broken Ghostscript Metadata that uses HTML character names.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1424938
Unless the debugging tools (i.e. `PDFBug`) are enabled, or the `browsertest` is running, the `PDFPageProxy.stats` aren't actually used for anything.
Rather than initializing unnecessary `StatTimer` instances, we can simply re-use *one* dummy class (with static methods) for every page. Note that by using a dummy `StatTimer` in this way, rather than letting `PDFPageProxy.stats` be undefined, we don't need to guard *every* single stats collection callsite.
Since it wouldn't make much sense to attempt to use `PDFPageProxy.stats` when stat collection is disabled, it was instead changed to a "private" property (i.e. `PDFPageProxy._stats`) and a getter was added for accessing `PDFPageProxy.stats`. This getter will now return `null` when stat collection is disabled, making that case easy to handle.
For benchmarking purposes, the test-suite used to re-create the `StatTimer` after loading/rendering each page. However, modifying properties on various API code from the outside in this way seems very error-prone, and is an anti-pattern that we really should avoid at all cost. Hence the `PDFPageProxy.cleanup` method was modified to accept an optional parameter, which will take care of resetting `this.stats` when necessary, and `test/driver.js` was updated accordingly.
Finally, a tiny bit more validation was added on the viewer side, to ensure that all the code we're attempting to access is defined when handling `PDFPageProxy` stats.
*This patch is the result of me starting to look into moving parameters from `PDFJS` into `getDocument` and other API methods.*
When familiarizing myself with the code, the signatures of the various network streams seemed to be unnecessarily cumbersome since `disableRange` is currently handled separately from other parameters.
I'm assuming that the explanation for this is probably "for historical reasons", as is often the case. Hence I'd like to clean this up *before* we start the larger, and more invasive, `PDFJS` parameter re-factoring.
Nothing uses this option anymore, so setting it is a no-op now. We can
safely remove it.
Use `SKIP_BABEL` (instead of `PDFJS_NEXT`) now if you want to skip Babel
translation for a build.
I don't have a good example at hand right know, but I recall seeing custom deployments of PDF.js that bundle a *specific* version of the `build/pdf.js` file and then set `PDFJS.workerSrc` to point to https://mozilla.github.io/pdf.js/build/pdf.worker.js.
That practice seems really bad since, besides (obviously) causing unnecessary server load, it will very quickly result in a version mismatch between the `pdf.js` and `pdf.worker.js` files in those PDF.js deployments.
Such a version mismatch could easily lead to either breaking errors, or even worse slightly inconsistent behaviour for an API call (if the API -> Worker interface changes, which does happen from time to time).
To avoid the problems described above, I'm thus proposing that we enforce that the versions of the `pdf.js` and `pdf.worker.js` files must always match.
The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.
This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.
The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.
Currently `PDFFunction` is implemented (basically) like a class with only `static` methods. Since it's used directly in a number of different `src/core/` files, attempting to pass in `isEvalSupported` would result in code that's *very* messy, not to mention difficult to maintain (since *every* single `PDFFunction` method call would need to include a `isEvalSupported` argument).
Rather than having to wait for a possible re-factoring of `PDFFunction` that would avoid the above problems by design, it probably makes sense to at least set `isEvalSupported` globally for `PDFFunction`.
Please note that there's one caveat with this solution: If `PDFJS.getDocument` is used to open multiple files simultaneously, with *different* `PDFJS.isEvalSupported` values set before each call, then the last one will always win.
However, that seems like enough of an edge-case that we shouldn't have to worry about it. Besides, since we'll also test that `eval` is actually supported, it should be fine.
Fixes 5573.
This patch provides a new unit tested factory for creating SVG
containers and elements. This code is duplicated twice in the
codebase, but with upcoming changes this would need to be duplicated
even more. Moreover, consolidating this code in one factory allows
us to replace it easily for e.g., supporting Node.js. Therefore, move
this to a central place and update/ES6-ify the related code.
Finally, we replace `setAttributeNS` with `setAttribute` because no
namespace is provided.
Rather than displaying links that does *nothing* when clicked, it probably makes more sense to simply not render them instead. Especially since it turns out that, at least at this point in time, this is *very* easy to both implement and test.
Fixes 3897.
It seems that the status check, for non-HTTP loads, causes the default viewer to *refuse* to open local PDF files.
***STR:***
1. Make sure that fetch support is enabled in the browser. In Firefox Nightly, set `dom.streams.enabled = true` and `javascript.options.streams = true` in `about:config`.
2. Open https://mozilla.github.io/pdf.js/web/viewer.html.
3. Click on the "Open file" button, and open a new PDF file.
***ER:***
A new PDF file should open in the viewer.
***AR:***
The PDF file fails to open, with an error message of the following format:
`Message: Unexpected server response (200) while retrieving PDF "blob:https://mozilla.github.io/a4fc455f-bc05-45b5-b6aa-2ecff3cb45ce".`
The property and the setter for text rise were already present, but they
were never used or called. This patch completes the implementation by
calling the setter when the operator is encountered and by using the
text rise value when rendering text.
This bug is similar to the canvas bug of #6721.
I found this bug when I tried to run pdf2svg on a SVG file, and the generated
SVG could not be viewed in Chrome due to a SVG/XML parsing error:
"PCDATA invalid Char value 3"
Reduced test case:
- https://github.com/mozilla/pdf.js/files/1229507/pcdatainvalidchar.pdf
- expected: "hardware performance"
- Actual SVG source: "hardware\x03performance"
(where "\x03" is a non-printable character, and invalid XML).
In terms of rendering, this bug is similar to #6721, where an unexpected glyph
appeared in the canvas renderer. This was fixed by #7023, which skips over
missing glyphs. This commit follows a similar logic.
The test case from #6721 can be used here too:
- https://github.com/mozilla/pdf.js/files/52205/issue6721_reduced.pdf
expected: "Issue 6721"
actual (before this patch): "Issue ààà6721"
This replaces `assert` calls with `throw new FormatError()`/`throw new Error()`.
In a few places, throwing an `Error` (which is what `assert` meant) isn't correct since the enclosing function is supposed to return a `Promise`, hence some cases were changed to `Promise.reject(...)` and similarily for `createPromiseCapability` instances.
Use the environment's zlib implementation if available to get
reasonably-sized SVG files when an XObject image is converted to PNG.
The generated PNG is not optimal because we do not use a PNG predictor.
Futher, when our SVG backend is run in a browser, the generated PNG
images will still be unnecessarily large (though the use of blob:-URLs
when available should reduce the impact on memory usage). If we want to
optimize PNG images in browsers too, we can either try to use a DEFLATE
library such as pako, or re-use our XObject image painting logic in
src/display/canvas.js. This potential improvement is not implemented by
this commit
Tested with:
- Node.js 8.1.3 (uses zlib)
- Node.js 0.11.12 (uses zlib)
- Node.js 0.10.48 (falls back to inferior existing implementation).
- Chrome 59.0.3071.86
- Firefox 54.0
Tests:
Unit test on Node.js:
```
$ gulp lib
$ JASMINE_CONFIG_PATH=test/unit/clitests.json node ./node_modules/.bin/jasmine --filter=SVG
```
Unit test in browser: Run `gulp server` and open
http://localhost:8888/test/unit/unit_test.html?spec=SVGGraphics
To verify that the patch works as desired,
```
$ node examples/node/pdf2svg.js test/pdfs/xobject-image.pdf
$ du -b svgdump/xobject-image-1.svg
# ^ Calculates the file size. Confirm that the size is small
# (784 instead of 80664 bytes).
```
Move the DEFLATE logic in convertImgDataToPng to a separate function.
A later commit will introduce a more efficient deflate algorithm,
and fall back to the existing, naive algorithm if needed.
`__pdfjsdev_webpack__` was used to skip evaluating part of an AST,
in order to not mangle some `require` symbols.
This commit removes `__pdfjsdev_webpack__`, and:
- Uses `__non_webpack_require__` when one wants the output to
contain `require` instead of `__webpack_require__`.
- Adds options to the webpack config to prevent "polyfills" for
some Node.js-specific APIs to be added.
- Use `// eslint-disable-next-line no-undef` instead of `/* globals ... */`
for variables that are not meant to be used globally.
In general, we may not know the stroke properties when path construction
happens. Since we must know the properties when we apply the stroke, we
should set the properties at that point. Note that we already do that
for the color and opacity, but not yet for the other properties.
In the PDF from issue 8527, the clip operator (W) shows up before a path
is defined. The current SVG backend however expects a path to exist
before generating a `<svg:clipPath>` element.
In the example, the path was defined after the clip, followed by a
endPath operator (n).
So this commit fixes the bug by moving the path generation logic from
clip to endPath.
Our canvas backend appears to use similar logic:
`CanvasGraphics_endPath` calls `consumePath`, which in turn draws the
clip and resets the `pendingClip` state. The canvas backend calls
`consumePath` from multiple other places, so we probably need to check
whether doing so is also necessary for the SVG backend.
I scanned our corpus of PDF files in test/pdfs, and found that in every
instance (except for one), the "W" PDF operator (clip) is immediately
followed by "n" (endPath). The new test from this commit (clippath.pdf)
starts with "W", followed by a path definition and then "n".
# Commands used to find some of the clipping commands:
grep -ra '^W$' -C7 | less -S
grep -ra '^W ' -C7 | less -S
grep -ra ' W$' -C7 | less -S
test/pdfs/issue6413.pdf is the only file where "W" (a tline 55) is not
followed by "n". In fact, the "W" is the last operation of a series of
XObject painting operations, and removing it does not have any effect
on the rendered PDF (confirmed by looking at the output of PDF.js's
canvas backend, and ImageMagick's convert command).
This patch adds Streams API support in getTextContent
so that we can stream data in chunks instead of fetching
whole data from worker thread to main thread. This patch
supports Streams API without changing the core functionality
of getTextContent.
Enqueue textContent directly at getTextContent in partialEvaluator.
Adds desiredSize and ready property in streamSink.
http://eslint.org/docs/rules/comma-danglehttp://eslint.org/docs/rules/object-curly-spacing
Given that we currently have quite inconsistent object formatting, fixing this in *one* big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead.
Please note: This patch was created automatically, using the ESLint `--fix` command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.
```diff
diff --git a/src/display/canvas.js b/src/display/canvas.js
index 5739f6f2..4216b2d2 100644
--- a/src/display/canvas.js
+++ b/src/display/canvas.js
@@ -2071,7 +2071,7 @@ var CanvasGraphics = (function CanvasGraphicsClosure() {
var map = [];
for (var i = 0, ii = positions.length; i < ii; i += 2) {
map.push({ transform: [scaleX, 0, 0, scaleY, positions[i],
- positions[i + 1]], x: 0, y: 0, w: width, h: height, });
+ positions[i + 1]], x: 0, y: 0, w: width, h: height, });
}
this.paintInlineImageXObjectGroup(imgData, map);
},
diff --git a/src/display/svg.js b/src/display/svg.js
index 9eb05dfa..2aa21482 100644
--- a/src/display/svg.js
+++ b/src/display/svg.js
@@ -458,7 +458,11 @@ SVGGraphics = (function SVGGraphicsClosure() {
for (var x = 0; x < fnArrayLen; x++) {
var fnId = fnArray[x];
- opList.push({ 'fnId': fnId, 'fn': REVOPS[fnId], 'args': argsArray[x], });
+ opList.push({
+ 'fnId': fnId,
+ 'fn': REVOPS[fnId],
+ 'args': argsArray[x],
+ });
}
return opListToTree(opList);
},
```
Note that by using `let` instead of `var` in `PartialEvaluator.setGState` and `TranslatedFont.loadType3Data`, we can get rid of further `bind` usages since `let` is block-scoped.
Also, the fact that `bind` wasn't used in the `Font` case inside of `setGState` is actually a bug which has been present ever since PR 5205, where a closure was replaced by a standard loop.[1]
---
[1] I'm not aware of any bugs caused by this, but that is probably more a happy accident than anything else, since e.g. just removing the `bind` from the `SMask` case without using block-scoped variables causes test failures.
Please see http://eslint.org/docs/rules/object-shorthand.
For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into to the patch.
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
This patch implements support for line annotations. Other viewers only
show the popup annotation when hovering over the line, which may have
any orientation. To make this possible, we render an invisible line (SVG
element) over the line on the canvas that acts as the trigger for the
popup annotation. This invisible line has the same starting coordinates,
ending coordinates and width of the line on the canvas.
Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.
NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`.
Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).
- When the minified version is used the resolver of the worker can not find it properly and throws 404 error.
- The problem was that:
- It was getting the current name of the file.
- It was replacing **.js** by **.worker.js**
- When it was loading the unminified version it was working fine because:
- *pdf.js - .js + .worker.js* = **pdf.worker.js**
- When it was loading the minified version it didtn't work because:
- *pdf.min.js - .js + .worker.js* = **pdf.min.worker.js**
- **pdf.min.worker.js** doesn't exist the real file name is **pdf.worker.min.js**
The display layer is responsible for creating the HTML elements for the
annotations from the core layer. If we need to ignore border styling for
the containers of certain elements, the display layer should do so and
not the core layer. I noticed this during the implementation of line
annotations, for which we actually need the original border width in the
display layer, even though we ignore it for the container. If we set the
border style to zero in the core layer, this becomes impossible.
To prevent this, this patch moves the container border removal code from
the core layer to the display layer. This makes the core layer output
the unchanged annotation data and lets the display layer remove any
border styling if necessary.
I happened to notice that the error handling wasn't that great, which I missed previously since there were no unit-tests for failure to load built-in CMap files.
Hence this patch, which improves the error handling *and* adds tests.
I really cannot understand why this change is necessary, since modern browsers such as Firefox and Chrome work just fine with the old code.
Hence this is patch is yet another "hack" that's needed just because IE apparently cannot just work like you'd expect.
For consistency, the Node factory used in the CMap unit-tests is changed as well.
Fixes 8193.
This patch gets rid of the only case in the code-base where we're throwing a plain `string`, rather than an `Error`, which besides better/more consistent error handling also allows us to enable the [`no-throw-literal`](http://eslint.org/docs/rules/no-throw-literal) ESLint rule.
Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.
This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
Note that I initially tried to add this as a parameter to the `PDFPageProxy.render` method, such that it could be passed to `PartialEvaluator.getOperatorList`.
However, given all the different code-paths that call `getOperatorList` (there's a bunch only in `annotation.js`), this seemed to very quickly become unwieldy and thus difficult to maintain compared to simply using the existing `evaluatorOptions`.
Fixes extra canvas create calls.
Fixes unnecessary call of `new DOMCanvasFactory`.
Fixes undefined error of DOMCanvasFactory.
Fixes failures in some of the tests.
Fixes expected behaviour.
Remove unused vars.
The `Driver._cleanup` method is removing all stylesheets between test runs, which causes "TypeError: styleElement.parentNode is null" console errors in `FontLoader.clear`.
As can also be seen during various tests, some of the changes I made in PR 7972 unfortunately causes console errors.
It seems that I didn't test this properly, since it *should* have been obvious to me that while tests are triggered using Node.js, the files in question are run within the *browser*.
My apologies for not testing this thoroughly, and for causing unnecessary churn in the code!
See http://eslint.org/docs/rules/brace-style.
Having the opening/closing braces on the same line can often make the code slightly more difficult to read, in particular for `if`/`else if` statements, compared to using new lines.
This patch also, for consistency with `mozilla-central`, enables the [`no-iterator`](http://eslint.org/docs/rules/no-iterator) rule. Note that this rule didn't require a single code change.
Other PDF viewers, e.g. Adobe Reader, seem to append `FileAttachment`s to their attachments views.
One obvious difference in PDF.js is that we cannot append all the annotations on document load, since that would require parsing *every* page. Despite that, it still seems like a good idea to add `FileAttachment`s, since it's thus possible to access all the various types of attachments from a single place.
*Note:* With the previous patch we display a notification when a `FileAttachment` is added to the sidebar, which thus makes appending the contents of these annotations to the sidebar slightly more visible/useful.
Please see http://eslint.org/docs/rules/no-unused-vars; note that this patch purposely uses the same rule options as in `mozilla-central`, such that it fixes part of issue 7957.
It wasn't, in my opinion, entirely straightforward to enable this rule compared to the already existing rules. In many cases a `var descriptiveName = ...` format was used (more or less) to document the code, and I choose to place the old variable name in a trailing comment to not lose that information.
I welcome feedback on these changes, since it wasn't always entirely easy to know what changes made the most sense in every situation.
Please see http://eslint.org/docs/rules/spaced-comment.
Note that the exceptions added for `line` comments are intended to still allow use of the old preprocessor without linting errors.
Also, I took the opportunity to improve the grammar slightly (w.r.t. capitalization and punctuation) for comments touched in the patch.
Using `else` after `return` is not necessary, and can often lead to unnecessarily cluttered code. By using the `no-else-return` rule in ESLint we can avoid this pattern, see http://eslint.org/docs/rules/no-else-return.
This patch also removes the `UpdatePassword` message, in favour of using the `sendWithPromise` method of `MessageHandler`.
Furthermore, the patch also refactors the `BasePdfManager_updatePassword`/`BasePdfManager_passwordChanged` methods (in pdf_manager.js), and the `pdfManagerReady` function (in worker.js).
Modern browsers support styling radio buttons and checkboxes with CSS.
This makes the implementation much easier, and the fallback for older
browsers is still decent.
*Please note that most of the necessary code adjustments were made in PR 7890.*
ESLint has a number of advantageous properties, compared to JSHint. Among those are:
- The ability to find subtle bugs, thanks to more rules (e.g. PR 7881).
- Much more customizable in general, and many rules allow fine-tuned behaviour rather than the just the on/off rules in JSHint.
- Many more rules that can help developers avoid bugs, and a lot of rules that can be used to enforce a consistent coding style. The latter should be particularily useful for new contributors (and reduce the amount of stylistic review comments necessary).
- The ability to easily specify exactly what rules to use/not to use, as opposed to JSHint which has a default set. *Note:* in future JSHint version some of the rules we depend on will be removed, according to warnings in http://jshint.com/docs/options/, so we wouldn't be able to update without losing lint coverage.
- More easily disable one, or more, rules temporarily. In JSHint this requires using a numeric code, which isn't very user friendly, whereas in ESLint the rule name is simply used instead.
By default there's no rules enabled in ESLint, but there are some default rule sets available. However, to prevent linting failures if we update ESLint in the future, it seemed easier to just explicitly specify what rules we want.
Obviously this makes the ESLint config file somewhat bigger than the old JSHint config file, but given how rarely that one has been updated over the years I don't think that matters too much.
I've tried, to the best of my ability, to ensure that we enable the same rules for ESLint that we had for JSHint. Furthermore, I've also enabled a number of rules that seemed to make sense, both to catch possible errors *and* various style guide violations.
Despite the ESLint README claiming that it's slower that JSHint, https://github.com/eslint/eslint#how-does-eslint-performance-compare-to-jshint, locally this patch actually reduces the runtime for `gulp` lint (by approximately 20-25%).
A couple of stylistic rules that would have been nice to enable, but where our code currently differs to much to make it feasible:
- `comma-dangle`, controls trailing commas in Objects and Arrays (among others).
- `object-curly-spacing`, controls spacing inside of Objects.
- `spaced-comment`, used to enforce spaces after `//` and `/*. (This is made difficult by the fact that there's still some usage of the old preprocessor left.)
Rules that I indend to look into possibly enabling in follow-ups, if it seems to make sense: `no-else-return`, `no-lonely-if`, `brace-style` with the `allowSingleLine` parameter removed.
Useful links:
- http://eslint.org/docs/user-guide/configuring
- http://eslint.org/docs/rules/