Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended.
To reduced duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.
With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs.
(Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)
Compared to all the other (static) methods in `Util`, the `toRoman` one looks slightly out of place. Even more so considering that `Util` is being exposed through `pdfjsLib`, where access to a Roman numerals conversion method doesn't make much sense.
*I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug.*
The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a *negative* length when parsing the CALL functions.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.
PR 6141 changed `CompiledFont.compileGlyph` to, in the general case, return an Array. However, that PR apparenly forgot to update the no-glyph, empty-glyph, and endchar-glyph code-path and a String was still being (incorrectly) returned.
Given the way that `FontFaceObject.getPathGenerator` (on the API side) is implemented, this shouldn't have caused any bugs despite the Worker possible returning unexpected data.
This moves/exposes the `URL` polyfill similarily to the existing `ReadableStream` polyfill, rather than exposing it globally, to avoid interfering with any "outside" code.
Both the `URL` and `ReadableStream` polyfills are now exposed on the `pdfjsLib` object, such that they are accessible to the viewer components.
Furthermore, the `no-restricted-globals` ESLint rule is also enabled to prevent accidental usage of the native `URL`/`ReadableStream` implementations directly in the `src/` and `web/` folders; see also https://eslint.org/docs/rules/no-restricted-globals
Addresses the remaining TODO in https://github.com/mozilla/pdf.js/projects/6
Currently if `RenderTask.cancel` is called *immediately* after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled.
However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it *after* rendering was cancelled (it was likely just a simple oversight in PR 8519).
Fixes 9456.
This wasn't included in PR 9245, since all the API options were still global at that time.
Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.
Incomplete path operators, in particular, can result in fairly chaotic rendering artifacts, as can be observed on page four of the referenced PDF file.
The initial (naive) solution that was attempted, was to simply throw a `FormatError` as soon as any invalid (i.e. too short) operator was found and rely on the existing `ignoreErrors` code-paths. However, doing so would have caused regressions in some files; see the existing `issue2391-1` test-case, which was promoted to an `eq` test to help prevent future bugs.
Hence this patch, which adds special handling for invalid path operators since those may cause quite bad rendering artifacts.
You could, in all fairness, argue that the patch is a handwavy solution and I wouldn't object. However, given that this only concerns *corrupt* PDF files, the way that PDF viewers (PDF.js included) try to gracefully deal with those could probably be described as a best-effort solution anyway.
This patch also adjusts the existing `warn`/`info` messages to print the command name according to the PDF specification, rather than an internal PDF.js enumeration value. The former should be much more useful for debugging purposes.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1443140.
This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway.
Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.
Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.
With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches.
Hence this is code is adjusted to actually do what the original commit message says, and nothing more.
Please note that the standalone `pdf.image_decoders.js` file will be including the complete `src/shared/util.js` file, despite only using parts of it.[1] This was done *purposely*, to not negatively impact the readability/maintainability of the core PDF.js code.
Furthermore, to ensure that the compatibility is the same in the regular PDF.js library *and* in the the standalone image decoders, `src/shared/compatibility.js` was included as well.
To (hopefully) prevent future complaints about the size of the built `pdf.image_decoders.js` file, a few existing async-related polyfills are being skipped (since all of the image decoders are completely synchronous).
Obviously this required adding a couple of pre-processor statements, but given that these are all limited to "compatibility" code, I think this might be OK!?
---
[1] However, please note that previous commits moved `PageViewport` and `MessageHandler` out of `src/shared/util.js` which reduced its size.
The purpose of this patch is to hopefully provide *slightly* better user ergonomics, if/when the PDF.js image decoders are used standalone.
This implementation is (basically) reverting the changes in PR 9386, in conjunction with code from the `parse` method found at https://github.com/notmasteryet/jpgjs/blob/master/src/pdfjs.js
Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering.
*Please note:* Including the 14 standard fonts in PDF.js probably wouldn't be *that* difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.)
However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)
Since `ColorSpace` now depends on the native clamping of `Uint8ClampedArray`, this patch adds non-production/test-only `assert`s to enforce that the expected TypedArray is used for the output.
These `assert`s are purposely *not* included in PRODUCTION builds since that would break rendering completely, as opposed to "only" displaying some weird colours, when a `Uint8Array` was used. Furthermore, these are mostly added to help catch explicit developer errors when working with the `ColorSpace` and `PDFImage` code.
Since the tests (currently) run with the `pdf.worker.js` file built, i.e. with `PRODUCTION = true` set, there's no simple way to add e.g. `assert` calls for both non-production *and* test-only builds without also affecting PRODUCTION builds.
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.
As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.
*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
The built-in image decoders are already returning data as `Uint8ClampedArray`, and subsequently the JPEG/JBIG2/JPX streams are as well. However, for general streams we obviously don't want to force the use of `Uint8ClampedArray` unless an "Image" is actually being decoded.
Hence this patch, which adds a parameter that allows the caller of the `getBytes`/`peekBytes` methods to force a `Uint8ClampedArray` (rather than a `Uint8Array`) to be returned.
Not only is the `Util.loadScript` helper function unused on the Worker side, even trying to use it there would throw an Error (since `document` isn't defined/available in Workers).
Hence this helper function is moved, and its code modernized slightly by having it return a Promise rather than needing a callback function.
Finally, to reduced code duplication, the "new" loadScript function is exported and used in the viewer.