Over the years there's been a fair number of issues/PRs opened, where people have wanted to add `hasOwnProperty` checks in (hot) loops in the font parsing code. This has always been rejected, since we don't want to risk reducing performance in the Firefox PDF viewer simply because some users of the general PDF.js library are *incorrectly* extending the `Array.prototype` with enumerable properties.
With this patch the general PDF.js library will now fail immediately with a hopefully useful Error message, rather than having (some) fonts fail to render, when the `Array.prototype` is incorrectly extended.
Note that I did consider making this a warning, but ultimately decided against it since it's first of all possible to disable those (with the `verbosity` parameter). Secondly, even when printed, warnings can be easy to overlook and finally a warning may also *seem* OK to ignore (as opposed to an actual Error).
This patch makes the following changes, to improve these API methods:
- Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.
- Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.
- Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
Add a note in the JSDoc comment about not calling this method when rendering is ongoing.
- Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
All of the caches that's being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.
- Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change.
However, it does seem like a good idea to at least *validate* the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.
This should hopefully be useful in environments where restrictive CSPs are in effect.
In most cases the replacement is entirely straighforward, and there's only a couple of special cases:
- For the `src/display/font_loader.js` and `web/pdf_outline_viewer.js `cases, since the elements aren't appended to the document yet, it shouldn't matter if the style properties are set one-by-one rather than all at once.
- For the `web/debugger.js` case, there's really no need to set the `padding` inline at all and the definition was simply moved to `web/viewer.css` instead.
*Please note:* There's still *a single* case left, in `web/toolbar.js` for setting the width of the zoom dropdown, which is left intact for now.
The reasons are that this particular case shouldn't matter for users of the general PDF.js library, and that it'd make a lot more sense to just try and re-factor that very old code anyway (thus fixing the `setAttribute` usage in the process).
Based on the PDF spec, with `v` operator, current point should be used as the first control point of the curve.
Do not overwrite current point before an SVG curve is built, so it can b actually used as first control point.
As can be seen in the code, the `xScaleBlockOffset` typed array doesn't depend on the actual image data but only on the width and x-scale. The width is obviously consistent for an image, and it turns out that in practice the `componentScaleX` is quite often identical between two (or more) adjacent image components.
All-in-all it's thus not necessary to *unconditionally* re-compute the `xScaleBlockOffset` when getting the JPEG image data.
While avoiding, in many cases, one or more loops can never be a bad thing these changes are unfortunately completely dominated by the rest of the JpegImage code and consequently doesn't really show up in benchmark results. *Hence I'd understand if this patch is ultimately deemed not necessary.*
*Note:* This is inspired by PR 5473, which made similar changes for another kind of JPEG data.
Since the implementation in `src/core/jpg.js` only supports 8-bit data, as opposed to similar code in `src/core/colorspace.js`, the computations can be further simplified since the `scale` is always constant.
By updating the coefficients, effectively inlining the `scale`, we'll thus avoid *four* multiplications for each loop iteration.
Unfortunately I wasn't able, based on a quick look through the test-files, to find a sufficiently *large* CMYK JPEG image in order for these changes to really show up in benchmark results. However, when testing the `cmykjpeg.pdf` manually there's a total of `120 000` fewer multiplication with this patch.
In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored).
Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.
Interestingly the viewer already seem to work correctly as-is, with workers disabled and a non-standard `verbosity` level.
Hence this is possibly Node.js specific, but given that the issue is lacking *both* the PDF file in question and a runnable test-case, so this patch is essentially a best-effort guess at what the problem could be.
- Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument).
- Tweak a condition, to make it easier to see that the end of the data has been reached.
- Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.
The other image decoders, i.e. the JBIG2 and JPEG 2000 ones, are using the common helper function `readUint16`. Most likely, the only reason that the JPEG decoder is doing it this way is because it originated *outside* of the PDF.js library.
Hence we can simply re-factor `src/core/jpg.js` to use the common `readUint16` helper function, which is especially nice given that the functionality was essentially *duplicated* in the code.
This covers cases that the `--fix` command couldn't deal with, and in a few cases (notably `src/core/jbig2.js`) the code was changed to use block-scoped variables instead.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const
With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.
Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
These two cases should have been whitelisted prior to re-formatting respectively had the comments fixed afterwards, however I unfortunately missed them because of the massive size of the diff.
Fixes#11403
The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this.
In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`.
The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.nodef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.
This removes a couple of, thanks to preceeding code, unnecessary `typeof PDFJSDev` checks, and also fixes a couple of incorrectly implemented (my fault) checks intended for `TESTING` builds.
After PR 9566, which removed all of the old Firefox extension code, the `FIREFOX` build flag is no longer used for anything.
It thus seems to me that it should be removed, for a couple of reasons:
- It's simply dead code now, which only serves to add confusion when looking at the `PDFJSDev` calls.
- It used to be that `MOZCENTRAL` and `FIREFOX` was *almost* always used together. However, ever since PR 9566 there's obviously been no effort put into keeping the `FIREFOX` build flags up to date.
- In the event that a new, Webextension based, Firefox addon is created in the future you'd still need to audit all `MOZCENTRAL` (and possibly `CHROME`) build flags to see what'd make sense for the addon.
This particular JSDoc comment is fairly old and it also contains some now unrelated/confusing information.
The only way to *guarantee* that the PDF.js library works as expected is to correctly set the global `workerSrc`[1], hence giving the impression that the option isn't strictly necessary is thus incorrect.
---
[1] Since advertising the fallbackWorkerSrc functionality definitely seems like the *wrong* thing to do.
This way we'll benefit from the existing font caching, and can thus avoid re-creating a fallback font over and over again during parsing.
(Thece changes necessitated the previous patch, since otherwise breakage could occur e.g. with fake workers.)
This is beneficial in situations where the Worker is being re-used, for example with fake workers, since it ensures that things like font resources are actually released.
This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210
With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed.
This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories:
- Cases where a value is only clamped to a certain range (the easiest ones to update).
- Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the *default* value and only update it when necessary by using `if`/`else if` statements.
- Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary