Commit Graph

3888 Commits

Author SHA1 Message Date
Jani Pehkonen
71e7686950 Fix Type1 font parsing when .notdef is not at index zero
Fixes #11477
The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected.

Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.
2020-03-03 21:55:51 +02:00
Jonas Jenwald
65e514e063 Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)
The PDF document in question is *corrupt*, since it contains multiple instances of incorrect operators.
We obviously don't want to slow down parsing of *all* documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.
2020-03-03 10:05:18 +01:00
Jonas Jenwald
1ad65cf405 Simplify the BaseFontLoader.isFontLoadingAPISupported getter
It's no longer necessary to special-case this getter in the `GenericFontLoader` case, since the GENERIC build hasn't been using `mozPrintCallback` for years now (furthermore Firefox 63 is really old as well).
2020-03-02 23:14:48 +01:00
Takashi Tamura
d6b67cd28a Fix the horizontal scaling of texts with SVG backend. #10988 2020-03-02 14:54:41 +09:00
Takashi Tamura
d8c9f119b0 Fix the vertical writing mode with horizontal scaling. #11555.
It is not valid to multiply textHScale when the writing mode is vertical.

See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-29 07:48:29 +09:00
Tim van der Meij
e1586016c5
Merge pull request #11577 from Snuffleupagus/Pages-tree-refs
Prevent circular references in the /Pages tree
2020-02-27 23:36:11 +01:00
Tim van der Meij
965ebe63fd
Merge pull request #11540 from tamuratak/charspacing
Fix text spacing with vertical fonts. #7687 and #11526.
2020-02-26 22:26:27 +01:00
Jonas Jenwald
c55d30a715 Use the same non-embedded Wingdings fallback for fonts named "Wingdings-Regular" too (PR 5463 follow-up, issue 11451)
This patch extends the existing heuristics, which are really the best that we can do in general for these kinds of non-embedded *and* non-standard fonts.

Furthermore, this patch also tries to improve the copy-and-paste behaviour for non-embedded Wingdings fonts by also using the `ZapfDingbatsEncoding` in this case.

*Note:* I'm not sure that adding additional tests for Wingdings fonts matters that much, given how limited our "support" for them really is.
2020-02-24 17:40:06 +01:00
Jonas Jenwald
bf09d79eea Use the ESLint no-restricted-syntax rule to prevent direct usage of new Cmd()/new Name()/new Ref()
Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects *a lot* during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base.
Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-02-22 21:15:00 +01:00
Jonas Jenwald
c3c3b8cd81 Add a heuristic, in src/core/jpg.js, to handle JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10880)
*This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else.*

To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data *and* the "actual" `scanLines` value is at least one order of magnitude smaller than expected.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
5494f7d5bc Add basic validation of the scanLines parameter in JPEG images, before delegating decoding to the browser
In some cases PDF documents can contain JPEG images that the native browser decoder cannot handle, e.g. images with DNL (Define Number of Lines) markers or images where the SOF (Start of Frame) marker contains a wildly incorrect `scanLines` parameter.
Currently, for "simple" JPEG images, we're relying on native image decoding to *fail* before falling back to the implementation in `src/core/jpg.js`. In some cases, note e.g. issue 10880, the native image decoder doesn't outright fail and thus some images may not render.

In an attempt to improve the current situation, this patch adds additional validation of the JPEG image SOF data to force the use of `src/core/jpg.js` directly in cases where the native JPEG decoder cannot be trusted to do the right thing.
The only way to implement this is unfortunately to parse the *beginning* of the JPEG image data, looking for a SOF marker. To limit the impact of this extra parsing, the result is cached on the `JpegStream` instance and this code is only run for images which passed all of the pre-existing "can the JPEG image be natively rendered and/or decoded" checks.

---

*Slightly off-topic:* Working on this *really* makes me start questioning if native rendering/decoding of JPEG images is actually a good idea.
There's certain kinds of JPEG images not supported natively, and all of the validation which is now necessary isn't "free". At this point, in the `NativeImageDecoder`, we're having to check for certain properties in the image dictionary, parse the `ColorSpace`, and finally read the actual image data to find the SOF marker.
Furthermore, we cannot just send the image to the main-thread and be done in the "JpegStream" case, but we also need to wait for rendering to complete (or fail) before continuing with other parsing.
In the "JpegDecode" case we're even having to parse part of the image on the main-thread, which seems completely at odds with the principle of doing all heavy parsing in the Worker, and there's also a couple of potentially large (temporary) allocations/copies of TypedArray data involved as well.
2020-02-22 14:16:07 +01:00
Jonas Jenwald
6b44ae2170 Remove the unused thisArg from RefSetCache.forEach
Given that this is completely unused, and that a "normal" function call may be a *tiny* bit more efficient, there's no good reason as far as I can tell to keep it.
2020-02-21 14:23:05 +01:00
Jonas Jenwald
3c7b7be100 Prevent circular references in the /Pages tree 2020-02-19 01:49:39 +01:00
Tim van der Meij
4092aa9fbd
Merge pull request #11604 from Snuffleupagus/eslint-prefer-starts-ends-with
Enable the `unicorn/prefer-starts-ends-with` ESLint plugin rule
2020-02-16 13:17:27 +01:00
Jonas Jenwald
bc31a4be5d Enable the unicorn/prefer-starts-ends-with ESLint plugin rule
This complements the existing `mozilla/use-includes-instead-of-indexOf` plugin rule, by also disallowing unnecessary regular expressions when comparing strings.

Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/master/docs/rules/prefer-starts-ends-with.md for additional information.
2020-02-16 12:41:53 +01:00
Jonas Jenwald
6ebd851d27 Enable the no-buffer-constructor ESLint rule
According to https://nodejs.org/api/buffer.html#buffer_class_buffer: `new Buffer(...)` is deprecated in up-to-date versions of Node.js, hence you want to prevent it from being accidentally used.

Please see https://eslint.org/docs/rules/no-buffer-constructor for additional information.
2020-02-16 12:21:40 +01:00
Tim van der Meij
f6ffc2bf37
Merge pull request #11598 from Snuffleupagus/polyfill-Map-Set-iteration
Add polyfills to support iteration of `Map` and `Set`
2020-02-14 23:24:20 +01:00
Jonas Jenwald
c97c778f8f [api-minor] Produce non-translated/non-polyfilled builds by default 2020-02-14 18:12:07 +01:00
Jonas Jenwald
4a76ab352c Add polyfills to support iteration of Map and Set
Without this, things such as e.g. `Metadata.getAll` is broken in IE11 (see PR 11596).

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map#Browser_compatibility

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set#Browser_compatibility
2020-02-14 15:53:02 +01:00
Tim van der Meij
cd3f2d49e6
Merge pull request #11596 from Snuffleupagus/metadata-map
Re-factor how `Metadata` class instances store its data internally
2020-02-13 23:01:51 +01:00
Jonas Jenwald
5cdfff4a47 Re-factor how Metadata class instances store its data internally
Please note that these changes do *not* affect the *public* interface of the `Metadata` class, but only touches internal structures.[1]

These changes were prompted by looking at the `getAll` method, which simply returns the "private" metadata object to the consumer. This seems wrong conceptually, since it allows way too easy/accidental changes to the internal parsed metadata.
As part of fixing this, the internal metadata was changed to use a `Map` rather than a plain Object.

---
[1] Basically, we shouldn't need to worry about someone depending on internal implementation details.
2020-02-13 18:23:15 +01:00
Jonas Jenwald
3f1568b51a A couple of small improvements of the Metadata._repair method
- Remove the "capturing group" in the regular expression that removes leading "junk" from the raw metadata, since it's not necessary here (it's simply a case of too much copy-pasting in a prior patch).
   According to [MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet#Groups_and_ranges) you want to, for performance reasons, avoid "capturing groups" unless actually needed.

 - Add inline comments to document a bunch of magic values in the code.
2020-02-13 17:20:52 +01:00
Jonas Jenwald
a5db4e985a Remove LoopbackPort.postMessage special-case for polyfilled TypedArrays
Given that all `TypedArray` polyfills were removed in PDF.js version `2.0`, since native support is now required, this branch has been dead code for awhile.
2020-02-13 12:50:41 +01:00
Jonas Jenwald
7b0836ca75 [TextLayer] Immediately set the padding, rather than checking if it's empty, in expandTextDivs
In practice it's extremely rare[1] for the padding to be zero in *all* components, hence it seems better to just set it directly rather than creating a temporary variable and checking for the "no padding"-case.

---
[1] In the `tracemonkey.pdf` file that only happens with `0.08%` of all text elements.
2020-02-11 15:52:36 +01:00
Takashi Tamura
512dbe3060 Fix text spacing with vertical fonts. #7687 and #11526.
When the writing mode is vertical, we have to reverse
the sign of spacing since we are subtracting it from
current.y. We have to add it to current.y.
See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762
2020-02-11 08:49:23 +09:00
Jonas Jenwald
ae5a34c520 [api-minor] Ensure that the Array.prototype doesn't contain any enumerable properties
Over the years there's been a fair number of issues/PRs opened, where people have wanted to add `hasOwnProperty` checks in (hot) loops in the font parsing code. This has always been rejected, since we don't want to risk reducing performance in the Firefox PDF viewer simply because some users of the general PDF.js library are *incorrectly* extending the `Array.prototype` with enumerable properties.

With this patch the general PDF.js library will now fail immediately with a hopefully useful Error message, rather than having (some) fonts fail to render, when the `Array.prototype` is incorrectly extended.

Note that I did consider making this a warning, but ultimately decided against it since it's first of all possible to disable those (with the `verbosity` parameter). Secondly, even when printed, warnings can be easy to overlook and finally a warning may also *seem* OK to ignore (as opposed to an actual Error).
2020-02-10 14:17:27 +01:00
Tim van der Meij
dced0a3821
Merge pull request #11579 from Snuffleupagus/issue-11578
Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)
2020-02-09 17:33:09 +01:00
Tim van der Meij
61056a9238
Merge pull request #11551 from Snuffleupagus/issue-11549
Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
2020-02-09 17:32:35 +01:00
Tim van der Meij
2fb4076e05 Merge pull request #11568 from Snuffleupagus/PDF-header-validation
Ensure that the PDF header contains an actual number (PR 11463 follow-up)
2020-02-09 17:16:25 +01:00
Tim van der Meij
102af0f915 Merge pull request #11547 from Snuffleupagus/convertCmykToRgb-scale
Use fewer multiplications in `JpegImage._convertCmykToRgb`
2020-02-09 17:06:23 +01:00
Tim van der Meij
f178805412 Merge pull request #11557 from Snuffleupagus/_getLinearizedBlockData-xScaleBlockOffset
Avoid re-calculating the `xScaleBlockOffset` when not necessary in `JpegImage._getLinearizedBlockData`
2020-02-09 16:54:28 +01:00
Tim van der Meij
7948faf675 Merge pull request #11573 from Snuffleupagus/api-cleanup-returns
[api-minor] Change `PDFDocumentProxy.cleanup`/`PDFPageProxy.cleanup` to return data
2020-02-08 20:42:28 +01:00
Tim van der Meij
a73a38029c Merge pull request #11569 from Snuffleupagus/rm-most-setAttribute
Replace most remaining `Element.setAttribute("style", ...)` usage with `Element.style = ...` instead
2020-02-08 20:13:56 +01:00
Jonas Jenwald
7937165537 Ignore spaces when normalizing the font name in Font.fallbackToSystemFont (issue 11578) 2020-02-08 19:59:04 +01:00
Jonas Jenwald
7117ee03d6 [api-minor] Change PDFDocumentProxy.cleanup/PDFPageProxy.cleanup to return data
This patch makes the following changes, to improve these API methods:

 - Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up.
   Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't *guaranteed* to happen.

 - Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished.

 - Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on *both* threads (without going into unnecessary specifics regarding what *exactly* said data actually is).
   Add a note in the JSDoc comment about not calling this method when rendering is ongoing.

 - Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking.
   Please note that this won't stop *worker-thread* clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-)
   All of the caches that's being cleared in `Catalog.cleanup`, on the worker-thread, *should* be re-filled automatically even if cleared *during* parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed.
  On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it.

 - Last, but not least, add a couple of basic unit-tests for the clean-up functionality.
2020-02-07 17:00:29 +01:00
Jonas Jenwald
88c35d872f Ensure that the PDF header contains an actual number (PR 11463 follow-up)
While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change.
However, it does seem like a good idea to at least *validate* the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.
2020-02-07 12:25:07 +01:00
Tim van der Meij
e12e83702d
Merge pull request #11559 from bhasto/curveto2-fix
Fix how curveTo2 (v operator) is translated to SVG
2020-02-06 23:10:41 +01:00
Brendan Dahl
09a6e17d22
Merge pull request #11528 from janpe2/type1-nonemb-notdef
Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
2020-02-06 13:30:07 -08:00
Jonas Jenwald
5cbd44b628 Replace most remaining Element.setAttribute("style", ...) usage with Element.style = ... instead
This should hopefully be useful in environments where restrictive CSPs are in effect.
In most cases the replacement is entirely straighforward, and there's only a couple of special cases:
 - For the `src/display/font_loader.js` and `web/pdf_outline_viewer.js `cases, since the elements aren't appended to the document yet, it shouldn't matter if the style properties are set one-by-one rather than all at once.
 - For the `web/debugger.js` case, there's really no need to set the `padding` inline at all and the definition was simply moved to `web/viewer.css` instead.

*Please note:* There's still *a single* case left, in `web/toolbar.js` for setting the width of the zoom dropdown, which is left intact for now.
The reasons are that this particular case shouldn't matter for users of the general PDF.js library, and that it'd make a lot more sense to just try and re-factor that very old code anyway (thus fixing the `setAttribute` usage in the process).
2020-02-05 22:26:47 +01:00
Branislav Hašto
393aed9978 Fix how curveTo2 (v operator) is translated to SVG
Based on the PDF spec, with `v` operator, current point should be used as the first control point of the curve.

Do not overwrite current point before an SVG curve is built, so it can b actually used as first control point.
2020-02-02 17:03:29 +01:00
Jonas Jenwald
a4440a1c6b Avoid re-calculating the xScaleBlockOffset when not necessary in JpegImage._getLinearizedBlockData
As can be seen in the code, the `xScaleBlockOffset` typed array doesn't depend on the actual image data but only on the width and x-scale. The width is obviously consistent for an image, and it turns out that in practice the `componentScaleX` is quite often identical between two (or more) adjacent image components.
All-in-all it's thus not necessary to *unconditionally* re-compute the `xScaleBlockOffset` when getting the JPEG image data.

While avoiding, in many cases, one or more loops can never be a bad thing these changes are unfortunately completely dominated by the rest of the JpegImage code and consequently doesn't really show up in benchmark results. *Hence I'd understand if this patch is ultimately deemed not necessary.*
2020-02-01 11:58:50 +01:00
Jonas Jenwald
4c54395ff6 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)
This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.
2020-01-30 13:19:05 +01:00
Jonas Jenwald
ce4f41d06a Use fewer multiplications in JpegImage._convertCmykToRgb
*Note:* This is inspired by PR 5473, which made similar changes for another kind of JPEG data.

Since the implementation in `src/core/jpg.js` only supports 8-bit data, as opposed to similar code in `src/core/colorspace.js`, the computations can be further simplified since the `scale` is always constant.
By updating the coefficients, effectively inlining the `scale`, we'll thus avoid *four* multiplications for each loop iteration.

Unfortunately I wasn't able, based on a quick look through the test-files, to find a sufficiently *large* CMYK JPEG image in order for these changes to really show up in benchmark results. However, when testing the `cmykjpeg.pdf` manually there's a total of `120 000` fewer multiplication with this patch.
2020-01-29 18:34:58 +01:00
Takashi Tamura
0b701e7950 Fix the indices of arguments for RadialAxial. It is related to #10646. 2020-01-29 19:18:50 +09:00
Tim van der Meij
7ae504222f
Merge pull request #11544 from Snuffleupagus/decodeHuffman
Make the `decodeHuffman` function, in `src/core/jpg.js`, slightly more efficient
2020-01-28 22:54:46 +01:00
Tim van der Meij
e9dc179673
Merge pull request #11537 from Snuffleupagus/setupFakeWorker-configure
Send the `verbosity` level when setting up fake workers (issue 11536)
2020-01-28 22:50:30 +01:00
Jonas Jenwald
f5a617a334 Make the decodeHuffman function, in src/core/jpg.js, slightly more efficient
Rather than repeating the `typeof node` check twice, we can use a `switch` statement instead.

This patch was tested using the PDF file from issue 3809, i.e. https://web.archive.org/web/20140801150504/http://vs.twonky.dk/invitation.pdf, with the following manifest file:
```
[
    {  "id": "issue3809",
       "file": "../web/pdfs/issue3809.pdf",
       "md5": "",
       "rounds": 50,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |    50 |        12537 |       12451 | -86 | -0.69 |        faster
Firefox | Page Request |    50 |            5 |           5 |   0 |  0.77 |
Firefox | Rendering    |    50 |        12532 |       12446 | -86 | -0.69 |        faster
```
2020-01-28 14:23:58 +01:00
Tim van der Meij
474fe1757e
Merge pull request #11508 from Snuffleupagus/jpg-default-marker
Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js`
2020-01-26 21:32:13 +01:00
Jonas Jenwald
62b2b984cc Render Popup annotations last, once all other annotations have been rendered (issue 11362)
In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored).
Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.
2020-01-26 15:49:55 +01:00
Jonas Jenwald
427df2dfd7 Send the verbosity level when setting up fake workers (issue 11536)
Interestingly the viewer already seem to work correctly as-is, with workers disabled and a non-standard `verbosity` level.
Hence this is possibly Node.js specific, but given that the issue is lacking *both* the PDF file in question and a runnable test-case, so this patch is essentially a best-effort guess at what the problem could be.
2020-01-26 12:37:45 +01:00
Jonas Jenwald
13930e5202 Simplify the handling of unsupported/incorrect markers in src/core/jpg.js
- Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument).

 - Tweak a condition, to make it easier to see that the end of the data has been reached.

 - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.
2020-01-25 22:52:24 +01:00
Tim van der Meij
3775b711ed
Merge pull request #11482 from Snuffleupagus/more-core-utils
Convert `src/core/jpg.js` to use the `readUint16` helper function in `src/core/core_utils.js`, rather than re-implementing it twice
2020-01-25 21:38:34 +01:00
Tim van der Meij
cbbda9d883
Merge pull request #11515 from Snuffleupagus/cache-fallback-font
Cache the fallback font dictionary on the `PartialEvaluator` (PR 11218 follow-up)
2020-01-25 21:32:28 +01:00
Jonas Jenwald
188b320e18 Convert src/core/jpg.js to use the readUint16 helper function in src/core/core_utils.js, rather than re-implementing it twice
The other image decoders, i.e. the JBIG2 and JPEG 2000 ones, are using the common helper function `readUint16`. Most likely, the only reason that the JPEG decoder is doing it this way is because it originated *outside* of the PDF.js library.
Hence we can simply re-factor `src/core/jpg.js` to use the common `readUint16` helper function, which is especially nice given that the functionality was essentially *duplicated* in the code.
2020-01-25 00:35:10 +01:00
Jonas Jenwald
3f031f69c2 Move additional worker-thread only functions from src/shared/util.js and into a src/core/core_utils.js instead
This moves the `log2`, `readInt8`, `readUint16`, `readUint32`, and `isSpace` functions since they are only used in the worker-thread.
2020-01-25 00:33:52 +01:00
Jonas Jenwald
83bdb525a4 Fix remaining linting errors, from enabling the prefer-const ESLint rule globally
This covers cases that the `--fix` command couldn't deal with, and in a few cases (notably `src/core/jbig2.js`) the code was changed to use block-scoped variables instead.
2020-01-25 00:20:23 +01:00
Jonas Jenwald
9e262ae7fa Enable the ESLint prefer-const rule globally (PR 11450 follow-up)
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const

With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.

Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
2020-01-25 00:20:22 +01:00
Tim van der Meij
d2d9441373
Merge pull request #11489 from Snuffleupagus/rm-FIREFOX-define
Remove the `FIREFOX` build flag, since it's completely unused and simplify a couple of `PDFJSDev` checks
2020-01-24 23:59:13 +01:00
Tim van der Meij
668a29aa45
Merge pull request #11497 from Snuffleupagus/Promise-allSettled
Add support for `Promise.allSettled`
2020-01-22 23:06:54 +01:00
Tim van der Meij
a88dec197f
Merge pull request #11511 from Snuffleupagus/eslint-no-nested-ternary
Enable the `no-nested-ternary` ESLint rule (PR 11488 follow-up)
2020-01-22 22:52:59 +01:00
Jonas Jenwald
3b78f4e8f8 Fix a couple of cases where Prettier broke existing formatting (PR 11446 follow-up)
These two cases should have been whitelisted prior to re-formatting respectively had the comments fixed afterwards, however I unfortunately missed them because of the massive size of the diff.
2020-01-22 09:12:12 +01:00
Jani Pehkonen
809b96b40c Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths
Fixes #11403
The PDF uses the non-embedded Type1 font Helvetica. Character codes 194 and 160 (`Â` and `NBSP`) are encoded as `.notdef`. We shouldn't show those glyphs because it seems that Acrobat Reader doesn't draw glyphs that are named `.notdef` in fonts like this.

In addition to testing `glyphName === ".notdef"`, we must test also `glyphName === ""` because the name `""` is used in `core/encodings.js` for undefined glyphs in encodings like `WinAnsiEncoding`.

The solution above hides the `Â` characters but now the replacement character (space) appears to be too wide. I found out that PDF.js ignores font's `Widths` array if the font has no `FontDescriptor` entry. That happens in #11403, so the default widths of Helvetica were used as specified in `core/metrics.js` and `.nodef` got a width of 333. The correct width is 0 as specified by the `Widths` array in the PDF. Thus we must never ignore `Widths`.
2020-01-21 21:35:25 +02:00
Jonas Jenwald
a39943554a Simplify, and tweak, a couple of PDFJSDev checks
This removes a couple of, thanks to preceeding code, unnecessary `typeof PDFJSDev` checks, and also fixes a couple of incorrectly implemented (my fault) checks intended for `TESTING` builds.
2020-01-21 00:06:15 +01:00
Jonas Jenwald
7322a24ce4 Remove the FIREFOX build flag, since it's completely unused
After PR 9566, which removed all of the old Firefox extension code, the `FIREFOX` build flag is no longer used for anything.
It thus seems to me that it should be removed, for a couple of reasons:
 - It's simply dead code now, which only serves to add confusion when looking at the `PDFJSDev` calls.
 - It used to be that `MOZCENTRAL` and `FIREFOX` was *almost* always used together. However, ever since PR 9566 there's obviously been no effort put into keeping the `FIREFOX` build flags up to date.
 - In the event that a new, Webextension based, Firefox addon is created in the future you'd still need to audit all `MOZCENTRAL` (and possibly `CHROME`) build flags to see what'd make sense for the addon.
2020-01-21 00:06:15 +01:00
Tim van der Meij
ccf327538b
Merge pull request #11519 from tamuratak/enable_eslint_import_extensions
Enable import/extensions of ESlint plugin to enforce all `import` have a `.js` file extension.
2020-01-19 17:37:19 +01:00
Jonas Jenwald
ee87e898db Update the GlobalWorkerOptions.workerSrc JSDoc comment
This particular JSDoc comment is fairly old and it also contains some now unrelated/confusing information.
The only way to *guarantee* that the PDF.js library works as expected is to correctly set the global `workerSrc`[1], hence giving the impression that the option isn't strictly necessary is thus incorrect.

---
[1] Since advertising the fallbackWorkerSrc functionality definitely seems like the *wrong* thing to do.
2020-01-19 12:44:42 +01:00
Takashi Tamura
00ce7898a2 Enable import/extensions of ESlint plugin to enforce all import have a .js file extension.
Related to #11465.

- https://github.com/benmosher/eslint-plugin-import/blob/master/docs/rules/extensions.md
2020-01-18 10:53:01 +09:00
Jonas Jenwald
9ab7c280aa Cache the fallback font dictionary on the PartialEvaluator (PR 11218 follow-up)
This way we'll benefit from the existing font caching, and can thus avoid re-creating a fallback font over and over again during parsing.
(Thece changes necessitated the previous patch, since otherwise breakage could occur e.g. with fake workers.)
2020-01-16 15:12:05 +01:00
Jonas Jenwald
090ff116d4 Ensure that full clean-up is always run when handling the "Terminate" message in src/core/worker.js
This is beneficial in situations where the Worker is being re-used, for example with fake workers, since it ensures that things like font resources are actually released.
2020-01-16 15:11:56 +01:00
Jonas Jenwald
c591826f3b Enable the no-nested-ternary ESLint rule (PR 11488 follow-up)
This rule is already enabled in mozilla-central, and helps avoid some confusing formatting, see https://searchfox.org/mozilla-central/rev/9e45d74b956be046e5021a746b0c8912f1c27318/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#209-210

With the recent introduction of Prettier some of the existing nested ternary statements became even more difficult to read, since any possibly helpful indentation was removed.
This particular ESLint rule wasn't entirely straightforward to enable, and I do recognize that there's a certain amount of subjectivity in the changes being made. Generally, the changes in this patch fall into three categories:
 - Cases where a value is only clamped to a certain range (the easiest ones to update).
 - Cases where the values involved are "simple", such as Numbers and Strings, which are re-factored to initialize the variable with the *default* value and only update it when necessary by using `if`/`else if` statements.
 - Cases with more complex and/or larger values, such as TypedArrays, which are re-factored to let the variable be (implicitly) undefined and where all values are then set through `if`/`else if`/`else` statements.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-nested-ternary
2020-01-14 17:49:39 +01:00
Jonas Jenwald
78917bab91 Update src/display/{annotation_layer.js, svg.js} to determine the fontWeight in the same way as with canvas (PR 6091 and 7839 follow-up) 2020-01-14 15:29:59 +01:00
Jonas Jenwald
6590cc32f2 Extract the subroutine bias computation into a helper function in src/core/font_renderer.js 2020-01-14 15:29:53 +01:00
Jonas Jenwald
2942233c9c Add support for Promise.allSettled
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled
2020-01-10 14:35:12 +01:00
Tim van der Meij
93aa613db7
Merge pull request #11465 from Snuffleupagus/import-file-extension
Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension
2020-01-06 23:24:43 +01:00
Jonas Jenwald
94f084958a Update the year in the license_header files 2020-01-05 12:14:03 +01:00
Jonas Jenwald
36881e3770 Ensure that all import and require statements, in the entire code-base, have a .js file extension
In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.
2020-01-04 13:01:43 +01:00
Jonas Jenwald
f8ab8c4d3a Move the SegoeUISymbol font to the getNonStdFontMap (PR 8698 follow-up)
For reasons that I now cannot even begin to understand, the non-standard SegoeUISymbol font was placed in the `getStdFontMap`. That honestly makes no sense, hence this patch which does what I *should* have done from the start.
2019-12-28 11:02:49 +01:00
Jonas Jenwald
a63f7ad486 Fix the linting errors, from the Prettier auto-formatting, that ESLint --fix couldn't handle
This patch makes the follow changes:
 - Remove no longer necessary inline `// eslint-disable-...` comments.
 - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors.
 - Concatenate strings which now fit on just one line.
 - Fix comments that are now too long.
 - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions often is confusing or outright wrong.
2019-12-26 12:35:12 +01:00
Jonas Jenwald
de36b2aaba Enable auto-formatting of the entire code-base using Prettier (issue 11444)
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever changes).

Prettier is being used for a couple of reasons:

 - To be consistent with `mozilla-central`, where Prettier is already in use across the tree.

 - To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting disussions once and for all, removing the need for review comments on most stylistic matters.

Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.

*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.

(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
2019-12-26 12:34:24 +01:00
Jonas Jenwald
8ec1dfde49 Add // prettier-ignore comments to prevent re-formatting of certain data structures
There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments.

*Please note:* It may be a good idea to look through these cases individually, and possibly re-write some of the them (especially the `String` ones) to reduce the need for all of these ignore commands.
2019-12-26 00:14:03 +01:00
Jonas Jenwald
70e3345cb4 Support OpenAction dictionaries without Type entries when parsing Print actions (issue 11442)
The PDF generator didn't bother including the `Type` entry in the OpenAction dictionary, hence we skipped parsing the `Print` action.
2019-12-24 10:41:33 +01:00
Wojciech Maj
d40d33682b
Extract & use createHeaders helper in src/display/fetch_stream.js 2019-12-23 08:08:17 +01:00
Jonas Jenwald
d370037618 [api-minor] Tweak the Node.js fake worker loader to prevent Critical dependency: ... warnings from Webpack
Since bundlers, such as Webpack, cannot be told to leave `require` statements alone we are thus forced to jump through hoops in order to prevent these warnings in third-party deployments of the PDF.js library; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).

*Please note:* This is based on the assumption that code running in Node.js won't ever be affected by e.g. Content Security Policies that prevent use of `eval`. If that ever occurs, we should revert to a normal `require` statement and simply document the Webpack warnings instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
8519f87efb Re-factor the setupFakeWorkerGlobal function (in src/display/api.js), and the loadFakeWorker function (in web/app.js)
This patch reduces some duplication, by moving *all* fake worker loader code into the `setupFakeWorkerGlobal` function. Furthermore, the functions are simplified further by using `async`/`await` where appropriate.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
a5485e1ef7 [api-minor] Support loading the fake worker from GlobalWorkerOptions.workerSrc in Node.js
There's no particularily good reason, as far as I can tell, to not support a custom worker path in Node.js environments (even if workers aren't supported). This patch thus make the Node.js fake worker loader code-path consistent with the fallback code-path used with *browser* fake worker loader.

Finally, this patch also deprecates[1] the `fallbackWorkerSrc` functionality, except in Node.js, since the user should *always* provide correct worker options since the fallback is nothing more than a best-effort solution.

---
[1] Although it probably shouldn't be removed until the next major version.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
591e754831 Move the fake worker loader code into the PDFWorkerClosure
Given that this code isn't needed "globally" in the file, it seems reasonable to move it to where it's actually used instead.
2019-12-20 17:36:10 +01:00
Jonas Jenwald
aab0f91740 [api-minor] Simplify the *fallback* fake worker loader code in src/display/api.js
For performance reasons, and to avoid hanging the browser UI, the PDF.js library should *always* be used with web workers enabled.
At this point in time all of the supported browsers should have proper worker support, and Node.js is thus the only environment where workers aren't supported. Hence it no longer seems relevant/necessary to provide, by default, fake worker loaders for various JS builders/bundlers/frameworks in the PDF.js code itself.[1]

In order to simplify things, the fake worker loader code is thus simplified to now *only* support Node.js usage respectively "normal" browser usage out-of-the-box.[2]

*Please note:* The officially intended way of using the PDF.js library is with workers enabled, which can be done by setting `GlobalWorkerOptions.workerSrc`, `GlobalWorkerOptions.workerPort`, or manually providing a `PDFWorker` instance when calling `getDocument`.

---
[1] Note that it's still possible to *manually* disable workers, simply my manually loading the built `pdf.worker.js` file into the (current) global scope, however this's mostly intended for testing/debugging purposes.

[2] Unfortunately some bundlers such as Webpack, when used with third-party deployments of the PDF.js library, will start to print `Critical dependency: ...` warnings when run against the built `pdf.js` file from this patch. The reason is that despite the `require` calls being protected by *runtime* `isNodeJS` checks, it's not possible to simply tell Webpack to just ignore the `require`; please see [Webpack issue 8826](https://github.com/webpack/webpack) and libraries such as [require-fool-webpack](https://github.com/sindresorhus/require-fool-webpack).
2019-12-20 17:36:08 +01:00
Jonas Jenwald
dbb82f05fc Re-factor the find helper function, in src/core/document.js, to search through the raw bytes rather than a string
During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization.
This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function.

The main benefits here are:
 - No longer necessary to allocate *temporary* `1 kB` strings during initial parsing, thus saving some memory.
 - In practice, for well-formed PDF documents, the number of iterations required to find the commands are usually very low. (For the `tracemonkey.pdf` file, there's a *total* of only 30 loop iterations.)
2019-12-14 13:43:26 +01:00
Jonas Jenwald
e24050fa13 [api-minor] Move the ReadableStream polyfill to the global scope
Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility

By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead.

The only change here which *could* possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded).
However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it *could* just as well have been a globally registered polyfill from an application embedding the PDF.js library.
2019-12-11 19:02:37 +01:00
Jonas Jenwald
b00835f589 Attempt to improve the PDFDocument error message for empty files (issue 5887)
Given that the error in question is surfaced on the API-side, this patch makes the following changes:
 - Updates the wording such that it'll hopefully be slightly easier for users to understand.
 - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling.
 - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).
2019-12-09 15:45:50 +01:00
Tim van der Meij
a6db045789
Merge pull request #11387 from Snuffleupagus/issue-11385
Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
2019-12-08 20:27:46 +01:00
Tim van der Meij
16778118f6
Merge pull request #11391 from Snuffleupagus/globalThis
Replace `globalScope` with the standard `globalThis` property instead
2019-12-08 20:23:19 +01:00
Jonas Jenwald
71d61e4c6f Re-factor getMainThreadWorkerMessageHandler to support arbitrary global scopes, rather than only window 2019-12-08 20:19:04 +01:00
Jonas Jenwald
a8fc306b6e Replace globalScope with the standard globalThis property instead
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis and note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/globalThis#Browser_compatibility

Since ESLint doesn't support this new global yet, it was added to the `globals` list in the top-level configuration file to prevent issues.

Finally, for older browsers a polyfill was added in `ssrc/shared/compatibility.js`.
2019-12-08 20:19:02 +01:00
Jonas Jenwald
a02122e984 Ensure that PDFDocument.checkFirstPage waits for cleanup to complete (PR 10392 follow-up)
Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here.

(The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)
2019-12-07 12:31:41 +01:00
Jonas Jenwald
5c0336872e Handle corrupt ASCII85Decode inline images with truncated EOD markers (issue 11385)
In the PDF document in question, there's an ASCII85Decode inline image where the '>' part of EOD (end-of-data) marker is missing; hence the PDF document is corrupt.
2019-12-05 15:53:18 +01:00
Jonas Jenwald
c3b1c8f857 Slightly simplify the XRef cache lookup in XRef.fetch
Note that the XRef cache will only hold objects returned through `Parser.getObj`, and indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply `assert` that when inserting objects into the cache and thus get rid of one function call when doing cache lookups.

Obviously this won't have a huge effect on performance, however `XRef.fetch` is usually called *a lot* in larger documents and this patch thus cannot hurt.
2019-11-30 22:41:53 +01:00
Jonas Jenwald
168c6aecae Stop caching Streams in XRef.fetchCompressed
I'm slightly surprised that this hasn't actually caused any (known) bugs, but that may be more luck than anything else since it fortunately doesn't seem common for Streams to be defined inside of an 'ObjStm'.[1]

Note that in the `XRef.fetchUncompressed` method we're *not* caching Streams, and that for very good reasons too.

 - Streams, especially the `DecodeStream` ones, can become *very* large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so.

 - Attempting to read from the *same* Stream more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position.

 - Given that even the `src/core/` code is now fairly asynchronous, see e.g. the `PartialEvaluator`, it's generally impossible to assert that any one Stream isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Streams isn't going to work in the general case.

All in all, I cannot understand why it'd ever be correct to cache Streams in the `XRef.fetchCompressed` method.

---
[1] One example where that happens is the `issue3115r.pdf` file in the test-suite, where the streams in question are not actually used for anything within the PDF.js code.
2019-11-30 10:21:08 +01:00
Jonas Jenwald
06412a557b Slighthly re-factor XRef.fetchCompressed
- Change all occurences of `var` to `let`/`const`.

 - Initialize the (temporary) Arrays with the correct sizes upfront.

 - Inline the `isCmd` check. Obviously this won't make a huge difference, but given that the check is only relevant for corrupt documents it cannot hurt.
2019-11-30 09:49:51 +01:00
Jonas Jenwald
725566cfea Remove the Number.isInteger checks from XRef.fetchUncompressed (PR 8857 follow-up)
Having ran the entire test-suite locally with these `Number.isInteger` checks removed, there wasn't a single test failure anywhere; see also PR 8857.
Hence everything points to this being completely unnecessary now, and by removing this code there's thus fewer function calls being made in `XRef.fetchUncompressed`.
2019-11-28 23:25:39 +01:00