pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	cdc60402f6	[api-minor] Change `PageViewport` to throw when the `rotation` is not a multiple of 90 degrees As evident from the code, `PageViewport` only supports[1] `rotation` values which are a multiple of 90 degrees. Besides it being somewhat difficult to imagine meaningful use-cases for a non-multiple of 90 degrees `rotation`, the code also becomes both simpler and more efficient by not having to consider arbitrary `rotation` values. However, any invalid rotation will silently fallback to assume zero `rotation` which probably isn't great for e.g. `PDFPageProxy.getViewport` in the API. Hence this patch, which will now enforce that only valid `rotation` values are accepted. --- [1] As far as I can tell, from looking through the history, nothing else has ever been supported either.	2020-04-22 15:19:13 +02:00
Jonas Jenwald	1cc3dbb694	Enable the `dot-notation` ESLint rule Please note: These changes were done automatically, using the `gulp lint --fix` command. This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104 The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the built files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a reduction of `23 178` bytes (or ~0.7%) for a completely mechanical change. A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't think that the new formatting this patch introduces should undo any of the improvements from PR 6915. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation	2020-04-17 12:24:46 +02:00
Tim van der Meij	96923eb2a6	Merge pull request #11805 from Snuffleupagus/issue-11794 Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)	2020-04-16 00:08:58 +02:00
Tim van der Meij	a7def05aa1	Merge pull request #11810 from Snuffleupagus/fromCodePoint-followup A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up)	2020-04-16 00:08:16 +02:00
Jonas Jenwald	44b4a74f48	A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up) - Add a reduced test-case for issue 11768, to prevent future regressions. (Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.) - Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.	2020-04-15 13:45:08 +02:00
Jonas Jenwald	06f6f8719f	Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)	2020-04-14 23:27:08 +02:00
Jonas Jenwald	746eaf3154	[api-minor] Fix the return value of `PDFDocumentProxy.getViewerPreferences` when no viewer preferences are present (PR 10738 follow-up) This patch fixes yet another instalment in the never-ending series of "what the bleep was I thinking", by changing the `PDFDocumentProxy.getViewerPreferences` method to return `null` by default. Not only is this method now consistent with many other API methods, for the data not present case, but it also avoids having to e.g. loop through an object to check if it's actually empty (note the old unit-test).	2020-04-14 23:25:50 +02:00
Jonas Jenwald	426945b480	Update Prettier to version 2.0 Please note that these changes were done automatically, using `gulp lint --fix`. Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).	2020-04-14 12:28:14 +02:00
Jonas Jenwald	91efde5246	Add a heuristic to scale even single-char text, when the horizontal/vertical scaling differs significantly (issue 11713) At this point in time, compared to when the "ignore single-char" code was added, we should generally be doing a much better job of combining text into as few chunks as possible. However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm not proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise. Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.	2020-04-07 00:36:23 +02:00
Jonas Jenwald	938d519192	Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740) This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.	2020-04-06 11:21:02 +02:00
Jonas Jenwald	710704508c	Fail early, in modern `GENERIC` builds, if certain required browser functionality is missing (issue 11762) With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one. In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features). However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus appear to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead. To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build. (Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.) As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code. Please note: While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that may warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).	2020-04-01 19:42:48 +02:00
Jonas Jenwald	664b79abe0	[api-minor] Remove the `eventBusDispatchToDOM` option/preference, and thus the general ability to dispatch "viewer components" events to the DOM This functionality was only added to the default viewer for backwards compatibility and to support the various PDF viewer tests in mozilla-central, with the intention to eventually remove it completely. While the different mozilla-central tests cannot be easily converted from DOM events, it's however possible to limit that functionality to only MOZCENTRAL builds and when tests are running. Rather than depending of the re-dispatching of internal events to the DOM, the default viewer can instead be used in e.g. the following way: ```javascript document.addEventListener("webviewerloaded", function() { PDFViewerApplication.initializedPromise.then(function() { // The viewer has now been initialized, and its properties can be accessed. PDFViewerApplication.eventBus.on("pagerendered", function(event) { console.log("Has rendered page number: " + event.pageNumber); }); }); }); ```	2020-03-29 12:24:46 +02:00
Jonas Jenwald	fdfcde2b40	Remove a spurious `console.log` from the `ChromiumBrowser` function in `test/webbrowser.js` file This looks entirely like something which was left-over from debugging, and that line hasn't been touched since PR 4515, especially considering that the corresponding branch in `FirefoxBrowser` doesn't print anything.	2020-03-25 11:57:12 +01:00
Jonas Jenwald	dcb16af968	Whitelist closure related cases to address the remaining `no-shadow` linting errors Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled. Note that while some of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).	2020-03-25 11:57:12 +01:00
Jonas Jenwald	1d2f787d6a	Enable the ESLint `no-shadow` rule This rule is not currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239 Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis. Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow --- [1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.	2020-03-25 11:56:05 +01:00
Tim van der Meij	475fa1f97f	Merge pull request #11744 from janpe2/cff-glyph-zero The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"	2020-03-24 23:52:21 +01:00
Jani Pehkonen	a22c0eab48	The first glyph in CFF CIDFonts must be named 0 instead of ".notdef" Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.	2020-03-24 15:56:50 +02:00
Jonas Jenwald	66ee8f5acd	Remove variable shadowing from the JavaScript files in the `test/unit/` folder This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible. Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow	2020-03-24 10:44:17 +01:00
Jonas Jenwald	b02be3b268	Update the `eslint-plugin-no-unsanitized` package to the latest version	2020-03-20 11:25:39 +01:00
Jonas Jenwald	ae2900e510	[api-minor] Change the pageIndex, on `PDFPageProxy` instances, to a private property This property has never been documented and/or intentionally exposed through the API, instead the `PDFPageProxy.pageNumber` property is the documented/intended API to use here. Hence pageIndex is changed to a "private" property on `PDFPageProxy` instances, and internal API functionality is also updated to consistently use `this._pageIndex` rather than a mix of formats.	2020-03-19 15:47:11 +01:00
Tim van der Meij	1bc5cef2b5	Merge pull request #11698 from Snuffleupagus/issue-11697 Don't accidentally accept invalid glyphNames which appear to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697)	2020-03-15 13:36:09 +01:00
Tim van der Meij	aa3e5a2b8f	Merge pull request #11644 from Snuffleupagus/openAction [api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)	2020-03-15 13:16:37 +01:00
Jonas Jenwald	15e8692eff	Don't accidentally accept invalid glyphNames which appear to follow the Cdd{d}/cdd{d} format in `PartialEvaluator._buildSimpleFontToUnicode` (issue 11697) The /Differences array of the problematic font contains a `/c.1` entry, which is consequently detected as a possible Cdd{d}/cdd{d} glyphName by the existing heuristics. Because of how the base 10 conversion is implemented, which is necessary for the base 16 special case, the parsed charCode becomes `0.1` thus causing `String.fromCodePoint` to throw since that obviously isn't a valid code point. To fix the referenced issue, and to hopefully prevent similar ones in the future, the patch adds additional validation of the charCode found by the heuristics.	2020-03-13 23:35:47 +01:00
Jonas Jenwald	c5f67300e9	Rename the `isSpace` helper function to `isWhiteSpace` Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`. Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.	2020-03-12 11:36:59 +01:00
Jani Pehkonen	e0daabd2dd	Magnifier positioning in reftest analyzer When reftest analyzer shows magnified pixels, there is a seemingly random offset between the mouse position and the magnified position. The reason for this is that reftest analyzer assumes all images have 800 * 1000 pixels but actually the test images have varying sizes.	2020-03-10 19:09:15 +02:00
Jonas Jenwald	65e6ea2cb2	Prevent lookup errors in `PartialEvaluator.hasBlendModes` from breaking all parsing/rendering of a page (issue 11678) The PDF document in question is corrupt, since it contains an XObject with a truncated dictionary and where the stream contents start without a "stream" operator.	2020-03-09 12:00:12 +01:00
Tim van der Meij	1a97c142b3	Merge pull request #11523 from Snuffleupagus/issue-10880 Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880)	2020-03-06 23:03:09 +01:00
Jonas Jenwald	160cfc4084	Slightly simplify the lookup of data in `Dict.{get, getAsync, has}` Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups. In this case, since `Dict.set` is fairly hot, the patch utilizes an inline check and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`). For very large and complex PDF files this will help performance slightly, since `Dict.{get, getAsync, has}` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 250, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 250 \| 2838 \| 2820 \| -18 \| -0.65 \| faster Firefox \| Page Request \| 250 \| 1 \| 2 \| 0 \| 11.92 \| slower Firefox \| Rendering \| 250 \| 2837 \| 2818 \| -19 \| -0.65 \| faster ```	2020-03-06 14:12:14 +01:00
Jonas Jenwald	01fb309a2a	[api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642) This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.) By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.	2020-03-06 13:03:00 +01:00
Tim van der Meij	c95b9b1e17	Merge pull request #11653 from Snuffleupagus/ensureStateFont Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651)	2020-03-03 23:33:13 +01:00
Jani Pehkonen	71e7686950	Fix Type1 font parsing when .notdef is not at index zero Fixes #11477 The PDF draws many space characters but the embedded fonts don't have a glyph named `space`, so `.notdef` should be drawn instead. PDF.js assumed that Type1 fonts define `.notdef` as the first glyph (index 0). However, now the fonts have the glyph `A` at index 0 and `.notdef` is the last one, so `A` appears where spaces are expected. Because the rest of the font machinery in `core/fonts.js` assumes `.notdef` is at index zero, it's easiest to modify `core/type1_parser.js` so that it "repairs" fonts and makes sure `.notdef` is at index 0.	2020-03-03 21:55:51 +02:00
Jonas Jenwald	65e514e063	Ensure that there's always a setFont (Tf) operator before text rendering operators (issue 11651) The PDF document in question is corrupt, since it contains multiple instances of incorrect operators. We obviously don't want to slow down parsing of all documents (since most are valid), just to accommodate a particular bad PDF generator, hence the reason for the inline check before calling the `ensureStateFont` method.	2020-03-03 10:05:18 +01:00
Takashi Tamura	d8c9f119b0	Fix the vertical writing mode with horizontal scaling. #11555 . It is not valid to multiply textHScale when the writing mode is vertical. See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762	2020-02-29 07:48:29 +09:00
Tim van der Meij	e1586016c5	Merge pull request #11577 from Snuffleupagus/Pages-tree-refs Prevent circular references in the /Pages tree	2020-02-27 23:36:11 +01:00
Tim van der Meij	965ebe63fd	Merge pull request #11540 from tamuratak/charspacing Fix text spacing with vertical fonts. #7687 and #11526.	2020-02-26 22:26:27 +01:00
Jonas Jenwald	bf09d79eea	Use the ESLint `no-restricted-syntax` rule to prevent direct usage of `new Cmd()`/`new Name()`/`new Ref()` Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects a lot during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base. Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax	2020-02-22 21:15:00 +01:00
Jonas Jenwald	c3c3b8cd81	Add a heuristic, in `src/core/jpg.js`, to handle JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10880) This whole patch feels somewhat arbitrary, and I'd be slightly worried about possibly breaking something else. To limit the impact of these changes, we only re-parse JPEG images using a reduced `scanLines` value if and only if: An unexpected EOI (End of Image) marker was encountered during decoding of Scan data and the "actual" `scanLines` value is at least one order of magnitude smaller than expected.	2020-02-22 14:16:07 +01:00
Jonas Jenwald	3c7b7be100	Prevent circular references in the /Pages tree	2020-02-19 01:49:39 +01:00
Takashi Tamura	512dbe3060	Fix text spacing with vertical fonts. #7687 and #11526 . When the writing mode is vertical, we have to reverse the sign of spacing since we are subtracting it from current.y. We have to add it to current.y. See 9.4.4 Text Space Details, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1694762	2020-02-11 08:49:23 +09:00
Tim van der Meij	dced0a3821	Merge pull request #11579 from Snuffleupagus/issue-11578 Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)	2020-02-09 17:33:09 +01:00
Tim van der Meij	61056a9238	Merge pull request #11551 from Snuffleupagus/issue-11549 Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549)	2020-02-09 17:32:35 +01:00
Tim van der Meij	2fb4076e05	Merge pull request #11568 from Snuffleupagus/PDF-header-validation Ensure that the PDF header contains an actual number (PR 11463 follow-up)	2020-02-09 17:16:25 +01:00
Jonas Jenwald	7937165537	Ignore spaces when normalizing the font name in `Font.fallbackToSystemFont` (issue 11578)	2020-02-08 19:59:04 +01:00
Jonas Jenwald	7117ee03d6	[api-minor] Change `PDFDocumentProxy.cleanup`/`PDFPageProxy.cleanup` to return data This patch makes the following changes, to improve these API methods: - Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up. Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't guaranteed to happen. - Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished. - Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on both threads (without going into unnecessary specifics regarding what exactly said data actually is). Add a note in the JSDoc comment about not calling this method when rendering is ongoing. - Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking. Please note that this won't stop worker-thread clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-) All of the caches that's being cleared in `Catalog.cleanup`, on the worker-thread, should be re-filled automatically even if cleared during parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed. On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it. - Last, but not least, add a couple of basic unit-tests for the clean-up functionality.	2020-02-07 17:00:29 +01:00
Jonas Jenwald	88c35d872f	Ensure that the PDF header contains an actual number (PR 11463 follow-up) While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change. However, it does seem like a good idea to at least validate the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.	2020-02-07 12:25:07 +01:00
Brendan Dahl	09a6e17d22	Merge pull request #11528 from janpe2/type1-nonemb-notdef Hide .notdef glyphs in non-embedded Type1 fonts and don't ignore Widths	2020-02-06 13:30:07 -08:00
Jonas Jenwald	4c54395ff6	Allow skipping of errors when reading broken/corrupt ToUnicode data (issue 11549) This will allow font loading/parsing to continue, rather than immediately failing, when broken/corrupt CMap data is encountered.	2020-01-30 13:19:05 +01:00
Tim van der Meij	474fe1757e	Merge pull request #11508 from Snuffleupagus/jpg-default-marker Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js`	2020-01-26 21:32:13 +01:00
Jonas Jenwald	62b2b984cc	Render Popup annotations last, once all other annotations have been rendered (issue 11362) In the current `AnnotationLayer` implementation, Popup annotations require that the parent annotation have already been rendered (otherwise they're simply ignored). Usually the annotations are ordered, in the `/Annots` array, in such a way that this isn't a problem, however there's obviously no guarantee that all PDF generators actually do so. Hence we simply ensure, when rendering the `AnnotationLayer`, that the Popup annotations are handled last.	2020-01-26 15:49:55 +01:00
Jonas Jenwald	13930e5202	Simplify the handling of unsupported/incorrect markers in `src/core/jpg.js` - Re-factor the "incorrect encoding" check, since this can be easily achieved using the general `findNextFileMarker` helper function (with a suitable `startPos` argument). - Tweak a condition, to make it easier to see that the end of the data has been reached. - Add a reference test for issue 1877, since it's what prompted the "incorrect encoding" check.	2020-01-25 22:52:24 +01:00

1 2 3 4 5 ...

1979 Commits