pdf.js

Author	SHA1	Message	Date
Tim van der Meij	475fa1f97f	Merge pull request #11744 from janpe2/cff-glyph-zero The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"	2020-03-24 23:52:21 +01:00
Jani Pehkonen	a22c0eab48	The first glyph in CFF CIDFonts must be named 0 instead of ".notdef" Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Because this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.	2020-03-24 15:56:50 +02:00
Jonas Jenwald	66ee8f5acd	Remove variable shadowing from the JavaScript files in the `test/unit/` folder This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible. Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow	2020-03-24 10:44:17 +01:00
Jonas Jenwald	b02be3b268	Update the `eslint-plugin-no-unsanitized` package to the latest version	2020-03-20 11:25:39 +01:00
Jonas Jenwald	ae2900e510	[api-minor] Change the pageIndex, on `PDFPageProxy` instances, to a private property This property has never been documented and/or intentionally exposed through the API, instead the `PDFPageProxy.pageNumber` property is the documented/intended API to use here. Hence pageIndex is changed to a "private" property on `PDFPageProxy` instances, and internal API functionality is also updated to consistently use `this._pageIndex` rather than a mix of formats.	2020-03-19 15:47:11 +01:00
Tim van der Meij	aa3e5a2b8f	Merge pull request #11644 from Snuffleupagus/openAction [api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642)	2020-03-15 13:16:37 +01:00
Jonas Jenwald	c5f67300e9	Rename the `isSpace` helper function to `isWhiteSpace` Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`. Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.	2020-03-12 11:36:59 +01:00
Jonas Jenwald	160cfc4084	Slightly simplify the lookup of data in `Dict.{get, getAsync, has}` Note that `Dict.set` will only be called with values returned through `Parser.getObj`, and thus indirectly via `Lexer.getObj`. Since neither of those methods will ever return `undefined`, we can simply assert that that's the case when inserting data into the `Dict` and thus get rid of `in` checks when doing the data lookups. In this case, since `Dict.set` is fairly hot, the patch utilizes an inline check and when necessary a direct call to `unreachable` to not affect performance of `gulp server/test` too much (rather than always just calling `assert`). For very large and complex PDF files this will help performance slightly, since `Dict.{get, getAsync, has}` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 250, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 250 \| 2838 \| 2820 \| -18 \| -0.65 \| faster Firefox \| Page Request \| 250 \| 1 \| 2 \| 0 \| 11.92 \| slower Firefox \| Rendering \| 250 \| 2837 \| 2818 \| -19 \| -0.65 \| faster ```	2020-03-06 14:12:14 +01:00
Jonas Jenwald	01fb309a2a	[api-minor] Add more general OpenAction support (PR 10334 follow-up, issue 11642) This patch deprecates the existing `getOpenActionDestination` API method, in favor of a better and more general `getOpenAction` method instead. (For now JavaScript actions, related to printing, are still handled as before.) By clearly separating "regular" Print actions from the JavaScript handling, it's thus possible to get rid of the somewhat annoying and strictly incorrect warning when the viewer loads.	2020-03-06 13:03:00 +01:00
Tim van der Meij	e1586016c5	Merge pull request #11577 from Snuffleupagus/Pages-tree-refs Prevent circular references in the /Pages tree	2020-02-27 23:36:11 +01:00
Jonas Jenwald	bf09d79eea	Use the ESLint `no-restricted-syntax` rule to prevent direct usage of `new Cmd()`/`new Name()`/`new Ref()` Given that all of these primitives implement caching, to avoid unnecessarily duplicating those objects a lot during parsing, it would thus be good to actually enforce usage of `Cmd.get()`/`Name.get()`/`Ref.get()` in the code-base. Luckily it turns out that there's an ESLint rule, which is fairly easy to use, that can be used to disallow arbitrary JavaScript syntax. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax	2020-02-22 21:15:00 +01:00
Jonas Jenwald	3c7b7be100	Prevent circular references in the /Pages tree	2020-02-19 01:49:39 +01:00
Tim van der Meij	2fb4076e05	Merge pull request #11568 from Snuffleupagus/PDF-header-validation Ensure that the PDF header contains an actual number (PR 11463 follow-up)	2020-02-09 17:16:25 +01:00
Jonas Jenwald	7117ee03d6	[api-minor] Change `PDFDocumentProxy.cleanup`/`PDFPageProxy.cleanup` to return data This patch makes the following changes, to improve these API methods: - Let `PDFPageProxy.cleanup` return a boolean indicating if clean-up actually happened, since ongoing rendering will block clean-up. Besides being used in other parts of this patch, it seems that an API user may also be interested in the return value given that clean-up isn't guaranteed to happen. - Let `PDFDocumentProxy.cleanup` return the promise indicating when clean-up is finished. - Improve the JSDoc comment for `PDFDocumentProxy.cleanup` to mention that clean-up is triggered on both threads (without going into unnecessary specifics regarding what exactly said data actually is). Add a note in the JSDoc comment about not calling this method when rendering is ongoing. - Change `WorkerTransport.startCleanup` to throw an `Error` if it's called when rendering is ongoing, to prevent rendering from breaking. Please note that this won't stop worker-thread clean-up from happening (since there's no general "something is rendering"-flag), however I'm not sure if that's really a problem; but please don't quote me on that :-) All of the caches that are being cleared in `Catalog.cleanup`, on the worker-thread, should be re-filled automatically even if cleared during parsing/rendering, and the only thing that probably happens is that e.g. font data would have to be re-parsed. On the main-thread, on the other hand, clearing the caches is more-or-less guaranteed to cause rendering errors, since the rendering code in `src/display/canvas.js` isn't able to re-request any image/font data that's suddenly being pulled out from under it. - Last, but not least, add a couple of basic unit-tests for the clean-up functionality.	2020-02-07 17:00:29 +01:00
Jonas Jenwald	88c35d872f	Ensure that the PDF header contains an actual number (PR 11463 follow-up) While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change. However, it does seem like a good idea to at least validate the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.	2020-02-07 12:25:07 +01:00
Jonas Jenwald	3f031f69c2	Move additional worker-thread only functions from `src/shared/util.js` and into a `src/core/core_utils.js` instead This moves the `log2`, `readInt8`, `readUint16`, `readUint32`, and `isSpace` functions since they are only used in the worker-thread.	2020-01-25 00:33:52 +01:00
Jonas Jenwald	9e262ae7fa	Enable the ESLint `prefer-const` rule globally (PR 11450 follow-up) Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths. Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).	2020-01-25 00:20:22 +01:00
Jonas Jenwald	36881e3770	Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.	2020-01-04 13:01:43 +01:00
Jonas Jenwald	d9d856020f	Move the regular expression, used with auto printing in the viewer, to `web/ui_utils.js` and also use it in the API unit-tests Rather than having a copy of this regular expression in the `test/unit/api_spec.js` file, with a comment about keeping it up-to-date with the code in the viewer (note the incorrect file reference as well), we can just import it instead to simplify all of this.	2019-12-27 00:38:28 +01:00
Tim van der Meij	dfe42a5ca4	Include a unit test for OpenAction dictionaries without `Type` entries (PR 11443 follow-up) The original issue did not contain a (reduced) test case that we could include and linked test cases are not ideal for unit tests, so the original PR could only be verified manually. I found this a bit unfortunate considering that the print data is exposed through the API, so I thought about how we could have an automated test and managed to create a reduced test case with the OpenAction dictionary from the file in the original issue. Therefore, this commit includes a unit test for parsing OpenAction dictionaries without `Type` entries. I verified that this PDF file behaves the same as the original one, i.e., no print dialog is shown for older viewers and the print dialog is shown for the most recent viewer.	2019-12-27 00:05:51 +01:00
Jonas Jenwald	a63f7ad486	Fix the linting errors, from the Prettier auto-formatting, that ESLint `--fix` couldn't handle This patch makes the following changes: - Remove no longer necessary inline `// eslint-disable-...` comments. - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors. - Concatenate strings which now fit on just one line. - Fix comments that are now too long. - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions are often confusing or outright wrong.	2019-12-26 12:35:12 +01:00
Jonas Jenwald	de36b2aaba	Enable auto-formatting of the entire code-base using Prettier (issue 11444) Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change). Prettier is being used for a couple of reasons: - To be consistent with `mozilla-central`, where Prettier is already in use across the tree. - To ensure a consistent coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters. Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some). Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that comments won't become too long. Please note: This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a separate commit. (On a more personal note, I'll readily admit that some of the changes Prettier makes are extremely ugly. However, in the name of consistency we'll probably have to live with that.)	2019-12-26 12:34:24 +01:00
Jonas Jenwald	8ec1dfde49	Add `// prettier-ignore` comments to prevent re-formatting of certain data structures There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want to disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments. Please note: It may be a good idea to look through these cases individually, and possibly re-write some of them (especially the `String` ones) to reduce the need for all of these ignore commands.	2019-12-26 00:14:03 +01:00
Jonas Jenwald	b4d95f3763	Tweak the "gets page stats after rendering page, with `pdfBug` set" unit-test to remove an intermittent failure on Travis I recently noticed a couple of intermittent failures on Travis, hence this patch which changes the expectation to be identical to the 'Page Request' check in the preceding test-case.	2019-12-23 23:07:02 +01:00
Jonas Jenwald	e24050fa13	[api-minor] Move the `ReadableStream` polyfill to the global scope Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead. The only change here which could possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, which could now pass (assuming obviously that the preceding checks also succeeded). However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it could just as well have been a globally registered polyfill from an application embedding the PDF.js library.	2019-12-11 19:02:37 +01:00
Jonas Jenwald	b00835f589	Attempt to improve the `PDFDocument` error message for empty files (issue 5887) Given that the error in question is surfaced on the API-side, this patch makes the following changes: - Updates the wording such that it'll hopefully be slightly easier for users to understand. - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling. - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).	2019-12-09 15:45:50 +01:00
Jonas Jenwald	74e00ed93c	Change `isNodeJS` from a function to a constant Given that this shouldn't change after the `pdf.js`/`pdf.worker.js` files have been loaded, it doesn't seem necessary to keep this as a function.	2019-11-10 16:44:29 +01:00
Jonas Jenwald	2817121bc1	Convert `globalScope` and `isNodeJS` to proper modules Slightly unrelated to the rest of the patch, but this also removes an out-of-place `globals` definition from the `web/viewer.js` file.	2019-11-10 16:44:29 +01:00
Jonas Jenwald	80342e2fdc	Support UTF-16 little-endian strings in the `stringToPDFString` helper function (bug 1593902) The bug report seems to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we actually do, as evidenced by both the code and a unit-test. The issue at play here is rather that we previously only supported a big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a little-endian UTF-16 BOM instead. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902	2019-11-05 12:43:17 +01:00
Jonas Jenwald	681bc9d70e	[api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone` There's no good reason, as far as I can tell, to not also support `offsetX`/`offsetY` in addition to e.g. `dontFlip`.	2019-10-23 20:48:14 +02:00
Jonas Jenwald	2046adcc49	Change the 'gets viewport respecting "dontFlip" argument' unit-test to use a valid rotation angle As can be seen in `PageViewport` only multiples of 90 degrees are really supported by the code, hence the unit-test doesn't really make sense. (Possibly this should be enforced in the API, to avoid surprises, but given that this problem has always existed I'm passing on that for now.)	2019-10-23 20:30:25 +02:00
Jonas Jenwald	7f18c57c12	Fix the inconsistent return types for `Dict.{get, getAsync}` Having these methods fall back to returning `null` in only one particular case seems outright wrong, since a "falsy" value will thus be handled incorrectly. The only reason that this hasn't caused issues in practice is that there's only one call-site passing in three keys, and in that case we're trying to read a font file where falling back to `null` isn't a problem.	2019-09-23 11:41:19 +02:00
Tim van der Meij	1f5ebfbf0c	Replace our `URL` polyfill with the one from `core-js` `core-js` polyfills have proven to be of good quality and using them prevents us from having to maintain them ourselves.	2019-09-19 14:09:51 +02:00
Jonas Jenwald	af22dc9b0c	For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150) Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.	2019-09-18 10:15:09 +02:00
Jonas Jenwald	74f5a59f43	Ensure that the `cancel`/`error` methods on Streams are always called with valid `reason` arguments	2019-09-02 23:31:07 +02:00
Jonas Jenwald	02bdacef42	Ensure that `Error`s are handled correctly when using `postMessage` with Streams in `MessageHandler` Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).	2019-09-02 23:31:07 +02:00
Tim van der Meij	09df1ee0ce	Include a reduced, non-linked PDF file for the attachments API unit test	2019-08-25 15:14:57 +02:00
Jonas Jenwald	711040ecc5	Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in `src/core/worker.js` These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).	2019-08-24 15:56:41 +02:00
Yury Delendik	66e0dd1b06	Use streams for OperatorList chunking (issue 10023) Please note: The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`. By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources. With this patch worker-thread parsing will only be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this: - The API currently expects the entire OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen. - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously not want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance. - This patch is already somewhat risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well). Time permitting, once this has landed and been in Nightly for a while, I'll try to work on the remaining points outlined above. Co-Authored-By: Yury Delendik <ydelendik@mozilla.com> Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>	2019-08-24 15:56:40 +02:00
Jonas Jenwald	d637b25e36	Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it. Note that different PDF viewers handle this sort of error differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly". The patch makes the following notable changes: - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes is added to the API.) - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer. - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`. - Add an optional parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty. - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seems strange. --- [1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.	2019-08-09 10:18:13 +02:00
Tim van der Meij	be70ee236d	Merge pull request #11013 from timvandermeij/annotations-quadpoints [api-minor] Implement quadpoints for annotations in the core layer	2019-08-04 16:06:10 +02:00
Jonas Jenwald	0276385e6e	[api-minor] Fix completely broken `getStats` method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up) With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now always returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-) I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`. Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.	2019-08-02 14:09:24 +02:00
Jonas Jenwald	a3150166ec	Ensure that `ReadableStream`s are cancelled with actual Errors There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended). In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.	2019-08-01 16:40:46 +02:00
wangsongyan	c61205d980	Decode the filename when matching a URL-encoded filename from contentDispositionFilename	2019-07-31 09:33:56 +08:00
Tim van der Meij	9114004d5b	[api-minor] Implement quadpoints for annotations in the core layer	2019-07-28 20:36:21 +02:00
Jonas Jenwald	ff90aa4323	Inline the `isCmd` check in the `Parser.shift` method For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called a lot during parsing. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over four million `Parser.shift` calls for just the one page), using the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 100, "type": "eq" } ] ``` This gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 100 \| 3386 \| 3322 \| -65 \| -1.92 \| faster Firefox \| Page Request \| 100 \| 1 \| 1 \| 0 \| -8.08 \| Firefox \| Rendering \| 100 \| 3385 \| 3321 \| -65 \| -1.92 \| faster ```	2019-07-22 12:07:36 +02:00
Tim van der Meij	6e96a158f4	Merge pull request #10820 from vlastimilmaca/annot-irt-rt-states Annotations - Added parsing of IRT, RT, State and StateModel	2019-07-17 23:34:31 +02:00
vlastimilmaca	fe49f0f766	Annotations - Implement parsing of IRT, RT, State and StateModel	2019-07-16 23:33:07 +02:00
Jonas Jenwald	c7de6dbe41	Update the `fingerprint` API unit-tests to explicitly check for the expected result The current tests won't catch inadvertent changes to the logic used to obtain/compute the document `fingerprint`.	2019-07-15 11:19:17 +02:00
Jonas Jenwald	c7fb7116d6	Add an API unit-test for the `stopAtErrors` option (PRs 8240 and 8922 follow-up) Also fixes an inconsistency in the 'PageError' handler, for `getOperatorList`, in the API.	2019-07-13 16:06:05 +02:00
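
The byte-order-mark handling described in commit 80342e2fdc (big-endian versus little-endian UTF-16) can be illustrated with a minimal sketch. This is not the pdf.js implementation of `stringToPDFString`: the function name `decodeUtf16WithBom` is hypothetical, the input is assumed to be a "byte string" in which every character code is one byte (0-255), and strings without a BOM are returned unchanged here.

```javascript
// Hypothetical helper for illustration only; not taken from pdf.js.
// Assumes `str` is a byte string: each char code is a single byte (0-255).
function decodeUtf16WithBom(str) {
  const isBigEndian = str.startsWith("\xFE\xFF");
  const isLittleEndian = str.startsWith("\xFF\xFE");
  if (!isBigEndian && !isLittleEndian) {
    // No BOM found: this sketch simply returns the input as-is.
    return str;
  }
  let result = "";
  // Skip the two BOM bytes, then combine each byte pair into one code unit.
  for (let i = 2; i + 1 < str.length; i += 2) {
    const first = str.charCodeAt(i);
    const second = str.charCodeAt(i + 1);
    const codeUnit = isBigEndian
      ? (first << 8) | second // high byte comes first
      : (second << 8) | first; // low byte comes first
    result += String.fromCharCode(codeUnit);
  }
  return result;
}

// Usage: the same text encoded with either byte order decodes identically.
console.log(decodeUtf16WithBom("\xFE\xFF" + "\x00H\x00i")); // "Hi"
console.log(decodeUtf16WithBom("\xFF\xFE" + "H\x00i\x00")); // "Hi"
```

The only difference between the two branches is which byte of each pair is treated as the high byte, which is why supporting only one BOM leaves the other byte order undecoded.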

567 Commits