pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	80342e2fdc	Support UTF-16 little-endian strings in the `stringToPDFString` helper function (bug 1593902) The bug report seems to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we actually do as evidenced by both the code and a unit-test. The issue at play here is rather that we previously only supported a big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a little-endian UTF-16 BOM instead. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902	2019-11-05 12:43:17 +01:00
Jonas Jenwald	04497bcb3c	Re-factor the `ObjectLoader._walk` method to be properly asynchronous Rather than having to store a `PromiseCapability` on the `ObjectLoader` instances, we can simply convert `_walk` to be `async` and thus have the same functionality with native JavaScript instead.	2019-11-03 15:04:20 +01:00
Jonas Jenwald	fec1f02b2a	Slightly re-factor setting of the link `target` in `addLinkAttributes` I happened to look at this code and the way that the link target is set seems unnecessarily convoluted, since we're using `Object.values` and `Array.prototype.includes` for every link being parsed. Given that the number of link targets is so small, the easiest solution honestly seems to be to just use a `switch` statement to do the link target mapping.	2019-11-02 14:01:31 +01:00
Tim van der Meij	bbd2386bd9	Merge pull request #11296 from Snuffleupagus/parseColorSpace-stopAtErrors Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287)	2019-11-01 22:47:50 +01:00
Jonas Jenwald	829d6ba2dc	Ensure that the `peekByte` methods, on the various Streams, handle end of data correctly (PR 5286 follow-up) When the end of data has already been reached for the various Streams, the `getByte` methods will return `-1` to signal that to the caller. Note however that the current position obviously won't be incremented in this case, meaning that the `peekByte` methods will in this case incorrectly decrement the position. Thankfully the corresponding `peekBytes` methods shouldn't be affected by this bug, since they decrement the current position by the number of bytes actually returned. I'm not aware of any bugs caused by this blatant oversight, but that doesn't mean this shouldn't be fixed :-)	2019-11-01 18:22:33 +01:00
Jonas Jenwald	835d8c2be5	Allow skipping of errors when parsing broken/unsupported ColorSpaces (issue 6707, issue 11287) This will allow us to attempt to recover as much as possible of a page, rather than immediately failing, when a broken/unsupported ColorSpace is encountered. This patch thus extends the framework added in PRs such as e.g. 8240 and 8922, to also cover parsing of ColorSpaces.	2019-11-01 09:01:24 +01:00
Tim van der Meij	30ef05c161	Merge pull request #11290 from Snuffleupagus/MessageHandler-rm-in [MessageHandler] Re-factor and convert the code to a proper `class`	2019-10-31 23:57:52 +01:00
Jonas Jenwald	eedd449cb4	Remove some unused `require` statements, used when loading fake workers, in non-`PRODUCTION` mode The code in question is only relevant in non-`PRODUCTION` mode, i.e. the development version of the viewer run with `gulp server`, and has been completely unused at least since SystemJS was added. I really cannot see any reason to keep this, since it's code which first of all isn't shipping and secondly isn't even being used in the development viewer.	2019-10-31 12:08:07 +01:00
Jonas Jenwald	0293222b96	[MessageHandler] Convert the code to a proper `class`	2019-10-30 23:22:59 +01:00
Jonas Jenwald	5d5733c0a7	[MessageHandler] Convert all instances of `var` to `const` in the code	2019-10-30 23:22:59 +01:00
Jonas Jenwald	f61fb3e0f9	[MessageHandler] Re-factor the `_onComObjOnMessage` function to use early returns When `ReadableStream` support was added to the `MessageHandler`, the `_onComObjOnMessage` function became more complex than previously. All of the nested `if`/`else if`/`else` branches are now, at least in my opinion, making some of this code a bit difficult to follow. Hence this patch, which attempts to help readability by making use of early `return`s and `Error`s. The patch also changes a couple of `var`/`let` occurrences to `const`.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	62f28e11a3	[MessageHandler] Remove unnecessary usage of `in` from the code Note that using `in` leads to unnecessary stringification of the properties, which seems completely unnecessary here. To avoid future problems from these changes the `MessageHandler.on` method will now assert, in non-`PRODUCTION`/`TESTING` builds, that it's always called with a function as expected. This patch also renames `callbacksCapabilities` to `callbackCapabilities`, note the removed "s", since using a double plural format looks a bit strange.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	3e46e800a0	[MessageHandler] Replace the internal `isReply` property, as sent when Promise callbacks are used, with enumeration values Given that the `isReply` property is an internal implementation detail, changing its type shouldn't be a problem. Note that by directly indicating if either data or an Error is sent, it's no longer necessary to use `in` when handling the callback.	2019-10-30 23:22:59 +01:00
Jonas Jenwald	2d35a49dd8	Inline a couple of `isRef`/`isDict` checks in the `ObjectLoader` code As we've seen in numerous other cases, avoiding unnecessary function calls is never a bad thing (even if the effect is probably tiny here).	2019-10-29 23:20:10 +01:00
Jonas Jenwald	1133dbac33	Make the `ObjectLoader` use more efficient methods when determining if data needs to be loaded Currently, for data in `ChunkedStream` instances, the `getMissingChunks` method is used in a couple of places to determine if data is already available or if it needs to be loaded. When looking at how `ChunkedStream.getMissingChunks` is being used in the `ObjectLoader` you'll notice that we don't actually care about which specific chunks are missing, but rather only want essentially a yes/no answer to the "Is the data available?" question. Furthermore, when looking at how `ChunkedStream.getMissingChunks` itself is implemented you'll notice that it (somewhat expectedly) always iterates over all chunks. All in all, using `ChunkedStream.getMissingChunks` in the `ObjectLoader` seems like an unnecessarily "heavy" and roundabout way to obtain a boolean value. However, it turns out there already exists a `ChunkedStream.allChunksLoaded` method, consisting of a single simple check, which seems like a perfect fit for the `ObjectLoader` use cases. In particular, once the entire PDF document has been loaded (which is usually fairly quick with streaming enabled), you'd really want the `ObjectLoader` to be as simple/quick as possible (similar to e.g. loading local files) which this patch should help with. Note that I wouldn't expect this patch to have a huge effect on performance, but it will nonetheless save some CPU/memory resources when the `ObjectLoader` is used. (As usual this should help larger PDF documents, w.r.t. both file size and number of pages, the most.)	2019-10-29 23:20:09 +01:00
Jonas Jenwald	0496ea61f5	Ensure that `PartialEvaluator.hasBlendModes` handles Blend Modes in Arrays (PR 11281 follow-up) I completely overlooked this in PR 11281, but you obviously need to make similar changes in `PartialEvaluator.hasBlendModes` since it will otherwise ignore valid Blend Modes.	2019-10-28 11:37:05 +01:00
Jonas Jenwald	5c266f0e8c	Support Blend Modes which are specified in an Array of Names (issue 11279) According to the specification, the first supported Blend Mode should be chosen in this case; please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G10.4848607	2019-10-26 14:24:31 +02:00
Tim van der Meij	4a5a4328f4	Merge pull request #11273 from Snuffleupagus/getViewport-offsets [api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone`	2019-10-24 00:08:40 +02:00
Jonas Jenwald	681bc9d70e	[api-minor] Support custom `offsetX`/`offsetY` values in `PDFPageProxy.getViewport` and `PageViewport.clone` There's no good reason, as far as I can tell, to not also support `offsetX`/`offsetY` in addition to e.g. `dontFlip`.	2019-10-23 20:48:14 +02:00
Jonas Jenwald	6f7f8257bc	Slightly re-factor the String handling in `StatTimer` This uses template strings in a couple of spots, and a buffer in the `toString` method.	2019-10-23 14:45:18 +02:00
Jonas Jenwald	8e5d3836d6	Remove the `enable` argument from the `StatTimer` constructor This argument is a left-over from older API code, where we unconditionally initialized `StatTimer` instances for every page. For quite some time that's only been done when `pdfBug` is set, hence it seems unnecessary to keep this functionality.	2019-10-23 14:45:18 +02:00
Jonas Jenwald	9fc40f8b84	Remove `DummyStatTimer` since it's unused now Since this isn't part of the API surface, removing it now that it's unused shouldn't cause any problems.	2019-10-23 14:45:16 +02:00
Jonas Jenwald	860da8b840	Stop using the `DummyStatTimer` in the API, and check if `this._stats` exists when trying to report statistics Even though the current situation only results in six unnecessary function calls per page, it nonetheless seems completely unnecessary to call dummy functions when `pdfBug` is not set (i.e. the default behaviour).	2019-10-23 13:23:41 +02:00
Jonas Jenwald	df0e1edab5	Re-factor sending of various Exceptions from the worker to the API As can be seen in the API, there are a number of document loading Exception handlers which are both really simple and highly similar. Hence these are changed such that all the relevant Exceptions are sent via one message instead. Furthermore, the patch also avoids unnecessarily re-creating `UnknownErrorException`s at the worker side and removes an unnecessary `bind` call.	2019-10-19 12:54:54 +02:00
Tim van der Meij	11f3851a97	Merge pull request #11243 from Snuffleupagus/issue-11242 Add a fallback for non-embedded composite Verdana fonts (issue 11242)	2019-10-18 23:56:46 +02:00
Tim van der Meij	c54bb222ca	Merge pull request #11231 from Snuffleupagus/indexObjects-entries-gen Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)	2019-10-17 23:56:26 +02:00
Jonas Jenwald	2fcb5afc7b	Add a fallback for non-embedded composite Verdana fonts (issue 11242) Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.	2019-10-17 17:00:55 +02:00
Pedro Luiz Cabral Salomon Prado	4d0c759b7f	Change variable assignment (#11247) Remove unused variable assignment in `src/core/fonts.js`	2019-10-16 00:39:25 +02:00
Jonas Jenwald	ffc847eaa5	Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303) This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1] Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break. The only reason that I'm even submitting this patch, is because of the number of open issues that it would address. Generally speaking though, the best course of action would probably be if `XRef.indexObjects` was re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them. --- [1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.	2019-10-14 22:10:04 +02:00
Tim van der Meij	ec6a99d781	Bundle all API documentation in a module This commit allows JSDoc to generate all API documentation in the `pdfjsLib` module (namespace) so the documentation becomes easier to navigate.	2019-10-13 21:23:00 +02:00
Tim van der Meij	9f4d45ddf4	Don't include private methods in the `PDFPageProxy` API documentation	2019-10-13 21:23:00 +02:00
Tim van der Meij	36c01c2c2a	Deduplicate the documentation for `PDFDocumentLoadingTask` and `PDFWorker` Both classes live inside a closure with the same name, which confuses JSDoc. Move the documentation to the inner class to deduplicate them.	2019-10-13 21:23:00 +02:00
Tim van der Meij	ca3a58f93a	Consistently use `@returns` for returned data types in JSDoc comments Sometimes we also used `@return`, but `@returns` is what the JSDoc documentation recommends. Even though `@return` works as an alias, it's good to use the recommended syntax and to be consistent within the project.	2019-10-13 13:58:17 +02:00
Tim van der Meij	8b4ae6f3eb	Consistently use `@type` for getter data types in JSDoc comments Sometimes we also used `@return` or `@returns`, but `@type` is what the JSDoc documentation recommends. This also improves the documentation because before this commit the types were not shown and now they are.	2019-10-13 13:58:17 +02:00
Tim van der Meij	f4daafc077	Consistently use square brackets for optional parameters in JSDoc comments Square brackets are recommended to indicate optional parameters. Using them helps for automatically generating correct documentation.	2019-10-13 13:58:17 +02:00
Tim van der Meij	efd331daa1	Consistently use `string` for string data types in JSDoc comments Sometimes we also used `String`, but `string` is what the JSDoc documentation recommends.	2019-10-13 13:58:17 +02:00
Tim van der Meij	e75991b49e	Consistently use `number` for numeric data types in JSDoc comments Sometimes we also used `Number` and `integer`, but `number` is what the JSDoc documentation recommends.	2019-10-13 13:58:13 +02:00
Jonas Jenwald	03387ebaa8	Update `src/shared/compatibility.js` to only run with `SKIP_BABEL = false` set Rather than specifying certain build targets manually, it seems much more appropriate (and future-proof) to use the `SKIP_BABEL` build target instead. Also, the patch adds a missing `/* eslint no-var: error */` line since I'm touching the file anyway and no code-changes were necessary for it.	2019-10-13 11:33:41 +02:00
Jonas Jenwald	bfcbf2d78d	Cache processed 'ExtGState's in `PartialEvaluator.hasBlendModes` to avoid unnecessary parsing/lookups This simply extends the already existing caching of processed resources to avoid duplicated parsing of 'ExtGState's, which should help with badly generated PDF documents. This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf, with the following manifest file: ``` [ { "id": "issue6961", "file": "../web/pdfs/issue6961.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following overall results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05) ------- | ------------ | ----- | ------------ | ----------- | --- | ----- | ------------- Firefox | Overall | 400 | 1063 | 1051 | -12 | -1.17 | faster Firefox | Page Request | 400 | 552 | 543 | -9 | -1.69 | faster Firefox | Rendering | 400 | 511 | 508 | -3 | -0.61 | ``` and the following page-specific results: ``` -- Grouped By page, stat -- page | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05) ---- | ------------ | ----- | ------------ | ----------- | --- | ----- | ------------- 0 | Overall | 200 | 1122 | 1110 | -12 | -1.03 | 0 | Page Request | 200 | 552 | 544 | -8 | -1.48 | faster 0 | Rendering | 200 | 570 | 566 | -4 | -0.62 | 1 | Overall | 200 | 1005 | 992 | -13 | -1.33 | faster 1 | Page Request | 200 | 552 | 542 | -11 | -1.91 | faster 1 | Rendering | 200 | 452 | 450 | -3 | -0.61 | ```	2019-10-12 12:35:42 +02:00
Jonas Jenwald	af71f9b40a	Inline all the possible type checks in `PartialEvaluator.hasBlendModes` to avoid unnecessary function calls For badly generated PDF documents, with issue 6961 being one example, there are well over one hundred thousand function calls being made in total for just the two pages.	2019-10-12 11:24:37 +02:00
huzjakd	94171d9d72	Attempt to fall back to a default font, for non-available ones, in `PartialEvaluator.loadFont` This handles the two different ways that fonts can be loaded, either by Name (which is the common case) or by Reference. Furthermore, this also takes the `ignoreErrors` option into account when deciding whether to fall back or Error. Finally, by creating a minimal but valid Font dictionary, there are no special cases necessary in any of the font parsing code. Co-authored-by: huzjakd <huzjakd@gmail.com> Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>	2019-10-10 16:49:46 +02:00
Jonas Jenwald	ea729ec55c	[api-minor] Replace all `deprecated` calls with throwing of actual `Error`s All of these methods have been marked as `deprecated` in three releases now, and I'd thus like to (slowly) move towards complete removal. However rather than just removing the methods right away, which would cause somewhat cryptic failures, this patch tries to implement a hopefully reasonable middle ground by throwing `Error`s with (essentially) the same information as the previous warnings. While the previous `deprecated` messages could perhaps be seen as optional, with these changes API consumers will now be forced to actually migrate their code.	2019-10-09 09:21:15 +02:00
Takashi Tamura	d5ee083050	* use square brackets for optional properties in the JSDoc comments of src/display/api.js	2019-10-08 20:34:17 +09:00
Tim van der Meij	cead77ef3a	Merge pull request #11186 from Snuffleupagus/issue-9655 Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)	2019-10-06 19:50:43 +02:00
Jonas Jenwald	eabedab38e	[MessageHandler] Add a non-PRODUCTION/TESTING check to ensure that `wrapReason` is called with a valid `reason` There shouldn't be any situation where `reason` isn't either an `Error`, or a cloned "Error" sent via `postMessage`.	2019-10-06 14:15:13 +02:00
Jonas Jenwald	9201c8dad4	[MessageHandler] Convert the `deleteStreamController` helper function to a "private" method instead	2019-10-06 14:15:02 +02:00
Jonas Jenwald	f5be2d62a3	Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655) Please note: I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code. However, it occurred to me that we're essentially trying to add a heuristic on top of another heuristic, and that it shouldn't matter how efficient the code is as long as it works. In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.	2019-10-06 10:47:29 +02:00
Jonas Jenwald	572abdcb4a	Convert the various image decoder `...Error`s to classes extending `BaseException` (PR 11185 follow-up) Somehow I missed these in PR 11185, but there's no good reason not to convert them as well.	2019-10-01 13:10:14 +02:00
Tim van der Meij	8c4f4b5eec	Merge pull request #11182 from Snuffleupagus/disableWorker-disable-Dict-postMessage Forbid sending of `Dict`s and `Stream`s, with `postMessage`, when workers are disabled	2019-09-29 15:09:42 +02:00
Jonas Jenwald	5d93fda4f2	Convert the various `...Exception`s to proper classes, to reduce code duplication By utilizing a base "class", things become significantly simpler. Unfortunately the new `BaseException` cannot be a proper ES6 class and just extend `Error`, since the SystemJS dependency doesn't seem to play well with that. Note also that we (generally) need to keep the `name` property on the actual `...Exception` object, rather than on its prototype, since the property will otherwise be dropped during the structured cloning used with `postMessage`.	2019-09-29 10:16:20 +02:00
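
Note on 80342e2fdc: the BOM handling described in that message can be illustrated with a minimal sketch. This is not the actual `stringToPDFString` code; the function name `decodeUtf16WithBom` is hypothetical, and the input is assumed to be a "binary string" where every character code is a single byte (0-255).

```javascript
// Hypothetical helper for illustration only, not the pdf.js implementation.
// Assumes `str` holds one byte per character (char codes 0-255).
function decodeUtf16WithBom(str) {
  const b0 = str.charCodeAt(0);
  const b1 = str.charCodeAt(1);
  let littleEndian;
  if (b0 === 0xfe && b1 === 0xff) {
    littleEndian = false; // big-endian BOM (FE FF)
  } else if (b0 === 0xff && b1 === 0xfe) {
    littleEndian = true; // little-endian BOM (FF FE)
  } else {
    return null; // no UTF-16 BOM; a caller would fall back to another encoding
  }
  const units = [];
  // Combine each byte pair after the BOM into one 16-bit code unit.
  for (let i = 2; i + 1 < str.length; i += 2) {
    const first = str.charCodeAt(i);
    const second = str.charCodeAt(i + 1);
    units.push(littleEndian ? (second << 8) | first : (first << 8) | second);
  }
  // Fine for short strings such as a document title; very long inputs
  // would need chunking to avoid exceeding the argument limit.
  return String.fromCharCode(...units);
}

console.log(decodeUtf16WithBom("\xFF\xFEH\x00i\x00")); // "Hi" (little-endian)
console.log(decodeUtf16WithBom("\xFE\xFF\x00H\x00i")); // "Hi" (big-endian)
```

Both byte orders decode to the same text; only the order in which each byte pair is combined differs.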
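
Note on 829d6ba2dc: the end-of-data problem described in that message can be reproduced with a toy stream. The `ByteStream` class below is a hypothetical stand-in, not one of the pdf.js Stream classes; it only assumes the behaviour stated in the message, namely that `getByte` returns `-1` at end of data without advancing the position.

```javascript
// Hypothetical minimal stream for illustration; not a pdf.js class.
class ByteStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Returns the next byte and advances, or -1 at end of data
  // (in which case the position is left unchanged).
  getByte() {
    if (this.pos >= this.bytes.length) {
      return -1;
    }
    return this.bytes[this.pos++];
  }

  // Looks at the next byte without consuming it. The position is only
  // restored when `getByte` actually advanced it; decrementing
  // unconditionally would move the position backwards at end of data.
  peekByte() {
    const byte = this.getByte();
    if (byte !== -1) {
      this.pos--;
    }
    return byte;
  }
}

const stream = new ByteStream(Uint8Array.of(7));
console.log(stream.peekByte(), stream.pos); // 7 0
console.log(stream.getByte(), stream.pos); // 7 1
console.log(stream.peekByte(), stream.pos); // -1 1
```

Without the `byte !== -1` guard, the last call would leave the position at 0 and a subsequent `getByte` would return the final byte a second time.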

3724 Commits