pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	df0e1edab5	Re-factor sending of various Exceptions from the worker to the API As can be seen in the API, there's a number of document loading Exception handlers which are both really simple and highly similar. Hence these are changed such that all the relevant Exceptions are sent via one message instead. Furthermore, the patch also avoids unnecessarily re-creating `UnknownErrorException`s at the worker side and removes an unnecessary `bind` call.	2019-10-19 12:54:54 +02:00
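A minimal sketch of sending every document-loading failure through one message, assuming a plain-object wire format. `wrapLoadingError`, `onDocumentLoadingError` and the list of names below are hypothetical. They are not the actual pdf.js message or helper names.

```javascript
// Hypothetical set of exception names that the API side re-creates by name.
const KNOWN_LOADING_EXCEPTIONS = new Set([
  "InvalidPDFException",
  "MissingPDFException",
  "PasswordException",
  "UnexpectedResponseException",
  "UnknownErrorException",
]);

// Worker side: convert any thrown value into a plain, cloneable object that is
// sent with a single message type, instead of one message per exception type.
function wrapLoadingError(ex) {
  if (ex && KNOWN_LOADING_EXCEPTIONS.has(ex.name)) {
    // Known exceptions, including an existing UnknownErrorException, are
    // forwarded as-is rather than being wrapped a second time.
    return { name: ex.name, message: ex.message, details: ex.details };
  }
  return {
    name: "UnknownErrorException",
    message: String((ex && ex.message) || ex),
    details: String(ex),
  };
}

// API side: one handler replaces several near-identical ones.
function onDocumentLoadingError(reason, reject) {
  const error = new Error(reason.message);
  error.name = reason.name;
  if (reason.details !== undefined) {
    error.details = reason.details;
  }
  reject(error);
}
```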
Tim van der Meij	11f3851a97	Merge pull request #11243 from Snuffleupagus/issue-11242 Add a fallback for non-embedded composite Verdana fonts (issue 11242)	2019-10-18 23:56:46 +02:00
Tim van der Meij	c54bb222ca	Merge pull request #11231 from Snuffleupagus/indexObjects-entries-gen Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303)	2019-10-17 23:56:26 +02:00
Jonas Jenwald	2fcb5afc7b	Add a fallback for non-embedded composite Verdana fonts (issue 11242) Obviously this won't look exactly right, but considering that the PDF file doesn't bother embedding non-standard fonts this is the best that we can do here.	2019-10-17 17:00:55 +02:00
Pedro Luiz Cabral Salomon Prado	4d0c759b7f	Change variable assignment (#11247) Remove unused variable assignment in `src/core/fonts.js`	2019-10-16 00:39:25 +02:00
Jonas Jenwald	ffc847eaa5	Allow over-writing entries, in `XRef.indexObjects`, only when the generation number matches (issues 11230, 11139, 9552, 9129, 7303) This patch is making me somewhat worried about future regressions, since it's certainly easy to imagine this completely breaking certain kinds of corrupt/edited PDF documents while fixing others.[1] Obviously it passes all existing reference tests (and even improves one), however compared to many other patches there's no telling how much it could break. The only reason that I'm even submitting this patch is the number of open issues that it would address. Generally speaking though, the best course of action would probably be if `XRef.indexObjects` were re-written to be much more robust (since it currently feels somewhat hand-wavy in parts). E.g. by actually checking/validating more of the objects before committing to them. --- [1] Especially given that it's reverting part of PR 5910, however in the case of issue 5909 it seems that other (more recent) changes have actually made that PR redundant.	2019-10-14 22:10:04 +02:00
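A simplified sketch of the new over-write rule. It assumes that object headers are found with a regular expression over the decoded file contents and that entries are stored as `{ offset, gen }` objects. `recordObject` is hypothetical, and the real `XRef.indexObjects` also handles trailers, object streams and more.

```javascript
// Hypothetical stand-in for one step of `XRef.indexObjects`: `entries` is an
// array indexed by object number, holding `{ offset, gen }` objects.
function recordObject(entries, num, gen, offset) {
  const existing = entries[num];
  // Only over-write an entry when it doesn't exist yet, or when the generation
  // number matches (i.e. a later copy of the very same object).
  if (existing === undefined || existing.gen === gen) {
    entries[num] = { offset, gen };
    return true;
  }
  return false;
}

// Scan for "num gen obj" headers; this sketch ignores trailers and object streams.
function indexObjects(text) {
  const entries = [];
  const objHeader = /(\d+)\s+(\d+)\s+obj\b/g;
  let match;
  while ((match = objHeader.exec(text)) !== null) {
    recordObject(entries, Number(match[1]), Number(match[2]), match.index);
  }
  return entries;
}
```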
Tim van der Meij	ec6a99d781	Bundle all API documentation in a module This commit allows JSDoc to generate all API documentation in the `pdfjsLib` module (namespace) so the documentation becomes easier to navigate.	2019-10-13 21:23:00 +02:00
Tim van der Meij	9f4d45ddf4	Don't include private methods in the `PDFPageProxy` API documentation	2019-10-13 21:23:00 +02:00
Tim van der Meij	36c01c2c2a	Deduplicate the documentation for `PDFDocumentLoadingTask` and `PDFWorker` Both classes live inside a closure with the same name, which confuses JSDoc. Move the documentation to the inner class to deduplicate them.	2019-10-13 21:23:00 +02:00
Tim van der Meij	ca3a58f93a	Consistently use `@returns` for returned data types in JSDoc comments Sometimes we also used `@return`, but `@returns` is what the JSDoc documentation recommends. Even though `@return` works as an alias, it's good to use the recommended syntax and to be consistent within the project.	2019-10-13 13:58:17 +02:00
Tim van der Meij	8b4ae6f3eb	Consistently use `@type` for getter data types in JSDoc comments Sometimes we also used `@return` or `@returns`, but `@type` is what the JSDoc documentation recommends. This also improves the documentation because before this commit the types were not shown and now they are.	2019-10-13 13:58:17 +02:00
Tim van der Meij	f4daafc077	Consistently use square brackets for optional parameters in JSDoc comments Square brackets are recommended to indicate optional parameters. Using them helps for automatically generating correct documentation.	2019-10-13 13:58:17 +02:00
Tim van der Meij	efd331daa1	Consistently use `string` for string data types in JSDoc comments Sometimes we also used `String`, but `string` is what the JSDoc documentation recommends.	2019-10-13 13:58:17 +02:00
Tim van der Meij	e75991b49e	Consistently use `number` for numeric data types in JSDoc comments Sometimes we also used `Number` and `integer`, but `number` is what the JSDoc documentation recommends.	2019-10-13 13:58:13 +02:00
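The JSDoc conventions from the preceding commits, shown together on a hypothetical `PageInfo` class that is not part of pdf.js: lower-case `number` and `string`, square brackets for optional parameters, `@type` for getters and `@returns` for methods.

```javascript
/**
 * Hypothetical class, used only to illustrate the JSDoc conventions.
 */
class PageInfo {
  /**
   * @param {number} pageNumber - The 1-based page number.
   * @param {string} [label] - An optional page label.
   */
  constructor(pageNumber, label = "") {
    this._pageNumber = pageNumber;
    this._label = label;
  }

  /**
   * @type {number}
   */
  get pageNumber() {
    return this._pageNumber;
  }

  /**
   * @param {number} [scale] - An optional scale factor, defaults to 1.
   * @returns {string} A human-readable description of the page.
   */
  describe(scale = 1) {
    const name = this._label || String(this._pageNumber);
    return `Page ${name} at ${scale * 100}%`;
  }
}
```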
Jonas Jenwald	03387ebaa8	Update `src/shared/compatibility.js` to only run with `SKIP_BABEL = false` set Rather than specifying certain build targets manually, it seems much more appropriate (and future-proof) to use the `SKIP_BABEL` build target instead. Also, the patch adds a missing `/* eslint no-var: error */` line since I'm touching the file anyway and no code-changes were necessary for it.	2019-10-13 11:33:41 +02:00
Jonas Jenwald	bfcbf2d78d	Cache processed 'ExtGState's in `PartialEvaluator.hasBlendModes` to avoid unnecessary parsing/lookups This simply extends the already existing caching of processed resources to avoid duplicated parsing of 'ExtGState's, which should help with badly generated PDF documents. This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf, with the following manifest file:
```
[
  {
    "id": "issue6961",
    "file": "../web/pdfs/issue6961.pdf",
    "md5": "",
    "rounds": 200,
    "type": "eq"
  }
]
```
which gave the following overall results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |     % | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   400 |         1063 |        1051 | -12 | -1.17 | faster
Firefox | Page Request |   400 |          552 |         543 |  -9 | -1.69 | faster
Firefox | Rendering    |   400 |          511 |         508 |  -3 | -0.61 |
```
and the following page-specific results:
```
-- Grouped By page, stat --
page | stat         | Count | Baseline(ms) | Current(ms) | +/- |     % | Result(P<.05)
---- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
0    | Overall      |   200 |         1122 |        1110 | -12 | -1.03 |
0    | Page Request |   200 |          552 |         544 |  -8 | -1.48 | faster
0    | Rendering    |   200 |          570 |         566 |  -4 | -0.62 |
1    | Overall      |   200 |         1005 |         992 | -13 | -1.33 | faster
1    | Page Request |   200 |          552 |         542 | -11 | -1.91 | faster
1    | Rendering    |   200 |          452 |         450 |  -3 | -0.61 |
```
	2019-10-12 12:35:42 +02:00
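A sketch of the kind of caching described above. It assumes a simplified resource model in which shared dictionaries carry a `ref` string standing in for a pdf.js `Ref`. The real method works on `Dict`/`Ref` objects and the XRef table. In this sketch, tracking processed references both avoids repeated lookups and stops the walk on cyclic XObject structures.

```javascript
// Hypothetical model: resources are plain objects; shared sub-dictionaries
// have a `ref` string (e.g. "12R") standing in for a pdf.js `Ref`.
// Assumes /BM is a single name string (PDF also allows an array of names).
function hasBlendModes(resources) {
  const processed = new Set();
  const nodes = [resources];
  while (nodes.length > 0) {
    const node = nodes.shift();
    const extGStates = node.ExtGState || {};
    for (const key of Object.keys(extGStates)) {
      const graphicState = extGStates[key];
      if (graphicState.ref) {
        // Skip ExtGStates that were already checked through another path.
        if (processed.has(graphicState.ref)) {
          continue;
        }
        processed.add(graphicState.ref);
      }
      const blendMode = graphicState.BM;
      if (blendMode !== undefined && blendMode !== "Normal") {
        return true;
      }
    }
    const xObjects = node.XObject || {};
    for (const key of Object.keys(xObjects)) {
      const xObject = xObjects[key];
      if (xObject.ref) {
        if (processed.has(xObject.ref)) {
          continue;
        }
        processed.add(xObject.ref);
      }
      if (xObject.Resources) {
        nodes.push(xObject.Resources);
      }
    }
  }
  return false;
}
```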
Jonas Jenwald	af71f9b40a	Inline all the possible type checks in `PartialEvaluator.hasBlendModes` to avoid unnecessary function calls For badly generated PDF documents, with issue 6961 being one example, there's well over one hundred thousand function calls being made in total for just the two pages.	2019-10-12 11:24:37 +02:00
huzjakd	94171d9d72	Attempt to fall back to a default font, for non-available ones, in `PartialEvaluator.loadFont` This handles the two different ways that fonts can be loaded, either by Name (which is the common case) or by Reference. Furthermore, this also takes the `ignoreErrors` option into account when deciding whether to fall back or throw an Error. Finally, by creating a minimal but valid Font dictionary, no special cases are necessary in any of the font parsing code. Co-authored-by: huzjakd <huzjakd@gmail.com> Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>	2019-10-10 16:49:46 +02:00
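A sketch of the fallback logic. It assumes that fonts are plain objects, that References are looked up in a `Map`, and that the fallback font name format is freely chosen. `loadFontDict` and `createFallbackFontDict` are hypothetical helpers, not the `PartialEvaluator.loadFont` code.

```javascript
// Hypothetical fallback dictionary: minimal, but shaped like any other simple
// font, so that the font parsing code needs no special cases.
function createFallbackFontDict(missingName) {
  return {
    Type: "Font",
    Subtype: "Type1",
    BaseFont: `PDFJS-FallbackFont-${missingName}`,
    Encoding: "WinAnsiEncoding",
  };
}

// Fonts are looked up either by Reference (via `xref`, a Map here) or by
// Name (via the page resources).
function loadFontDict({ fontName, fontRef, resources, xref, ignoreErrors }) {
  let font;
  if (fontRef !== undefined) {
    font = xref.get(fontRef);
  } else if (resources.Font) {
    font = resources.Font[fontName];
  }
  if (font) {
    return font;
  }
  const missing = fontRef !== undefined ? String(fontRef) : fontName;
  if (!ignoreErrors) {
    throw new Error(`Font "${missing}" is not available.`);
  }
  console.warn(`Font "${missing}" is not available, using a fallback font.`);
  return createFallbackFontDict(missing);
}
```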
Jonas Jenwald	ea729ec55c	[api-minor] Replace all `deprecated` calls with throwing of actual `Error`s All of these methods have been marked as `deprecated` in three releases now, and I'd thus like to (slowly) move towards complete removal. However rather than just removing the methods right away, which would cause somewhat cryptic failures, this patch tries to implement a hopefully reasonable middle ground by throwing `Error`s with (essentially) the same information as the previous warnings. While the previous `deprecated` messages could perhaps be seen as optional, with these changes API consumers will now be forced to actually migrate their code.	2019-10-09 09:21:15 +02:00
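The pattern can be sketched with a small hypothetical helper that turns a removed method into one that throws. `removedMethod`, `ExampleDocumentProxy` and both method names are illustrative only.

```javascript
// Before: a warning was logged and the call still worked, e.g.
//   deprecated("oldMethod, use newMethod instead.");
// After: calling the removed method fails loudly with the same information.
function removedMethod(oldName, newName) {
  return function () {
    throw new Error(`${oldName} is no longer supported, please use ${newName} instead.`);
  };
}

// Hypothetical API class and method names, for illustration only.
class ExampleDocumentProxy {
  getDestinationsNew() {
    return {};
  }
}
ExampleDocumentProxy.prototype.getDestinationsOld = removedMethod(
  "getDestinationsOld()",
  "getDestinationsNew()"
);
```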
Takashi Tamura	d5ee083050	* use square brackets for optional properties in the JSDoc comments of src/display/api.js	2019-10-08 20:34:17 +09:00
Tim van der Meij	cead77ef3a	Merge pull request #11186 from Snuffleupagus/issue-9655 Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)	2019-10-06 19:50:43 +02:00
Jonas Jenwald	eabedab38e	[MessageHandler] Add a non-PRODUCTION/TESTING check to ensure that `wrapReason` is called with a valid `reason` There shouldn't be any situation where `reason` isn't either an `Error`, or a cloned "Error" sent via `postMessage`.	2019-10-06 14:15:13 +02:00
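A sketch of such a development-only check. It assumes a plain constant in place of the build-time PRODUCTION/TESTING flags and uses a hypothetical subset of exception names. The error reconstruction is simplified compared to `MessageHandler`.

```javascript
// Stand-in for the build-time PRODUCTION/TESTING check (an assumption).
const CHECK_REASONS = true;

// Hypothetical subset of exception names that are re-created by name.
const WRAPPED_NAMES = new Set(["AbortException", "MissingPDFException", "UnexpectedResponseException"]);

function wrapReason(reason) {
  if (CHECK_REASONS) {
    // `reason` must be an Error, or an Error-like object produced by
    // structured cloning, i.e. an object with a string `message`.
    const isErrorLike =
      reason instanceof Error ||
      (typeof reason === "object" && reason !== null && typeof reason.message === "string");
    if (!isErrorLike) {
      throw new Error('wrapReason: Expected "reason" to be a (possibly cloned) Error.');
    }
  }
  const error = new Error(reason.message);
  if (WRAPPED_NAMES.has(reason.name)) {
    error.name = reason.name;
  } else {
    error.name = "UnknownErrorException";
    error.details = String(reason);
  }
  return error;
}
```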
Jonas Jenwald	9201c8dad4	[MessageHandler] Convert the `deleteStreamController` helper function to a "private" method instead	2019-10-06 14:15:02 +02:00
Jonas Jenwald	f5be2d62a3	Improve the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode`, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655) Please note: I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code. However, it occurred to me that we're essentially trying to add a heuristic on top of another heuristic, and that it shouldn't matter how efficient the code is as long as it works. In the PDF file in the issue the Encoding contains glyphNames of the `Cdd` format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.	2019-10-06 10:47:29 +02:00
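A sketch of the "heuristic on top of a heuristic". It assumes the encoding is given as `[charCode, glyphName]` pairs and that a name containing a hex letter (e.g. `C4A`) is what reveals base 16 values. The detection rule and both helper names are assumptions, not the actual `_buildSimpleFontToUnicode` code.

```javascript
// Hypothetical: parse a Cdd{d}/cdd{d} glyph name as base 10, or as base 16
// when `forceHex` is set. Returns undefined for names of other formats, and
// NaN when the digits are only valid as base 16.
function glyphNameToCode(glyphName, forceHex) {
  if (!/^[Cc][0-9A-Fa-f]{2,3}$/.test(glyphName)) {
    return undefined;
  }
  const digits = glyphName.slice(1);
  if (forceHex) {
    return parseInt(digits, 16);
  }
  return /^\d+$/.test(digits) ? parseInt(digits, 10) : NaN;
}

function buildToUnicode(encoding, forceHex = false) {
  const toUnicode = new Map();
  for (const [charCode, glyphName] of encoding) {
    const code = glyphNameToCode(glyphName, forceHex);
    if (code === undefined) {
      continue;
    }
    if (Number.isNaN(code)) {
      // A name like "C4A" cannot be base 10, so assume that every name in
      // this encoding is base 16 and start over. Slow, but simple.
      return buildToUnicode(encoding, true);
    }
    toUnicode.set(charCode, String.fromCharCode(code));
  }
  return toUnicode;
}
```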
Jonas Jenwald	572abdcb4a	Convert the various image decoder `...Error`s to classes extending `BaseException` (PR 11185 follow-up) Somehow I missed these in PR 11185, but there's no good reason not to convert them as well.	2019-10-01 13:10:14 +02:00
Tim van der Meij	8c4f4b5eec	Merge pull request #11182 from Snuffleupagus/disableWorker-disable-Dict-postMessage Forbid sending of `Dict`s and `Stream`s, with `postMessage`, when workers are disabled	2019-09-29 15:09:42 +02:00
Jonas Jenwald	5d93fda4f2	Convert the various `...Exception`s to proper classes, to reduce code duplication By utilizing a base "class", things become significantly simpler. Unfortunately the new `BaseException` cannot be a proper ES6 class and just extend `Error`, since the SystemJS dependency doesn't seem to play well with that. Note also that we (generally) need to keep the `name` property on the actual `...Exception` object, rather than on its prototype, since the property will otherwise be dropped during the structured cloning used with `postMessage`.	2019-09-29 10:16:20 +02:00
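A sketch of a function-based base "class" along the lines described above. It is illustrative and may differ in detail from the pdf.js source. Note that `name` is assigned as an own property in the constructor, rather than on the prototype.

```javascript
// Function-based base "class", instead of `class BaseException extends Error`.
const BaseException = (function BaseExceptionClosure() {
  function BaseException(message) {
    if (this.constructor === BaseException) {
      throw new Error("Cannot initialize BaseException.");
    }
    this.message = message;
    // Own property, so that it isn't dropped by structured cloning.
    this.name = this.constructor.name;
  }
  BaseException.prototype = new Error();
  BaseException.prototype.constructor = BaseException;
  return BaseException;
})();

class PasswordException extends BaseException {
  constructor(message, code) {
    super(message);
    this.code = code;
  }
}

class InvalidPDFException extends BaseException {}
```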
Jonas Jenwald	3f8fee371b	Forbid sending of `Dict`s and `Stream`s, with `postMessage`, when workers are disabled By default, i.e. with workers enabled, it's purposely not possible to send `Dict`s and `Stream`s from the worker-thread. This is achieved by defining a `function` on every `Dict` instance, since that ensures that [the structured clone algorithm](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm) will throw an Error on `postMessage`. However, with workers disabled we fall back to the `LoopbackPort` implementation which just ignores any `function`s, thus incorrectly allowing sending of data which should be unclonable.	2019-09-26 16:16:13 +02:00
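A sketch of a loopback-side copy that rejects functions the same way structured cloning does. `cloneForLoopback` and `FakeDict` are hypothetical. Only plain objects and arrays are handled; a real implementation must also deal with typed arrays, `Map`s and so on.

```javascript
// Hypothetical copy routine for a loopback "port": deep-copies plain objects
// and arrays, and throws on functions like the structured clone algorithm.
function cloneForLoopback(value, seen = new Map()) {
  if (typeof value === "function") {
    throw new Error("Trying to copy a function, e.g. one defined on a Dict, which cannot be cloned.");
  }
  if (typeof value !== "object" || value === null) {
    return value;
  }
  // Preserve shared and cyclic references, as structured cloning does.
  if (seen.has(value)) {
    return seen.get(value);
  }
  const result = Array.isArray(value) ? [] : {};
  seen.set(value, result);
  for (const key of Object.keys(value)) {
    result[key] = cloneForLoopback(value[key], seen);
  }
  return result;
}

// Hypothetical Dict stand-in: the own function property makes it unclonable.
class FakeDict {
  constructor() {
    this._map = Object.create(null);
    this.getRaw = (key) => this._map[key];
  }
}
```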
Tim van der Meij	cd909c531f	Merge pull request #11169 from Snuffleupagus/Dict-inline-Ref-checks Reduce the number of function calls in the `Dict` class	2019-09-24 23:33:37 +02:00
Tim van der Meij	f762d59ad2	Merge pull request #11173 from Snuffleupagus/ReadableStream-polyfill Replace the bundled `ReadableStream` polyfill with the `web-streams-polyfill` npm package (issue 11157)	2019-09-24 23:22:17 +02:00
Jonas Jenwald	2cac68467f	Reduce the number of function calls in the `Dict` class The following changes were made:
- Remove unnecessary `typeof` checks in the `get`/`getAsync` methods.
- Reduce unnecessary code duplication in the `get`/`getAsync` methods.
- Inline the `Ref` checks in the `get`/`getAsync`/`getArray` methods, since it helps avoid many unnecessary function calls. I.e. this way it's possible to directly call `XRef.{fetch, fetchAsync}` only when necessary, rather than always having to call `XRef.{fetchIfRef, fetchIfRefAsync}`.
This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file:
```
[
  {
    "id": "issue2618",
    "file": "../web/pdfs/issue2618.pdf",
    "md5": "",
    "rounds": 250,
    "type": "eq"
  }
]
```
This gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |     % | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ----- | -------------
Firefox | Overall      |   250 |         2821 |        2790 | -32 | -1.12 | faster
Firefox | Page Request |   250 |            2 |           2 |   0 |  6.68 |
Firefox | Rendering    |   250 |         2820 |        2788 | -32 | -1.13 | faster
```
	2019-09-24 08:31:39 +02:00
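A sketch of a `Dict.get` with the `Ref` check inlined. It uses hypothetical minimal `Ref`/`Dict` classes and an `xref` object that only needs a `fetch` method. In this sketch, missing keys consistently yield `undefined`.

```javascript
// Hypothetical minimal stand-ins for the pdf.js `Ref` and `Dict` primitives.
class Ref {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }
}

class Dict {
  constructor(xref = null) {
    this._map = Object.create(null);
    this.xref = xref;
  }

  set(key, value) {
    this._map[key] = value;
  }

  // Looks up to three alternative keys, e.g. an abbreviated and a full name.
  get(key1, key2, key3) {
    let value = this._map[key1];
    if (value === undefined && key2 !== undefined) {
      value = this._map[key2];
      if (value === undefined && key3 !== undefined) {
        value = this._map[key3];
      }
    }
    // Inlined Ref check: only call into the XRef when a fetch is needed,
    // instead of always calling a fetchIfRef-style helper.
    if (value instanceof Ref && this.xref) {
      return this.xref.fetch(value);
    }
    return value;
  }
}
```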
Jonas Jenwald	0ee373f9cc	Replace the bundled `ReadableStream` polyfill with the `web-streams-polyfill` npm package (issue 11157) Compared to the recently replaced `URL` polyfill, the new `ReadableStream` polyfill isn't being exported globally for two reasons: - We're currently checking for the existence of a global `ReadableStream` implementation when determining if the Fetch API will be used; please see `isFetchSupported` in the src/display/display_utils.js file. - Given that it's much newer functionality (compared to `URL`) and that not all browsers may implement all parts of the specification yet, not exposing the `ReadableStream` globally seems safer for now.	2019-09-23 22:16:59 +02:00
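A feature check of this kind might look as follows. This is a sketch, and the exact conditions in `isFetchSupported` may differ. Because it inspects the global `ReadableStream`, exporting a polyfill globally would change its result.

```javascript
// Sketch of a Fetch API feature check that depends on a global ReadableStream.
function isFetchSupported() {
  return (
    typeof fetch !== "undefined" &&
    typeof Response !== "undefined" &&
    "body" in Response.prototype &&
    typeof ReadableStream !== "undefined"
  );
}
```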
Jonas Jenwald	7f18c57c12	Fix the inconsistent return types for `Dict.{get, getAsync}` Having these methods fall back to returning `null` in only one particular case seems outright wrong, since a "falsy" value will thus be handled incorrectly. The only reason that this hasn't caused issues in practice is that there's only one call-site passing in three keys, and in that case we're trying to read a font file where falling back to `null` isn't a problem.	2019-09-23 11:41:19 +02:00
Tim van der Meij	1f5ebfbf0c	Replace our `URL` polyfill with the one from `core-js` `core-js` polyfills have proven to be of good quality and using them prevents us from having to maintain them ourselves.	2019-09-19 14:09:51 +02:00
Tim van der Meij	c71a291317	Upgrade `core-js` to version 3.2.1 This only required changing the import paths. The `es` folder contains all polyfills we need now. If we want to import everything, we need to explicitly require the `index` file.	2019-09-19 13:58:36 +02:00
Tim van der Meij	3da680cdfc	Merge pull request #11158 from janpe2/gradient-stops Avoid floating point inaccuracy in gradient color stops	2019-09-19 13:15:11 +02:00
Tim van der Meij	58e5f36666	Merge pull request #11159 from Snuffleupagus/issue-11150 For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150)	2019-09-19 13:14:27 +02:00
Jonas Jenwald	af22dc9b0c	For Type1 fonts, replace missing font dictionary /Widths entries with ones from the font data (issue 11150) Hopefully this patch makes sense, and in order to reduce the regression risk the implementation ensures that only completely missing widths are being replaced.	2019-09-18 10:15:09 +02:00
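A sketch of filling in only the missing widths, assuming both width tables are plain objects keyed by character code. `fillMissingWidths` is hypothetical.

```javascript
// Hypothetical: `dictWidths` come from the font dictionary (/FirstChar and
// /Widths), `fontProgramWidths` from the Type1 font data itself.
function fillMissingWidths(dictWidths, fontProgramWidths) {
  const widths = Object.assign({}, dictWidths);
  for (const charCode of Object.keys(fontProgramWidths)) {
    // Only completely missing entries are replaced; an existing entry is
    // never over-written, whatever its value, to limit regression risk.
    if (!(charCode in widths)) {
      widths[charCode] = fontProgramWidths[charCode];
    }
  }
  return widths;
}
```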
Jani Pehkonen	911df237f3	Avoid floating point inaccuracy in gradient color stops	2019-09-17 21:01:17 +03:00
Jonas Jenwald	4bd79ec4b3	Inline the `resolveOrReject` helper function at its call-sites in `MessageHandler`, and rename an `error` key to `reason` Given that there's only a couple of call-sites, and that the helper function is really simple, it doesn't seem entirely necessary to keep it around. While fewer function calls is always a good thing, in this case the performance impact is small enough to be unmeasurable. With one single exception the code in `MessageHandler` is using `reason` when passing around various Errors, hence this patch also renames an `error` key for consistency.	2019-09-17 14:22:24 +02:00
Jonas Jenwald	0617984b59	Remove unnecessary `data.streamId` accesses in `MessageHandler._processStreamMessage`, and use a constant object shape in `MessageHandler.sendWithStream` The `streamId` short-hand in `MessageHandler._processStreamMessage` was only used partially throughout the method, which seemed kind of strange, hence that's fixed in this patch. Furthermore, always giving the `streamController` object a constant shape in `MessageHandler.sendWithStream` cannot hurt either.	2019-09-17 14:18:57 +02:00
Jonas Jenwald	281ed33e43	Abort, with a small delay, `getOperatorList` on the worker-thread when rendering is cancelled (PR 11069 follow-up) With this patch we're finally able to abort worker-thread parsing of the `OperatorList`, rather than only aborting the main-thread rendering itself, when the `RenderTask.cancel` method is being called. This will help improve perceived performance in the default viewer, especially when reading longer and more complex documents, since pages that've been scrolled out-of-view (and thus evicted from the cache) will no longer compete for parsing resources on the worker-thread. Please note: With the implementation in this patch we're not aborting worker-thread parsing immediately on `RenderTask.cancel`, since that would lead to worse performance in many cases. For example: When zoom/rotation occurs in the viewer, while parsing/rendering is still ongoing, a `cancel` call will usually be (almost) immediately followed by a new `PDFPageProxy.render` call. In that case you obviously don't want to abort parsing on the worker-thread, since that would risk throwing away a partially parsed `OperatorList` and thus force unnecessary re-parsing which will regress perceived performance (especially for more complex documents). When choosing a reasonable delay, before cancelling `getOperatorList` on the worker-thread when `RenderTask.cancel` is called, two different considerations need to be balanced: 1. The delay needs to be short enough, since a timeout in the multiple seconds range would essentially make this entire functionality meaningless (by always allowing most/all pages enough time to finish parsing). 2. The delay cannot be too short, since that would actually reduce performance in the zoom/rotation case outlined above. Furthermore, the time between `RenderTask.cancel` and `PDFPageProxy.render` calls will obviously be affected by both general computer performance and current CPU load.
It's certainly possible that the timeout may require some further tweaks, however the value settled on in this patch was easily one order of magnitude larger than the delta between cancel/render in my tests.	2019-09-14 11:30:32 +02:00
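The delayed abort can be sketched with a timer that a subsequent render call clears. `OperatorListTask` and the 100 ms default are assumptions for illustration, not the actual pdf.js code or value.

```javascript
// Hypothetical stand-in for the worker-side parsing of one page's OperatorList.
class OperatorListTask {
  constructor(abortDelayMs = 100) {
    this.abortDelayMs = abortDelayMs;
    this.abortTimeout = null;
    this.aborted = false;
  }

  // Called when rendering is cancelled: schedule the abort instead of
  // aborting immediately, since a new render call may follow right away.
  cancel() {
    if (this.abortTimeout !== null || this.aborted) {
      return;
    }
    this.abortTimeout = setTimeout(() => {
      this.abortTimeout = null;
      // A real implementation would cancel the worker-thread stream here.
      this.aborted = true;
    }, this.abortDelayMs);
  }

  // Called when rendering starts again, e.g. after zooming or rotating.
  // Returns whether the partially parsed OperatorList can be reused.
  render() {
    if (this.abortTimeout !== null) {
      clearTimeout(this.abortTimeout);
      this.abortTimeout = null;
    }
    return !this.aborted;
  }
}
```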
Jonas Jenwald	00efff532c	Ensure that `addLinkAttributes` is always called with a valid `url` parameter There's no good reason for calling this helper function without a `url` parameter, and this way we can prevent that from happening. Note how the `PDFOutlineViewer` call-site was already doing the right thing here, and only the `LinkAnnotationElement` call-site needed a small adjustment to make it work.	2019-09-11 13:24:04 +02:00
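A sketch of the stricter parameter check. Only `url` comes from the description above; the `newWindow` option and the `rel` value are assumptions.

```javascript
// Sketch: fail early instead of silently creating a link without a destination.
function addLinkAttributes(link, { url, newWindow = false } = {}) {
  if (!url || typeof url !== "string") {
    throw new Error('A valid "url" parameter must be provided.');
  }
  link.href = link.title = url;
  if (newWindow) {
    // Hypothetical option; the real helper's target/rel handling may differ.
    link.target = "_blank";
    link.rel = "noopener noreferrer nofollow";
  }
  return link;
}
```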
Jonas Jenwald	12e1c91f73	Don't `enqueue` unused properties when sending 'GetOperatorList' data from the worker-thread (PR 11069 follow-up) With the changes made in PR 11069, it's no longer necessary to include the `pageIndex`/`intent` parameters when sending 'GetOperatorList' data. In the previous implementation these properties were used to associate the `OperatorList` with the correct `RenderTask`, however now that `ReadableStream`s are used that's handled automatically and it's thus dead code at this point.	2019-09-09 17:41:26 +02:00
Tim van der Meij	37d5b80ba8	Merge pull request #11118 from Snuffleupagus/FetchBuiltInCMap-sendWithStream Transfer, rather than copy, CMap data to the worker-thread	2019-09-06 22:56:14 +02:00
Jonas Jenwald	7dea3f9389	[api-minor] Remove the `postMessageTransfers` parameter, and thus the ability to manually disable transferring of data, from the API By transferring, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having one copy of the same data. Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory. Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality. Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side. --- [1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance. [2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility	2019-09-05 13:09:54 +02:00
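Since `postMessage` uses the structured clone algorithm, the global `structuredClone` function (available in modern browsers and Node.js 17+) can illustrate the difference between copying and transferring an `ArrayBuffer`. The two helper names are hypothetical.

```javascript
// Copying: afterwards both the sender and the receiver hold a full buffer.
function sendByCopy(buffer) {
  return structuredClone(buffer);
}

// Transferring: ownership moves to the receiver and the sender's buffer is
// detached (its byteLength becomes 0), so only one copy of the data exists.
function sendByTransfer(buffer) {
  return structuredClone(buffer, { transfer: [buffer] });
}
```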
Jonas Jenwald	f0534b9b51	Adjust the values sent, with the 'test' message, by the `WorkerMessageHandler.setup` method Note how the sent values have inconsistent types, with a boolean in one case and an object in the other (normal) case. Furthermore, explicitly sending a `supportTypedArray: true` property seems superfluous at least to me.	2019-09-05 11:27:27 +02:00
Jonas Jenwald	7212ff4eea	Stop checking for the `response` property, on `XMLHttpRequest`, when setting up the `WorkerMessageHandler` This check was added in PR 2445, however it's no longer necessary since all data[1] is now loaded on the main-thread (and then transferred to the worker-thread). Furthermore, by default the Fetch API is now (usually) used rather than `XMLHttpRequest`. All in all, while these checks were necessary at one point that's no longer the case and they can thus be removed. --- [1] This includes both the actual PDF data, as well as the CMap data.	2019-09-05 11:27:22 +02:00
Jonas Jenwald	f11a4ba750	Transfer, rather than copy, CMap data to the worker-thread It recently occurred to me that the CMap data should be an excellent candidate for transferring. This will help reduce peak memory usage for PDF documents using CMaps, since transferring of data avoids duplicating it on both the main- and worker-threads. Unfortunately it's not possible to actually transfer data when returning data through `sendWithPromise`, and another solution had to be used. Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread. Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it really easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.) Please note that the patch purposely only changes the API to Worker communication, and not the API itself since changing the interface of `CMapReaderFactory` would be a breaking change. Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.	2019-09-04 11:46:04 +02:00
Jonas Jenwald	74f5a59f43	Ensure that the `cancel`/`error` methods on Streams are always called with valid `reason` arguments	2019-09-02 23:31:07 +02:00


3701 Commits