pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	7212ff4eea	Stop checking for the `response` property, on `XMLHttpRequest`, when setting up the `WorkerMessageHandler` This check was added in PR 2445, however it's no longer necessary since all data[1] is now loaded on the main-thread (and then transferred to the worker-thread). Furthermore, by default the Fetch API is now (usually) used rather than `XMLHttpRequest`. All in all, while these checks were necessary at one point that's no longer the case and they can thus be removed. --- [1] This includes both the actual PDF data, as well as the CMap data.	2019-09-05 11:27:22 +02:00
Jonas Jenwald	f11a4ba750	Transfer, rather than copy, CMap data to the worker-thread It recently occurred to me that the CMap data should be an excellent candidate for transferring. This will help reduce peak memory usage for PDF documents using CMaps, since transferring of data avoids duplicating it on both the main- and worker-threads. Unfortunately it's not possible to actually transfer data when returning data through `sendWithPromise`, and another solution had to be used. Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread. Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it really easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.) Please note that the patch purposely only changes the API to Worker communication, and not the API itself since changing the interface of `CMapReaderFactory` would be a breaking change. Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.	2019-09-04 11:46:04 +02:00
Jonas Jenwald	74f5a59f43	Ensure that the `cancel`/`error` methods on Streams are always called with valid `reason` arguments	2019-09-02 23:31:07 +02:00
Jonas Jenwald	02bdacef42	Ensure that `Error`s are handled correctly when using `postMessage` with Streams in `MessageHandler` Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).	2019-09-02 23:31:07 +02:00
Tim van der Meij	e59b11860d	Merge pull request #11108 from timvandermeij/es6-annotations Use more ES6 syntax in the annotation code	2019-09-02 23:13:24 +02:00
Tim van der Meij	2866c8a39e	Use more ES6 syntax in `src/core/annotation.js` `let` is converted to `const` where possible.	2019-09-02 22:37:27 +02:00
Tim van der Meij	c37a2c0408	Merge pull request #11112 from Snuffleupagus/TESTING-rm-version-warn Remove the API/Worker version warning message in `TESTING` mode	2019-09-02 22:22:33 +02:00
Jonas Jenwald	229f6f34d1	Remove the API/Worker version warning message in `TESTING` mode The warning messages turn out to be more annoying than helpful when looking at the `console` during tests, so let's just remove them.	2019-09-01 16:47:26 +02:00
Jonas Jenwald	cd82b81bc7	Inline the `resolveCall` helper function at its call-sites in `MessageHandler` There's only three call-sites and one of them doesn't even need the complete functionality of `resolveCall`, hence it seems reasonable to just inline this code. An additional benefit of this is that the `Function.prototype.apply()` instance can also be converted into "normal" function calls, which should be a tiny bit more efficient. The patch also replaces a number of unnecessary arrow functions, in relevant parts of the `MessageHandler` code, with "normal" functions instead. Finally, all `Promise.resolve().then(...)` calls are replaced with `new Promise(...)` instead since the latter is a tiny bit more efficient. This also explains the test failures on the Linux bot, with a prior version of the patch, since the `Promise.resolve().then(...)` format essentially creates two Promises thus causing additional delay.	2019-09-01 13:40:19 +02:00
Jonas Jenwald	055f03938b	Remove support for the `scope` parameter in the `MessageHandler.on` method At this point in time it's easy to convert the `MessageHandler.on` call-sites to use arrow functions, and thus let the JavaScript engine handle scopes for us, rather than having to manually keep references to the relevant scopes in `MessageHandler`.[1] An additional benefit of this is that a couple of `Function.prototype.call()` instances can now be converted into "normal" function calls, which should be a tiny bit more efficient. All in all, I don't see any compelling reason why it'd be necessary to keep supporting custom `scope`s in the `MessageHandler` implementation. --- [1] In the event that a custom scope is ever needed, simply using `bind` on the handler function when calling `MessageHandler.on` ought to work as well.	2019-09-01 09:24:15 +02:00
Tim van der Meij	49018482dc	Use more ES6 syntax in `src/display/annotation_layer.js` `let` is converted to `const` where possible, `var` usage is disabled and template strings are used where possible.	2019-08-31 16:40:39 +02:00
Jonas Jenwald	f71ea2de0e	Remove the `makeReasonSerializable` helper function, and use `wrapReason` instead, in `src/shared/message_handler.js` Since `wrapReason` and `makeReasonSerializable` are essentially functionally equivalent it doesn't seem necessary to keep both of them around, especially when `makeReasonSerializable` only has a single call-site.	2019-08-30 19:36:10 +02:00
Jonas Jenwald	4e6a9b54c7	Change the internal `stream` property, as sent when Streams are used, from a String to a Number Given that the `stream` property is an internal implementation detail, changing its type shouldn't be a problem. By using Numbers instead, we can avoid unnecessary String allocations when creating/processing Streams.	2019-08-30 13:27:18 +02:00
Jonas Jenwald	252a3e35fb	Reduce the amount of unnecessary function calls and object allocations, in `MessageHandler`, when using Streams With PR 11069 we're now using Streams for OperatorList parsing (in addition to just TextContent parsing), which brings the nice benefit of being able to easily abort parsing on the worker-thread thus saving resources. However, since we're now creating many more `ReadableStream`s there appears to be a tiny bit more overhead because of it (giving ~1% slower runtime of `browsertest` on the bots). In this case we're just going to have to accept such a small regression, since the benefits of using Streams clearly outweigh it. What we can do here, is to try and make the Streams part of the `MessageHandler` implementation slightly more efficient by e.g. removing unnecessary function calls (which has been helpful in other parts of the code-base). To that end, this patch makes the following changes: - Actually support `transfers` in `MessageHandler.sendWithStream`, since the parameter was being ignored. - Inline the `sendStreamRequest`/`sendStreamResponse` helper functions at their respective call-sites. Obviously this causes some amount of code duplication, however I still think this change seems reasonable since for each call-site: - It avoids making one unnecessary function call. - It avoids allocating one temporary object. - It avoids sending, and thus structured cloning, various undefined object properties. - Inline objects in the `MessageHandler.{send, sendWithPromise}` methods. - Finally, directly call `comObj.postMessage` in various methods when `transfers` are not present, rather than calling `MessageHandler.postMessage`, to further reduce the amount of function calls.	2019-08-30 12:32:20 +02:00
Jonas Jenwald	ae0d9e8c2a	Replace some instances of implicit `function.bind(this)` usage, in `src/display/api.js`, with arrow functions instead	2019-08-30 11:35:05 +02:00
Jonas Jenwald	667e548e5f	[TextLayer] Remove `setAttribute` usage in `appendText` (issue 8066) One of the motivations for using `setAttribute` in the first place was to support more efficient DOM updates in the `expandTextDivs` method, since performance of the `enhanceTextSelection` mode can be somewhat bad when there's a lot of `textDivs` on the page. With recent `TextLayer` changes/optimizations it's no longer necessary to store a complete `style`-string for every `textDiv`, and we can thus re-visit the `setAttribute` usage. Note that with the current code, in `appendText`, there's only one string per `textDiv` which avoids a bunch of temporary strings. While the changes in this patch mean that there's now three strings per `textDiv` instead, the total length of these strings is now quite a bit shorter (42 characters to be exact).	2019-08-28 16:52:09 +02:00
Jonas Jenwald	106b239c5d	[TextLayer] Avoid unnecessary font updates in `_layoutText` (PR 11097 follow-up) This should obviously have been done in PR 11097, but for some reason I completely overlooked it; sorry about that. There's no good reason to update the font unless you're actually going to measure the width of the textContent. This can reduce unnecessary font switching a fair bit, even for documents which are somewhat simple/short (in e.g. the `tracemonkey.pdf` file this cuts the amount of font switches almost in half).	2019-08-28 16:08:06 +02:00
Jonas Jenwald	a1398048e5	[TextLayer] Simplify building of the expanded transform in `expandTextDivs` Rather than essentially re-computing the `originalTransform` every time, we can simply use it directly instead.	2019-08-25 13:09:04 +02:00
Jonas Jenwald	b68f7bb404	[TextLayer] Only measure the width of the text, in `_layoutText`, for multi-char text divs For performance reasons single-char text divs aren't being scaled, as outlined in a comment in `appendText`. Hence it doesn't seem necessary, or even a good idea, to unconditionally measure the width of the text in `_layoutText`.	2019-08-25 12:32:49 +02:00
Jonas Jenwald	711040ecc5	Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in `src/core/worker.js` These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).	2019-08-24 15:56:41 +02:00
Yury Delendik	66e0dd1b06	Use streams for OperatorList chunking (issue 10023) Please note: The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`. By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources. With this patch worker-thread parsing will only be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this: - The API currently expects the entire OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen. - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously not want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance. - This patch is already somewhat risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well). Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above. Co-Authored-By: Yury Delendik <ydelendik@mozilla.com> Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>	2019-08-24 15:56:40 +02:00
Jonas Jenwald	29a2516e4c	[TextLayer] Use an Array to build the total `padding`, rather than concatenating Strings, in `expandTextDivs` Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact. Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`. Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.	2019-08-24 01:13:59 +02:00
Tim van der Meij	edbebb8bf7	Merge pull request #11090 from Snuffleupagus/textLayer-expandTextDivs-transform [TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs`	2019-08-23 23:12:42 +02:00
Jonas Jenwald	932fcacff8	[TextLayer] Only handle positive padding values in `expandTextDivs` Given that browsers will reject padding values smaller than zero (which may be caused by limited numerical precision during calculations in the `expand` code), it makes no sense to include those when expanding the `textDiv`s.	2019-08-23 13:16:20 +02:00
Jonas Jenwald	37e8a8189b	[TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs` Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page.	2019-08-23 12:17:12 +02:00
Tim van der Meij	490deb1b65	Merge pull request #11086 from Snuffleupagus/textLayer-originalTransform [TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled	2019-08-22 23:09:07 +02:00
Brendan Dahl	31f319301d	Merge pull request #11087 from brendandahl/disable-links Add a way to disable external links.	2019-08-22 11:13:11 -07:00
Jonas Jenwald	a519ceffee	[TextLayer] Use template strings when updating the font property in the `_layoutText` method	2019-08-22 14:47:44 +02:00
Jonas Jenwald	6afe3221b7	[TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled Given that this is completely unused in "regular" text-selection mode, there's no reason to unconditionally store one string for every `textDiv`.	2019-08-22 14:47:18 +02:00
Brendan Dahl	98e989116c	Add a way to disable external links.	2019-08-21 11:20:41 -07:00
Jonas Jenwald	431a264126	[TextLayer] Reduce the amount of intermediary strings in `expandTextDivs` By using template strings, we can avoid some unnecessary string allocations (which is also helped by shortening a variable name).	2019-08-19 12:09:18 +02:00
Jonas Jenwald	45dfad8640	[TextLayer] Only cache the current `textDiv` style when `enhanceTextSelection` is enabled This will help save a little bit of memory, by not storing one unused string for each `textDiv` in regular text-selection mode.	2019-08-19 11:02:56 +02:00
Jonas Jenwald	1cd9a28c81	Replace the `XRef.cache` Array with a Map instead Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will always be more-or-less sparse. Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty. Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead. This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following results when comparing this patch-series against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 2736 \| 2736 \| 1 \| 0.02 \| Firefox \| Page Request \| 200 \| 2 \| 2 \| 0 \| -8.26 \| faster Firefox \| Rendering \| 200 \| 2733 \| 2734 \| 1 \| 0.03 \| ```	2019-08-18 12:07:18 +02:00
Jonas Jenwald	34a53b9f5d	Inline the `isRef` checks in the various `XRef.fetch` related methods The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there's been a lot of other cases where similar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)	2019-08-18 11:57:48 +02:00
Tim van der Meij	1565d1849d	Merge pull request #11073 from brendandahl/code-point Move polyfill for codePointAt to String prototype.	2019-08-17 13:26:35 +02:00
Brendan Dahl	c8129b8787	Move polyfill for codePointAt to String prototype. This method belongs on the prototype not the String object.	2019-08-16 14:32:43 -07:00
Jonas Jenwald	40d3916f31	Reduce the number of temporary variables in the `Parser.getObj` method This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.	2019-08-16 13:51:41 +02:00
Jonas Jenwald	7728a6630c	Inline the `isString` check in the `Parser.getObj` method For very large and complex PDF files this will help performance slightly, since `Parser.getObj` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 2847 \| 2830 \| -17 \| -0.60 \| faster Firefox \| Page Request \| 200 \| 2 \| 2 \| 0 \| -7.14 \| Firefox \| Rendering \| 200 \| 2844 \| 2827 \| -17 \| -0.60 \| faster ```	2019-08-16 10:34:24 +02:00
Jonas Jenwald	7f456b3e2e	Replace all usages of `var` with `let`/`const` in the `src/shared/util.js` file Also removes a couple of unnecessary (temporary) variable assignments in `arraysToBytes` and uses template strings in a few spots.	2019-08-11 14:35:35 +02:00
Jonas Jenwald	f6c4a1f080	Convert `Util` to a class with static methods Also replaces `var` with `const` in all the relevant code.	2019-08-11 14:35:35 +02:00
Jonas Jenwald	7ee370a394	Remove the `skipEmpty` parameter from `Util.intersect` (PR 11059 follow-up) Looking at this again, it struck me that added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code! Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0, 0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter thus feels too tightly coupled to the `Page.view` getter.	2019-08-11 14:33:52 +02:00
Tim van der Meij	fbe8c6127c	Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries	2019-08-09 22:39:01 +02:00
Jonas Jenwald	d637b25e36	Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it. Note that different PDF viewers handle these sorts of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly". The patch makes the following notable changes: - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes is added to the API.) - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer. - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`. - Add an optional parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty. - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seems strange. --- [1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.	2019-08-09 10:18:13 +02:00
Jonas Jenwald	0f78fdb229	Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052) Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.	2019-08-08 10:37:41 +02:00
Jonas Jenwald	e9b7996f2f	Actually compare the `cropBox` and `mediaBox` correctly in the `Page.view` getter The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the same underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.	2019-08-07 17:15:57 +02:00
Jonas Jenwald	5ac9c7c384	Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045) The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.	2019-08-06 14:33:05 +02:00
Tim van der Meij	be70ee236d	Merge pull request #11013 from timvandermeij/annotations-quadpoints [api-minor] Implement quadpoints for annotations in the core layer	2019-08-04 16:06:10 +02:00
Jonas Jenwald	0276385e6e	[api-minor] Fix completely broken `getStats` method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up) With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now always returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-) I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`. Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.	2019-08-02 14:09:24 +02:00
Tim van der Meij	9c8fe3142a	Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException Ensure that `ReadableStream`s are cancelled with actual Errors	2019-08-02 00:18:44 +02:00
Tim van der Meij	e0b38bed3c	Merge pull request #11029 from brendandahl/pdfjs-telemetry-update [api-minor] Update telemetry to use 'categorical' histograms.	2019-08-02 00:11:02 +02:00
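
The `Page.view` fix in commit e9b7996f2f above turns on the difference between comparing two Arrays by reference and comparing them by value. The following is a minimal sketch of that distinction, not the actual pdf.js code: the helper name `boxesEqual` and the sample box values are assumptions made for illustration only.

```javascript
// Hypothetical helper (not taken from pdf.js): compares two bounding
// boxes element by element instead of by object identity.
function boxesEqual(a, b) {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// Two independently specified boxes with identical values, as in a PDF
// file that lists both /MediaBox and /CropBox with the same numbers.
const mediaBox = [0, 0, 612, 792];
const cropBox = [0, 0, 612, 792];

// Reference comparison: different Array objects, so this is false.
const sameReference = mediaBox === cropBox;

// Value comparison: every element matches, so this is true.
const sameValues = boxesEqual(mediaBox, cropBox);

console.log(sameReference, sameValues);
```

With only the reference check, the two boxes above would be treated as different and an intersection would be computed even though the result is already known.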


3754 Commits