pdf.js

Author	SHA1	Message	Date
Jani Pehkonen	911df237f3	Avoid floating point inaccuracy in gradient color stops	2019-09-17 21:01:17 +03:00
Jonas Jenwald	4bd79ec4b3	Inline the `resolveOrReject` helper function at its call-sites in `MessageHandler`, and rename an `error` key to `reason` Given that there's only a couple of call-sites, and that the helper function is really simple, it doesn't seem entirely necessary to keep it around. While fewer function calls is always a good thing, in this case the performance impact is small enough to be unmeasurable. With one single exception the code in `MessageHandler` is using `reason` when passing around various Errors, hence this patch also renames an `error` key for consistency.	2019-09-17 14:22:24 +02:00
Jonas Jenwald	0617984b59	Remove unnecessary `data.streamId` accesses in `MessageHandler._processStreamMessage`, and use a constant object shape in `MessageHandler.sendWithStream` The `streamId` short-hand in `MessageHandler._processStreamMessage` was only used partially throughout the method, which seemed kind of strange, hence that's fixed in this patch. Furthermore, always giving the `streamController` object a constant shape in `MessageHandler.sendWithStream` cannot hurt either.	2019-09-17 14:18:57 +02:00
Jonas Jenwald	281ed33e43	Abort, with a small delay, `getOperatorList` on the worker-thread when rendering is cancelled (PR 11069 follow-up) With this patch we're finally able to abort worker-thread parsing of the `OperatorList`, rather than only aborting the main-thread rendering itself, when the `RenderTask.cancel` method is being called. This will help improve perceived performance in the default viewer, especially when reading longer and more complex documents, since pages that've been scrolled out-of-view (and thus evicted from the cache) will no longer compete for parsing resources on the worker-thread. Please note: With the implementation in this patch we're not aborting worker-thread parsing immediately on `RenderTask.cancel`, since that would lead to worse performance in many cases. For example: When zoom/rotation occurs in the viewer, while parsing/rendering is still ongoing, a `cancel` call will usually be (almost) immediately followed by a new `PDFPageProxy.render` call. In that case you obviously don't want to abort parsing on the worker-thread, since that would risk throwing away a partially parsed `OperatorList` and thus force unnecessary re-parsing which will regress perceived performance (especially for more complex documents). When choosing a reasonable delay, before cancelling `getOperatorList` on the worker-thread when `RenderTask.cancel` is called, two different positions need to be considered: 1. The delay needs to be short enough, since a timeout in the multiple seconds range would essentially make this entire functionality meaningless (by always allowing most/all pages enough time to finish parsing). 2. The delay cannot be too short, since that would actually reduce performance in the zoom/rotation case outlined above. Furthermore, the time between `RenderTask.cancel` and `PDFPageProxy.render` calls will obviously be affected by both general computer performance and current CPU load. 
It's certainly possible that the timeout may require some further tweaks, however the value settled on in this patch was easily one order of magnitude larger than the delta between cancel/render in my tests.	2019-09-14 11:30:32 +02:00
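The delayed-abort idea described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `DelayedAbort` class, its method names, and the injectable `timers` object are hypothetical and are not taken from the PDF.js sources, and the delay value is left to the caller.

```javascript
// Hypothetical sketch, not the PDF.js implementation: an abort that is
// scheduled on cancel and withdrawn if a new render starts soon afterwards.
class DelayedAbort {
  constructor(abortFn, delayMs, timers) {
    this._abortFn = abortFn; // e.g. would cancel worker-thread parsing
    this._delayMs = delayMs;
    // Timer functions are injectable so the sketch can be exercised
    // without waiting for real time to pass.
    this._timers = timers || {
      setTimeout: (fn, ms) => setTimeout(fn, ms),
      clearTimeout: (id) => clearTimeout(id),
    };
    this._timer = null;
  }

  // Called when rendering is cancelled: schedule, rather than perform, the abort.
  cancel() {
    if (this._timer !== null) {
      return; // an abort is already pending
    }
    this._timer = this._timers.setTimeout(() => {
      this._timer = null;
      this._abortFn();
    }, this._delayMs);
  }

  // Called when a new render starts: keep the partially parsed data.
  render() {
    if (this._timer !== null) {
      this._timers.clearTimeout(this._timer);
      this._timer = null;
    }
  }
}
```

With this shape, a `cancel()` that is quickly followed by `render()` (the zoom/rotation case) never reaches `abortFn`, while a `cancel()` with no follow-up does once the delay has elapsed.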
Jonas Jenwald	00efff532c	Ensure that `addLinkAttributes` is always called with a valid `url` parameter There's no good reason for calling this helper function without a `url` parameter, and this way we can prevent that from happening. Note how the `PDFOutlineViewer` call-site was already doing the right thing here, and only the `LinkAnnotationElement` call-site needed a small adjustment to make it work.	2019-09-11 13:24:04 +02:00
Jonas Jenwald	12e1c91f73	Don't `enqueue` unused properties when sending 'GetOperatorList' data from the worker-thread (PR 11069 follow-up) With the changes made in PR 11069, it's no longer necessary to include the `pageIndex`/`intent` parameters when sending 'GetOperatorList' data. In the previous implementation these properties were used to associate the `OperatorList` with the correct `RenderTask`, however now that `ReadableStream`s are used that's handled automatically and it's thus dead code at this point.	2019-09-09 17:41:26 +02:00
Tim van der Meij	37d5b80ba8	Merge pull request #11118 from Snuffleupagus/FetchBuiltInCMap-sendWithStream Transfer, rather than copy, CMap data to the worker-thread	2019-09-06 22:56:14 +02:00
Jonas Jenwald	7dea3f9389	[api-minor] Remove the `postMessageTransfers` parameter, and thus the ability to manually disable transferring of data, from the API By transferring, rather than copying, `ArrayBuffer`s between the main- and worker-threads, you can avoid unnecessary allocations by only having one copy of the same data. Hence manually setting `postMessageTransfers: false`, when calling `getDocument`, is a performance footgun[1] which will do nothing but waste memory. Given that every reasonably modern browser supports `postMessage` transfers[2], I really don't see why it should be possible to force-disable this functionality. Looking at the browser support, for `postMessage` transfers[2], it's highly unlikely that PDF.js is even usable in browsers without it. However, the feature testing of `postMessage` transfers is kept for the time being just to err on the safe side. --- [1] This is somewhat similar to the, now removed, `disableWorker` parameter which also provided API users a much too simple way of reducing performance. [2] See e.g. https://developer.mozilla.org/en-US/docs/Web/API/MessagePort/postMessage#Browser_compatibility and https://developer.mozilla.org/en-US/docs/Web/API/Transferable#Browser_compatibility	2019-09-05 13:09:54 +02:00
Jonas Jenwald	f0534b9b51	Adjust the values sent, with the 'test' message, by the `WorkerMessageHandler.setup` method Note how the sent values have inconsistent types, with a boolean in one case and an object in the other (normal) case. Furthermore, explicitly sending a `supportTypedArray: true` property seems superfluous at least to me.	2019-09-05 11:27:27 +02:00
Jonas Jenwald	7212ff4eea	Stop checking for the `response` property, on `XMLHttpRequest`, when setting up the `WorkerMessageHandler` This check was added in PR 2445, however it's no longer necessary since all data[1] is now loaded on the main-thread (and then transferred to the worker-thread). Furthermore, by default the Fetch API is now (usually) used rather than `XMLHttpRequest`. All in all, while these checks were necessary at one point that's no longer the case and they can thus be removed. --- [1] This includes both the actual PDF data, as well as the CMap data.	2019-09-05 11:27:22 +02:00
Jonas Jenwald	f11a4ba750	Transfer, rather than copy, CMap data to the worker-thread It recently occurred to me that the CMap data should be an excellent candidate for transferring. This will help reduce peak memory usage for PDF documents using CMaps, since transferring of data avoids duplicating it on both the main- and worker-threads. Unfortunately it's not possible to actually transfer data when returning data through `sendWithPromise`, and another solution had to be used. Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread. Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it really easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.) Please note that the patch purposely only changes the API to Worker communication, and not the API itself since changing the interface of `CMapReaderFactory` would be a breaking change. Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.	2019-09-04 11:46:04 +02:00
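The difference between copying and transferring binary data can be illustrated with a small sketch. It assumes a runtime that provides the global `structuredClone` function (for example a recent Node.js or browser), which is used here only as a stand-in for a `postMessage` call with a transfer list; the variable names are illustrative and nothing below is taken from the PDF.js sources.

```javascript
// Illustration only: structuredClone stands in for postMessage, since both
// apply the structured clone algorithm and accept a list of transferables.
const cmapData = new Uint8Array([1, 2, 3, 4]);

// Copying: the receiver gets its own buffer and the original stays usable,
// so the same bytes now exist twice in memory.
const copied = structuredClone(cmapData);
const lengthAfterCopy = cmapData.byteLength;

// Transferring: ownership of the underlying ArrayBuffer moves to the
// receiver and the sender's view becomes detached (zero length).
const transferred = structuredClone(cmapData, { transfer: [cmapData.buffer] });
const lengthAfterTransfer = cmapData.byteLength;

console.log(lengthAfterCopy, lengthAfterTransfer, transferred.byteLength);
```

After the transfer only one copy of the data remains, which is the memory saving the commit message refers to.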
Jonas Jenwald	74f5a59f43	Ensure that the `cancel`/`error` methods on Streams are always called with valid `reason` arguments	2019-09-02 23:31:07 +02:00
Jonas Jenwald	02bdacef42	Ensure that `Error`s are handled correctly when using `postMessage` with Streams in `MessageHandler` Having recently worked with this code, it struck me that most of the `postMessage` calls where `Error`s are involved have never been correctly implemented (i.e. missing `wrapReason` calls).	2019-09-02 23:31:07 +02:00
Tim van der Meij	e59b11860d	Merge pull request #11108 from timvandermeij/es6-annotations Use more ES6 syntax in the annotation code	2019-09-02 23:13:24 +02:00
Tim van der Meij	2866c8a39e	Use more ES6 syntax in `src/core/annotation.js` `let` is converted to `const` where possible.	2019-09-02 22:37:27 +02:00
Tim van der Meij	c37a2c0408	Merge pull request #11112 from Snuffleupagus/TESTING-rm-version-warn Remove the API/Worker version warning message in `TESTING` mode	2019-09-02 22:22:33 +02:00
Jonas Jenwald	229f6f34d1	Remove the API/Worker version warning message in `TESTING` mode The warning messages turn out to be more annoying than helpful when looking at the `console` during tests, so let's just remove them.	2019-09-01 16:47:26 +02:00
Jonas Jenwald	cd82b81bc7	Inline the `resolveCall` helper function at its call-sites in `MessageHandler` There's only three call-sites and one of them doesn't even need the complete functionality of `resolveCall`, hence it seems reasonable to just inline this code. An additional benefit of this is that the `Function.prototype.apply()` instance can also be converted into "normal" function calls, which should be a tiny bit more efficient. The patch also replaces a number of unnecessary arrow functions, in relevant parts of the `MessageHandler` code, with "normal" functions instead. Finally, all `Promise.resolve().then(...)` calls are replaced with `new Promise(...)` instead since the latter is a tiny bit more efficient. This also explains the test failures on the Linux bot, with a prior version of the patch, since the `Promise.resolve().then(...)` format essentially creates two Promises thus causing additional delay.	2019-09-01 13:40:19 +02:00
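The timing difference between the two Promise patterns mentioned in the commit above can be seen in a minimal sketch (standard JavaScript semantics only; the `order` array is purely illustrative):

```javascript
// The executor passed to `new Promise` runs synchronously, whereas a
// callback passed to `Promise.resolve().then(...)` is deferred to a
// microtask and involves a second Promise (the one returned by `then`).
const order = [];

new Promise((resolve) => {
  order.push("executor");
  resolve();
});

Promise.resolve().then(() => {
  order.push("then-callback");
});

order.push("sync-end");
// At this point `order` contains "executor" and "sync-end";
// "then-callback" is only appended once the current script has finished.
```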
Jonas Jenwald	055f03938b	Remove support for the `scope` parameter in the `MessageHandler.on` method At this point in time it's easy to convert the `MessageHandler.on` call-sites to use arrow functions, and thus let the JavaScript engine handle scopes for us, rather than having to manually keep references to the relevant scopes in `MessageHandler`.[1] An additional benefit of this is that a couple of `Function.prototype.call()` instances can now be converted into "normal" function calls, which should be a tiny bit more efficient. All in all, I don't see any compelling reason why it'd be necessary to keep supporting custom `scope`s in the `MessageHandler` implementation. --- [1] In the event that a custom scope is ever needed, simply using `bind` on the handler function when calling `MessageHandler.on` ought to work as well.	2019-09-01 09:24:15 +02:00
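A rough sketch of why a separate `scope` parameter becomes unnecessary is shown below. `TinyHandler` and `Viewer` are hypothetical stand-ins written for this illustration and do not mirror the actual `MessageHandler` code.

```javascript
// Hypothetical, simplified handler registry without any `scope` parameter.
class TinyHandler {
  constructor() {
    this._actions = Object.create(null);
  }

  on(name, handler) {
    if (this._actions[name]) {
      throw new Error(`There is already a handler for "${name}"`);
    }
    this._actions[name] = handler;
  }

  dispatch(name, data) {
    // A plain function call; no Function.prototype.call() is needed.
    return this._actions[name](data);
  }
}

class Viewer {
  constructor(handler) {
    this.pages = 0;
    // An arrow function captures `this` lexically.
    handler.on("PageCount", (count) => {
      this.pages = count;
    });
    // Alternatively, `bind` fixes the scope of an existing method.
    handler.on("Reset", this.reset.bind(this));
  }

  reset() {
    this.pages = 0;
  }
}
```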
Tim van der Meij	49018482dc	Use more ES6 syntax in `src/display/annotation_layer.js` `let` is converted to `const` where possible, `var` usage is disabled and template strings are used where possible.	2019-08-31 16:40:39 +02:00
Jonas Jenwald	f71ea2de0e	Remove the `makeReasonSerializable` helper function, and use `wrapReason` instead, in `src/shared/message_handler.js` Since `wrapReason` and `makeReasonSerializable` are essentially functionally equivalent it doesn't seem necessary to keep both of them around, especially when `makeReasonSerializable` only has a single call-site.	2019-08-30 19:36:10 +02:00
Jonas Jenwald	4e6a9b54c7	Change the internal `stream` property, as sent when Streams are used, from a String to a Number Given that the `stream` property is an internal implementation detail, changing its type shouldn't be a problem. By using Numbers instead, we can avoid unnecessary String allocations when creating/processing Streams.	2019-08-30 13:27:18 +02:00
Jonas Jenwald	252a3e35fb	Reduce the amount of unnecessary function calls and object allocations, in `MessageHandler`, when using Streams With PR 11069 we're now using Streams for OperatorList parsing (in addition to just TextContent parsing), which brings the nice benefit of being able to easily abort parsing on the worker-thread thus saving resources. However, since we're now creating many more `ReadableStream` there appears to be a tiny bit more overhead because of it (giving ~1% slower runtime of `browsertest` on the bots). In this case we're just going to have to accept such a small regression, since the benefits of using Streams clearly outweighs it. What we can do here, is to try and make the Streams part of the `MessageHandler` implementation slightly more efficient by e.g. removing unnecessary function calls (which has been helpful in other parts of the code-base). To that end, this patch makes the following changes: - Actually support `transfers` in `MessageHandler.sendWithStream`, since the parameter was being ignored. - Inline the `sendStreamRequest`/`sendStreamResponse` helper functions at their respective call-sites. Obviously this causes some amount of code duplication, however I still think this change seems reasonable since for each call-site: - It avoids making one unnecessary function call. - It avoids allocating one temporary object. - It avoids sending, and thus structure clone, various undefined object properties. - Inline objects in the `MessageHandler.{send, sendWithPromise}` methods. - Finally, directly call `comObj.postMessage` in various methods when `transfers` are not present, rather than calling `MessageHandler.postMessage`, to further reduce the amount of function calls.	2019-08-30 12:32:20 +02:00
Jonas Jenwald	ae0d9e8c2a	Replace some instances of implicit `function.bind(this)` usage, in `src/display/api.js`, with arrow functions instead	2019-08-30 11:35:05 +02:00
Jonas Jenwald	667e548e5f	[TextLayer] Remove `setAttribute` usage in `appendText` (issue 8066) One of the motivations for using `setAttribute` in the first place was to support more efficient DOM updates in the `expandTextDivs` method, since performance of the `enhanceTextSelection` mode can be somewhat bad when there's a lot of `textDivs` on the page. With recent `TextLayer` changes/optimizations it's no longer necessary to store a complete `style`-string for every `textDiv`, and we can thus re-visit the `setAttribute` usage. Note that with the current code, in `appendText`, there's only one string per `textDiv` which avoids a bunch of temporary strings. While the changes in this patch mean that there are now three strings per `textDiv` instead, the total length of these strings is now quite a bit shorter (42 characters to be exact).	2019-08-28 16:52:09 +02:00
Jonas Jenwald	106b239c5d	[TextLayer] Avoid unnecessary font updates in `_layoutText` (PR 11097 follow-up) This should obviously have been done in PR 11097, but for some reason I completely overlooked it; sorry about that. There's no good reason to update the font unless you're actually going to measure the width of the textContent. This can reduce unnecessary font switching a fair bit, even for documents which are somewhat simple/short (in e.g. the `tracemonkey.pdf` file this cuts the amount of font switches almost in half).	2019-08-28 16:08:06 +02:00
Jonas Jenwald	a1398048e5	[TextLayer] Simplify building of the expanded transform in `expandTextDivs` Rather than essentially re-computing the `originalTransform` every time, we can simply use it directly instead.	2019-08-25 13:09:04 +02:00
Jonas Jenwald	b68f7bb404	[TextLayer] Only measure the width of the text, in `_layoutText`, for multi-char text divs For performance reasons single-char text divs aren't being scaled, as outlined in a comment in `appendText`. Hence it doesn't seem necessary, or even a good idea, to unconditionally measure the width of the text in `_layoutText`.	2019-08-25 12:32:49 +02:00
Jonas Jenwald	711040ecc5	Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in `src/core/worker.js` These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).	2019-08-24 15:56:41 +02:00
Yury Delendik	66e0dd1b06	Use streams for OperatorList chunking (issue 10023) Please note: The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`. By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources. With this patch worker-thread parsing will only be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this: - The API currently expects the entire OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen. - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously not want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance. - This patch is already somewhat risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well). Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above. Co-Authored-By: Yury Delendik <ydelendik@mozilla.com> Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>	2019-08-24 15:56:40 +02:00
Jonas Jenwald	29a2516e4c	[TextLayer] Use an Array to build the total `padding`, rather than concatenating Strings, in `expandTextDivs` Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact. Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`. Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.	2019-08-24 01:13:59 +02:00
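The array-based approach to building the padding string might look roughly like the following. This is a sketch under assumptions: the `paddingBuf` layout and the `buildPadding` helper are illustrative names chosen here, not the actual `expandTextDivs` code.

```javascript
// Illustrative sketch: one shared array is re-used for every text div, with
// the numeric slots overwritten in place and the unit strings left untouched.
// CSS shorthand order is: top, right, bottom, left.
const paddingBuf = [0, "px ", 0, "px ", 0, "px ", 0, "px"];

function buildPadding(top, right, bottom, left) {
  paddingBuf[0] = top;
  paddingBuf[2] = right;
  paddingBuf[4] = bottom;
  paddingBuf[6] = left;
  return paddingBuf.join("");
}

console.log(buildPadding(1, 2.5, 0, 4)); // "1px 2.5px 0px 4px"
```

Only the final `join` allocates a string, instead of one intermediate string per concatenation step.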
Tim van der Meij	edbebb8bf7	Merge pull request #11090 from Snuffleupagus/textLayer-expandTextDivs-transform [TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs`	2019-08-23 23:12:42 +02:00
Jonas Jenwald	932fcacff8	[TextLayer] Only handle positive padding values in `expandTextDivs` Given that browsers will reject padding values smaller than zero (which may be caused by limited numerical precision during calculations in the `expand` code), it makes no sense to include those when expanding the `textDiv`s.	2019-08-23 13:16:20 +02:00
Jonas Jenwald	37e8a8189b	[TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs` Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page.	2019-08-23 12:17:12 +02:00
Tim van der Meij	490deb1b65	Merge pull request #11086 from Snuffleupagus/textLayer-originalTransform [TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled	2019-08-22 23:09:07 +02:00
Brendan Dahl	31f319301d	Merge pull request #11087 from brendandahl/disable-links Add a way to disable external links.	2019-08-22 11:13:11 -07:00
Jonas Jenwald	a519ceffee	[TextLayer] Use template strings when updating the font property in the `_layoutText` method	2019-08-22 14:47:44 +02:00
Jonas Jenwald	6afe3221b7	[TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled Given that this is completely unused in "regular" text-selection mode, there's no reason to unconditionally store one string for every `textDiv`.	2019-08-22 14:47:18 +02:00
Brendan Dahl	98e989116c	Add a way to disable external links.	2019-08-21 11:20:41 -07:00
Jonas Jenwald	431a264126	[TextLayer] Reduce the amount of intermediary strings in `expandTextDivs` By using template strings, we can avoid some unnecessary string allocations (which is also helped by shortening a variable name).	2019-08-19 12:09:18 +02:00
Jonas Jenwald	45dfad8640	[TextLayer] Only cache the current `textDiv` style when `enhanceTextSelection` is enabled This will help save a little bit of memory, by not storing one unused string for each `textDiv` in regular text-selection mode.	2019-08-19 11:02:56 +02:00
Jonas Jenwald	1cd9a28c81	Replace the `XRef.cache` Array with a Map instead Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will always be more-or-less sparse. Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty. Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead. This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following results when comparing this patch-series against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 2736 \| 2736 \| 1 \| 0.02 \| Firefox \| Page Request \| 200 \| 2 \| 2 \| 0 \| -8.26 \| faster Firefox \| Rendering \| 200 \| 2733 \| 2734 \| 1 \| 0.03 \| ```	2019-08-18 12:07:18 +02:00
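The sparseness argument in the commit above can be illustrated with a small sketch (the index value and the cached placeholder are made up for illustration):

```javascript
// Storing one entry at a high index makes an Array report a large length,
// even though nearly all of its slots are empty holes.
const arrayCache = [];
arrayCache[250000] = "cached object";
console.log(arrayCache.length); // 250001

// A Map only tracks the entries that were actually stored.
const mapCache = new Map();
mapCache.set(250000, "cached object");
console.log(mapCache.size); // 1
console.log(mapCache.get(250000)); // "cached object"
```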
Jonas Jenwald	34a53b9f5d	Inline the `isRef` checks in the various `XRef.fetch` related methods The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there's been a lot of other cases where similar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)	2019-08-18 11:57:48 +02:00
Tim van der Meij	1565d1849d	Merge pull request #11073 from brendandahl/code-point Move polyfill for codePointAt to String prototype.	2019-08-17 13:26:35 +02:00
Brendan Dahl	c8129b8787	Move polyfill for codePointAt to String prototype. This method belongs on the prototype not the String object.	2019-08-16 14:32:43 -07:00
Jonas Jenwald	40d3916f31	Reduce the number of temporary variables in the `Parser.getObj` method This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.	2019-08-16 13:51:41 +02:00
Jonas Jenwald	7728a6630c	Inline the `isString` check in the `Parser.getObj` method For very large and complex PDF files this will help performance slightly, since `Parser.getObj` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 2847 \| 2830 \| -17 \| -0.60 \| faster Firefox \| Page Request \| 200 \| 2 \| 2 \| 0 \| -7.14 \| Firefox \| Rendering \| 200 \| 2844 \| 2827 \| -17 \| -0.60 \| faster ```	2019-08-16 10:34:24 +02:00
Jonas Jenwald	7f456b3e2e	Replace all usages of `var` with `let`/`const` in the `src/shared/util.js` file Also removes a couple of unnecessary (temporary) variable assignments in `arraysToBytes` and uses template strings in a few spots.	2019-08-11 14:35:35 +02:00
Jonas Jenwald	f6c4a1f080	Convert `Util` to a class with static methods Also replaces `var` with `const` in all the relevant code.	2019-08-11 14:35:35 +02:00
Jonas Jenwald	7ee370a394	Remove the `skipEmpty` parameter from `Util.intersect` (PR 11059 follow-up) Looking at this again, it struck me that added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code! Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0, 0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter thus feels too tightly coupled to the `Page.view` getter.	2019-08-11 14:33:52 +02:00
Tim van der Meij	fbe8c6127c	Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries	2019-08-09 22:39:01 +02:00
Jonas Jenwald	d637b25e36	Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it. Note that different PDF viewers handle these sorts of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly". The patch makes the following notable changes: - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes is added to the API.) - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer. - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`. - Add an optional parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty. - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seems strange. --- [1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.	2019-08-09 10:18:13 +02:00
Jonas Jenwald	0f78fdb229	Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052) Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.	2019-08-08 10:37:41 +02:00
Jonas Jenwald	e9b7996f2f	Actually compare the `cropBox` and `mediaBox` correctly in the `Page.view` getter The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the same underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.	2019-08-07 17:15:57 +02:00
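The reference-versus-value distinction described in the commit above can be sketched as follows. The `isArrayEqual` helper here is written for this illustration and is only assumed to resemble whatever comparison the patch uses; the box values are made up.

```javascript
// Illustrative helper: element-wise comparison of two arrays.
function isArrayEqual(a, b) {
  if (a.length !== b.length) {
    return false;
  }
  return a.every((value, i) => value === b[i]);
}

// Two boxes specified independently, with identical values.
const mediaBox = [0, 0, 612, 792];
const cropBox = [0, 0, 612, 792];

console.log(mediaBox === cropBox); // false: different Array objects
console.log(isArrayEqual(mediaBox, cropBox)); // true: same values
```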
Jonas Jenwald	5ac9c7c384	Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045) The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.	2019-08-06 14:33:05 +02:00
Tim van der Meij	be70ee236d	Merge pull request #11013 from timvandermeij/annotations-quadpoints [api-minor] Implement quadpoints for annotations in the core layer	2019-08-04 16:06:10 +02:00
Jonas Jenwald	0276385e6e	[api-minor] Fix completely broken `getStats` method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up) With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now always returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-) I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`. Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.	2019-08-02 14:09:24 +02:00
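Why string keys do not behave as Array elements can be shown with a short sketch. The key name `"TYPE1"` is just an example label, and the snippet only demonstrates general JavaScript behaviour rather than the exact `getStats` code path.

```javascript
// Assigning with a non-numeric string key adds a plain property to the
// Array object; it does not create an element.
const statsArray = [];
statsArray["TYPE1"] = true;
console.log(statsArray.length); // 0
console.log(JSON.stringify(statsArray)); // "[]"

// A plain Object keeps string keys as regular, enumerable entries.
const statsObject = {};
statsObject["TYPE1"] = true;
console.log(Object.keys(statsObject)); // ["TYPE1"]
console.log(JSON.stringify(statsObject)); // '{"TYPE1":true}'
```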
Tim van der Meij	9c8fe3142a	Merge pull request #11034 from Snuffleupagus/cancel-with-AbortException Ensure that `ReadableStream`s are cancelled with actual Errors	2019-08-02 00:18:44 +02:00
Tim van der Meij	e0b38bed3c	Merge pull request #11029 from brendandahl/pdfjs-telemetry-update [api-minor] Update telemetry to use 'categorical' histograms.	2019-08-02 00:11:02 +02:00
Brendan Dahl	31d71808e7	[api-minor] Update telemetry to use 'categorical' histograms. Firefox telemetry supports using string labels now. Convert our integers that we used for categories to just use strings. The upstream work will happen in: https://bugzilla.mozilla.org/show_bug.cgi?id=1566882	2019-08-01 09:51:02 -07:00
Jonas Jenwald	a3150166ec	Ensure that `ReadableStream`s are cancelled with actual Errors There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended). In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.	2019-08-01 16:40:46 +02:00
Tim van der Meij	d909b86b28	Merge pull request #11020 from Snuffleupagus/issue-11016 Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)	2019-07-31 23:33:34 +02:00
wangsongyan	c61205d980	Decode the filename when a URL-encoded filename is matched from contentDispositionFilename	2019-07-31 09:33:56 +08:00
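A minimal sketch of decoding a URL-encoded filename is given below. The `maybeDecodeFilename` helper is a hypothetical name introduced for this example, and the fallback behaviour for malformed input is an assumption rather than a description of the actual patch.

```javascript
// Hypothetical helper: decode percent-escapes when present, and fall back
// to the original string if the escapes are malformed.
function maybeDecodeFilename(filename) {
  if (filename.includes("%")) {
    try {
      return decodeURIComponent(filename);
    } catch (ex) {
      // decodeURIComponent throws a URIError on invalid sequences;
      // in that case the filename is kept as-is.
    }
  }
  return filename;
}

console.log(maybeDecodeFilename("%E4%B8%AD%E6%96%87.pdf")); // "中文.pdf"
console.log(maybeDecodeFilename("100%.pdf")); // "100%.pdf"
```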
Jonas Jenwald	9ad50521b1	Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016) This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare. Furthermore, this patch purposely does not add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.	2019-07-30 17:06:58 +02:00
Jonas Jenwald	38ccb43436	Reduce the number of function calls in `EvaluatorPreprocessor.read` For very large and complex PDF files this will help performance slightly, since `EvaluatorPreprocessor.read` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, using the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` This gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 3402 \| 3358 \| -43 \| -1.28 \| faster Firefox \| Page Request \| 200 \| 1 \| 1 \| 0 \| 26.71 \| Firefox \| Rendering \| 200 \| 3401 \| 3357 \| -44 \| -1.28 \| faster ```	2019-07-29 08:43:36 +02:00
Tim van der Meij	9114004d5b	[api-minor] Implement quadpoints for annotations in the core layer	2019-07-28 20:36:21 +02:00
Jonas Jenwald	ff90aa4323	Inline the `isCmd` check in the `Parser.shift` method For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called a lot during parsing. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over four million `Parser.shift` calls for just the one page), using the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 100, "type": "eq" } ] ``` This gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 100 \| 3386 \| 3322 \| -65 \| -1.92 \| faster Firefox \| Page Request \| 100 \| 1 \| 1 \| 0 \| -8.08 \| Firefox \| Rendering \| 100 \| 3385 \| 3321 \| -65 \| -1.92 \| faster ```	2019-07-22 12:07:36 +02:00
Jonas Jenwald	b5254f2745	Attempt to significantly reduce the number of `ChunkedStream.{ensureByte, ensureRange}` calls by inlining the `this.progressiveDataLength` checks at the call-sites The number of in particular `ChunkedStream.ensureByte` calls is often absolutely huge (on the order of millions of calls) when loading and rendering even moderately complicated PDF files, which isn't entirely surprising considering that the `getByte`/`getBytes`/`peekByte`/`peekBytes` methods are used for essentially all data reading/parsing. The idea implemented in this patch is to inline an inverted `progressiveDataLength` check at all of the `ensureByte`/`ensureRange` call-sites, which in practice will often result in several orders of magnitude fewer function calls. Obviously this patch will only help if the browser supports streaming, which all reasonably modern browsers now do (including the Firefox built-in PDF viewer), and assuming that the user didn't set the `disableStream` option (e.g. for using `disableAutoFetch`). However, I think we should be able to improve performance for the default out-of-the-box use case, without worrying about e.g. older browsers (where this patch will thus incur one additional check before calling `ensureByte`/`ensureRange`). This patch was inspired by the first commit in PR 5005, which was subsequently backed out in PR 5145 for causing regressions. Since the general idea of avoiding unnecessary function calls was really nice, I figured that re-attempting this in one way or another wouldn't be a bad idea. Given that streaming is now supported, which it wasn't back then, using `progressiveDataLength` seemed like an easier approach in general since it also allowed supporting both `ensureByte` and `ensureRange`. 
This sort of patch obviously needs data to back it up, hence I've benchmarked the changes using the following manifest file (with the default `tracemonkey` file): ``` [ { "id": "tracemonkey-eq", "file": "pdfs/tracemonkey.pdf", "md5": "9a192d8b1a7dc652a19835f6f08098bd", "rounds": 250, "type": "eq" } ] ``` I get the following complete results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 3500 \| 140 \| 134 \| -6 \| -4.46 \| faster Firefox \| Page Request \| 3500 \| 2 \| 2 \| 0 \| -0.10 \| Firefox \| Rendering \| 3500 \| 138 \| 131 \| -6 \| -4.54 \| faster ``` Here it's pretty clear that the patch does have a positive net effect, even for a PDF file of fairly moderate size and complexity. However, in this case it's probably interesting to also look at the results per page: ``` -- Grouped By page, stat -- page \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ---- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ------ \| ------------- 0 \| Overall \| 250 \| 74 \| 75 \| 1 \| 0.69 \| 0 \| Page Request \| 250 \| 1 \| 1 \| 0 \| 33.20 \| 0 \| Rendering \| 250 \| 73 \| 74 \| 0 \| 0.25 \| 1 \| Overall \| 250 \| 123 \| 121 \| -2 \| -1.87 \| faster 1 \| Page Request \| 250 \| 3 \| 2 \| 0 \| -11.73 \| 1 \| Rendering \| 250 \| 121 \| 119 \| -2 \| -1.67 \| 2 \| Overall \| 250 \| 64 \| 63 \| -1 \| -1.91 \| 2 \| Page Request \| 250 \| 1 \| 1 \| 0 \| 8.81 \| 2 \| Rendering \| 250 \| 63 \| 62 \| -1 \| -2.13 \| faster 3 \| Overall \| 250 \| 97 \| 97 \| 0 \| -0.06 \| 3 \| Page Request \| 250 \| 1 \| 1 \| 0 \| 25.37 \| 3 \| Rendering \| 250 \| 96 \| 95 \| 0 \| -0.34 \| 4 \| Overall \| 250 \| 97 \| 97 \| 0 \| -0.38 \| 4 \| Page Request \| 250 \| 1 \| 1 \| 0 \| -5.97 \| 4 \| Rendering \| 250 \| 96 \| 
96 \| 0 \| -0.27 \| 5 \| Overall \| 250 \| 99 \| 97 \| -3 \| -2.92 \| 5 \| Page Request \| 250 \| 2 \| 1 \| 0 \| -17.20 \| 5 \| Rendering \| 250 \| 98 \| 95 \| -3 \| -2.68 \| 6 \| Overall \| 250 \| 99 \| 99 \| 0 \| -0.14 \| 6 \| Page Request \| 250 \| 2 \| 2 \| 0 \| -16.49 \| 6 \| Rendering \| 250 \| 97 \| 98 \| 0 \| 0.16 \| 7 \| Overall \| 250 \| 96 \| 95 \| -1 \| -0.55 \| 7 \| Page Request \| 250 \| 1 \| 2 \| 1 \| 66.67 \| slower 7 \| Rendering \| 250 \| 95 \| 94 \| -1 \| -1.19 \| 8 \| Overall \| 250 \| 92 \| 92 \| -1 \| -0.69 \| 8 \| Page Request \| 250 \| 1 \| 1 \| 0 \| -17.60 \| 8 \| Rendering \| 250 \| 91 \| 91 \| 0 \| -0.52 \| 9 \| Overall \| 250 \| 112 \| 112 \| 0 \| 0.29 \| 9 \| Page Request \| 250 \| 2 \| 1 \| 0 \| -7.92 \| 9 \| Rendering \| 250 \| 110 \| 111 \| 0 \| 0.37 \| 10 \| Overall \| 250 \| 589 \| 522 \| -67 \| -11.38 \| faster 10 \| Page Request \| 250 \| 14 \| 13 \| 0 \| -1.26 \| 10 \| Rendering \| 250 \| 575 \| 508 \| -67 \| -11.62 \| faster 11 \| Overall \| 250 \| 66 \| 66 \| -1 \| -0.86 \| 11 \| Page Request \| 250 \| 1 \| 1 \| 0 \| -16.48 \| 11 \| Rendering \| 250 \| 65 \| 65 \| 0 \| -0.62 \| 12 \| Overall \| 250 \| 303 \| 291 \| -12 \| -4.07 \| faster 12 \| Page Request \| 250 \| 2 \| 2 \| 0 \| 12.93 \| 12 \| Rendering \| 250 \| 301 \| 289 \| -13 \| -4.19 \| faster 13 \| Overall \| 250 \| 48 \| 47 \| 0 \| -0.45 \| 13 \| Page Request \| 250 \| 1 \| 1 \| 0 \| 1.59 \| 13 \| Rendering \| 250 \| 47 \| 46 \| 0 \| -0.52 \| ``` Here it's clear that this patch significantly improves the rendering performance of the slowest pages, while not causing any big regressions elsewhere. As expected, this patch thus helps larger and/or more complex pages the most (which is also where even small improvements will be most beneficial). There's obviously the question if this is slightly regressing simpler pages, but given just how short the times are in most cases it's not inconceivable that the page results above are simply caused by e.g. 
limited `Date.now()` and/or limited numerical precision.	2019-07-18 17:30:22 +02:00
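The inlined, inverted `progressiveDataLength` check described in commit b5254f2745 can be sketched roughly as follows. This is a heavily simplified illustration, not the real `ChunkedStream`: the `SketchStream` class, its `ensureByteCalls` counter, and the sample bytes are all hypothetical, and the real `ensureByte` would verify chunk availability rather than merely count calls.

```javascript
// Hypothetical, simplified stand-in for a chunked stream (an assumption for
// illustration; only `progressiveDataLength`, `ensureByte` and `getByte` are
// names that appear in the commit message).
class SketchStream {
  constructor(bytes, progressiveDataLength) {
    this.bytes = bytes;
    this.pos = 0;
    // Number of leading bytes already delivered through streaming.
    this.progressiveDataLength = progressiveDataLength;
    this.ensureByteCalls = 0;
  }

  ensureByte(pos) {
    // The real method would check that the chunk holding `pos` is loaded;
    // this sketch only counts how often it is invoked.
    this.ensureByteCalls++;
  }

  getByte() {
    const pos = this.pos;
    if (pos >= this.bytes.length) {
      return -1;
    }
    // Inlined, inverted check: only pay for the function call when the byte
    // lies beyond the data that streaming has already provided.
    if (pos >= this.progressiveDataLength) {
      this.ensureByte(pos);
    }
    this.pos++;
    return this.bytes[pos];
  }
}

const sketchStream = new SketchStream(new Uint8Array([10, 20, 30, 40]), 3);
const sketchBytes = [];
for (let i = 0; i < 4; i++) {
  sketchBytes.push(sketchStream.getByte());
}
```

In this sketch the first three reads skip `ensureByte` entirely, which is the effect the commit message attributes to the inlined check when most of the file has already been streamed.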
Tim van der Meij	6e96a158f4	Merge pull request #10820 from vlastimilmaca/annot-irt-rt-states Annotations - Added parsing of IRT, RT, State and StateModel	2019-07-17 23:34:31 +02:00
vlastimilmaca	fe49f0f766	Annotations - Implement parsing of IRT, RT, State and StateModel	2019-07-16 23:33:07 +02:00
Jonas Jenwald	bea15b6ce5	Simplify the `PDFDocument.fingerprint` method slightly The way that this method handles documents without an `ID` entry in the Trailer dictionary feels overly complicated to me. Hence this patch adds `getByteRange` methods to the various Stream implementations[1], and utilize that rather than manually calling `ensureRange` when computing a fallback `fingerprint`. --- [1] Note that `PDFDocument` is only ever initialized with either a `Stream` or a `ChunkedStream`, hence why the `DecodeStream.getByteRange` method isn't implemented.	2019-07-15 13:26:08 +02:00
Tim van der Meij	13ebfec903	Merge pull request #10969 from Snuffleupagus/api-test-stopAtErrors Add an API unit-test for the `stopAtErrors` option (PRs 8240 and 8922 follow-up)	2019-07-14 14:47:57 +02:00
Jonas Jenwald	b548bafef7	Simplify, and inline, the `finalize` function in the `MessageHandler` class The `finalize` helper function has only a single call-site, and furthermore it's just a one-liner too. Furthermore it's only ever called with a `Promise` as its argument, meaning that it's unnecessarily convoluted as well (i.e. the `Promise.resolve()` part shouldn't be necessary). Hence this code can be both simplified and inlined at its only call-site instead.	2019-07-13 17:54:32 +02:00
Jonas Jenwald	c7fb7116d6	Add an API unit-test for the `stopAtErrors` option (PRs 8240 and 8922 follow-up) Also fixes an inconsistency in the 'PageError' handler, for `getOperatorList`, in the API.	2019-07-13 16:06:05 +02:00
Jonas Jenwald	17116917f7	Remove useless `wrapReason` calls in the `MessageHandler` class Currently `wrapReason` is manually called at every `resolveOrReject` call-site, despite it being completely unnecessary unless there's an actual error being handled. This is obviously inefficient, and it's easy enough to avoid by having `resolveOrReject` handle this only when actually needed.	2019-07-13 13:08:29 +02:00
Tim van der Meij	ed3954fc7a	Merge pull request #10851 from brendandahl/shading-bbox Apply bounding box before using shading patterns.	2019-07-12 22:52:07 +02:00
Tim van der Meij	87f36e3520	Merge pull request #10850 from brendandahl/scale-line-width Scale stroking line width when using a tiling pattern.	2019-07-12 22:50:32 +02:00
Tim van der Meij	28326165ff	Merge pull request #10958 from Snuffleupagus/api-rm-receivingOperatorList Remove the `intentState.receivingOperatorList` boolean since it's redundant	2019-07-11 23:55:00 +02:00
Jonas Jenwald	9a4d14bf36	Prevent "Uncaught promise" messages in the console when cancelling `TextLayer` tasks (PR 10601 follow-up) Since `finally` won't stop error propagation, this causes unnecessary messages to be printed in the console whenever a `TextLayer` task is cancelled.	2019-07-11 11:48:33 +02:00
Jonas Jenwald	ef48a9a713	Update the `PageError` handler, in the API, to always mark the `operatorList` as done and finalize any pending renderTasks Note that, in the old code, there was a code-path which could prevent this from happening thus affecting future cleanup. Furthermore, ensure that we'll always attempt to cleanup when handling the 'PageError' message, similar to the code in e.g. the `PDFPageProxy._renderPageChunk` method.	2019-07-10 14:23:59 +02:00
Jonas Jenwald	c6fcdf474b	Remove the `intentState.receivingOperatorList` boolean since it's redundant The `receivingOperatorList` property is currently tracked twice in the rendering code, both directly and inversely through the `intentState.operatorList.lastChunk` boolean. This type of double bookkeeping is never a good idea, since it's just too easy for the properties to accidentally fall out of sync. In this case there's even a `cleanup`-related bug caused by this, which means that `PDFPageProxy._tryCleanup` will never be able to discard any data if there's an error on the worker-thread (as handled through the 'PageError' message). Hence the simplest solution seems, at least to me, to update `PDFPageProxy._tryCleanup` to replace the `intentState.receivingOperatorList` check with a `!intentState.operatorList.lastChunk` check and completely remove the former property.	2019-07-10 14:23:10 +02:00
Brendan Dahl	6fab0a0dac	Apply bounding box before using shading patterns. Fixes #8092	2019-07-08 14:05:48 -07:00
Brendan Dahl	446efab707	Scale stroking line width when using a tiling pattern.	2019-07-08 13:47:54 -07:00
Jonas Jenwald	bdc31f8b50	Make the `find` helper function, in `src/core/document.js`, more efficient by using `peekBytes` rather than reading the stream one byte at a time Please note: A similar change was attempted in PR 5005, but it was subsequently backed out in PR 5069. Unfortunately I don't think anyone ever tried to debug exactly why it didn't work, since it ought to have worked, and having re-tested this now I'm not able to reproduce the problem any more. However, given just how inefficient the current code is, with thousands of strictly unnecessary function calls for each `find` invocation, I'd really like to try fixing this again.	2019-07-06 11:44:17 +02:00
Jonas Jenwald	41745a5996	Reduce the number of `isCmd` calls slightly in the `XRef` class This reduces the total number of function calls, when reading the XRef table respectively when fetching uncompressed XRef entries. Note in particular the `XRef.readXRefTable` method, where there're two back-to-back `isCmd` checks rather than just one.	2019-06-29 16:28:45 +02:00
Tim van der Meij	7fc329d6e3	Merge pull request #10902 from ahuglajbclajep/tiling-pattern-support Implement tiling patterns for the SVG back-end	2019-06-28 12:45:02 +02:00
Tim van der Meij	f1867de492	Merge pull request #10925 from Snuffleupagus/eslint_no-unsanitized Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML`	2019-06-27 20:32:24 +02:00
ahuglajbclajep	77940dbd86	Implement tiling patterns for the SVG back-end	2019-06-25 16:25:25 +09:00
Jonas Jenwald	f710eb56e4	Change the signature of the `Parser` constructor to take a parameter object A lot of the `new Parser()` call-sites look quite unwieldy/ugly as-is, with a bunch of somewhat randomly ordered arguments, which we can avoid by changing the constructor to accept an object instead. As an added bonus, this provides better documentation without having to add inline argument comments in the code.	2019-06-23 16:01:45 +02:00
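The readability argument in commit f710eb56e4 can be illustrated with a small comparison. Both classes below are hypothetical sketches, and the parameter names (`lexer`, `xref`, `allowStreams`, `recoveryMode`) are assumptions chosen for illustration rather than a statement of the actual `Parser` signature.

```javascript
// Hypothetical "before" shape: positional arguments, whose meaning is not
// visible at the call-site.
class PositionalParserSketch {
  constructor(lexer, allowStreams, xref, recoveryMode) {
    this.lexer = lexer;
    this.allowStreams = allowStreams;
    this.xref = xref;
    this.recoveryMode = recoveryMode;
  }
}

// Hypothetical "after" shape: a single parameter object, destructured with
// defaults so optional flags can simply be omitted.
class ObjectParserSketch {
  constructor({ lexer, xref, allowStreams = false, recoveryMode = false }) {
    this.lexer = lexer;
    this.xref = xref;
    this.allowStreams = allowStreams;
    this.recoveryMode = recoveryMode;
  }
}

const sketchLexer = { name: "lexer" };
const sketchXref = { name: "xref" };

// Positional: the reader has to know what `true` and `false` refer to.
const positionalParser = new PositionalParserSketch(sketchLexer, true, sketchXref, false);

// Parameter object: each value is labelled, and the order no longer matters.
const objectParser = new ObjectParserSketch({
  xref: sketchXref,
  lexer: sketchLexer,
  allowStreams: true,
});
```

With the parameter object, the call-site documents itself, which matches the "better documentation without inline argument comments" point made in the commit message.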
Jonas Jenwald	5bb5e7741d	Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML` See https://github.com/mozilla/eslint-plugin-no-unsanitized Since we've generally never allowed e.g. `innerHTML`, which is enforced during review, there's only one linting failure with this patch. (Which is white-listed, according to the existing comment and the fact that it's test-only code.)	2019-06-23 13:50:30 +02:00
Jonas Jenwald	021e5ffb88	Move `PDFWorkerStream` and related code to its own file Since all other `IPDFStream` implementations live in their own files, it seems reasonable for these to do so as well. Furthermore, converts all of the relevant code to ES6 classes and updates the interface definitions to mark a couple of methods `async`.	2019-06-15 13:05:25 +02:00
Tim van der Meij	06b253d609	Merge pull request #10890 from Snuffleupagus/outline-items-hidden Add support for outline items, in the default viewer, which default to collapsed when the outline is built	2019-06-09 11:35:49 +02:00
Jonas Jenwald	26bc630e19	Add support for outline items, in the default viewer, which default to collapsed when the outline is built The PDF specification supports this feature, which is commonly used in large/long documents (such as the spec itself), and it seems reasonably straightforward to implement; see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2095911	2019-06-07 12:26:23 +02:00
Jonas Jenwald	625af8d2ad	[api-minor] Attempt to reduce memory usage during printing, by always running `cleanup` once rendering has finished Given that `cleanupAfterRender` is already set for large images, when handling 'obj' messages, this patch should thus be safe in general (since otherwise there ought to be existing bugs related to cleanup and printing).	2019-06-03 00:29:17 +02:00
Jonas Jenwald	876c962235	Ignore Annotations with too large border `width`s, to prevent the `annotationLayer` from rendering it over the surrounding document (bug 1552113) The border `width` will instead fall back to the default value of `1`, rather than ignoring it altogether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113	2019-06-01 15:51:22 +02:00
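The fallback behaviour described in commit 876c962235 might look roughly like the sketch below. The function name, the `rect` layout, and in particular the "half of the smaller rectangle side" threshold are assumptions made for illustration; the commit message only states that too-large widths fall back to the default value of `1`.

```javascript
// Default border width named in the commit message.
const DEFAULT_BORDER_WIDTH = 1;

// Hypothetical helper: `rect` is assumed to be [x1, y1, x2, y2], and the
// threshold (half of the smaller side) is an assumption for this sketch.
function sanitizeBorderWidth(width, rect) {
  const [x1, y1, x2, y2] = rect;
  const maxWidth = 0.5 * Math.min(x2 - x1, y2 - y1);
  if (maxWidth > 0 && width > maxWidth) {
    // Too large: fall back to the default rather than dropping the border,
    // so that e.g. link annotations stay clickable.
    return DEFAULT_BORDER_WIDTH;
  }
  return width;
}

const keptWidth = sanitizeBorderWidth(2, [0, 0, 100, 20]);
const fallbackWidth = sanitizeBorderWidth(50, [0, 0, 100, 20]);
```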
Tim van der Meij	209e42043a	Merge pull request #10873 from Snuffleupagus/worker-terminate-clearPrimitiveCaches Ensure that the `Cmd`/`Name`/`Ref` caches are cleared when terminating the worker (PR 10863 follow-up)	2019-05-31 12:56:53 +02:00
Jonas Jenwald	a3742a9f83	Ensure that the `Cmd`/`Name`/`Ref` caches are cleared when terminating the worker (PR 10863 follow-up) Usually when the worker is terminated it will also be completely destroyed/removed, which means that any global caches (such as the ones in `src/core/primitive.js`) should be automatically cleared in the process. However, for certain ways of loading the `pdf.worker.js` file, e.g. passing in a re-usable worker to `getDocument`, using the `workerPort` functionality, or even disabling workers completely (even though this is never a good idea), the worker file may be kept in memory and these caches will not be cleared as expected.	2019-05-30 20:57:28 +02:00
Jonas Jenwald	8857a81c8d	Re-use, rather than re-creating, some `Array`s when resetting them in `src/display/api.js` Calling `someArray = []` will create a new Array, which seems completely unnecessary when it's sufficient to just call `someArray.length = 0` to achieve the same effect. Even though I cannot imagine these particular cases having any noticeable performance impact, similar changes were made in `core/` code years ago since it's apparently more efficient memory wise.	2019-05-30 16:33:05 +02:00
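The two ways of resetting an Array mentioned in commit 8857a81c8d can be compared with a minimal sketch. The variable names are hypothetical and are not taken from `src/display/api.js`.

```javascript
// Resetting in place: the same Array object is kept and merely truncated.
const reusedArray = [1, 2, 3];
const reusedAlias = reusedArray; // second reference to the same Array
reusedArray.length = 0;

// Resetting by assignment: a new Array is allocated, and any older reference
// still points at the previous (unchanged) Array.
let replacedArray = [1, 2, 3];
const replacedAlias = replacedArray;
replacedArray = [];
```

Note that the two forms are only interchangeable when no other code holds a reference to the old Array, since truncating in place is visible through every alias.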
Jani Pehkonen	343b1381a2	Don't clip if path is undefined in SVG back-end	2019-05-28 18:37:15 +03:00
Jonas Jenwald	5e045bcdba	Ensure that the `Cmd`/`Name`/`Ref` caches are cleared when running other `cleanup` code The purpose of these caches is to reduce peak memory usage, by only ever having a single instance of a particular object. However, as-is these caches are never cleared and they will thus remain until the worker is destroyed. This could very well have a negative effect on total memory usage, particularly for large/long documents, hence it seems to make sense to clear out these caches together with various other ones.	2019-05-26 14:29:59 +02:00
Jonas Jenwald	2fe9f3ff8f	Add caching to reduce the number of `Ref` objects This is similar to the existing caching used to reduce the number of `Cmd` and `Name` objects. With the `tracemonkey.pdf` file, this patch changes the number of `Ref` objects as follows (in the default viewer): \| \| Loading the first page \| Loading all the pages \| \|----------\|------------------------\|-------------------------\| \| `master` \| 332 \| 3265 \| \| `patch` \| 163 \| 996 \|	2019-05-26 12:23:37 +02:00
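The kind of instance caching described in commits 2fe9f3ff8f and 5e045bcdba can be sketched as below. This is an illustration under stated assumptions, not the code in `src/core/primitive.js`: the `RefSketch` class, the `getRefSketch`/`clearRefSketchCache` helpers, and the cache-key format are all hypothetical.

```javascript
// Hypothetical stand-in for a reference object identified by an object
// number and a generation number.
class RefSketch {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }
}

// One shared cache, so each (num, gen) pair maps to a single instance.
const refSketchCache = new Map();

function getRefSketch(num, gen) {
  const key = `${num}R${gen}`; // key format is an assumption of this sketch
  let ref = refSketchCache.get(key);
  if (ref === undefined) {
    ref = new RefSketch(num, gen);
    refSketchCache.set(key, ref);
  }
  return ref;
}

// Clearing the cache together with other cleanup code prevents it from
// growing for the whole lifetime of the worker.
function clearRefSketchCache() {
  refSketchCache.clear();
}

const firstLookup = getRefSketch(12, 0);
const secondLookup = getRefSketch(12, 0);
const otherLookup = getRefSketch(13, 0);
```

Repeated lookups return the same object, which is how this style of caching reduces the number of live instances; the trade-off, as the later commit notes, is that the cache itself must be cleared at some point.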
Tim van der Meij	bc1eb49a77	Implement creation date only for markup annotations The specification states that `CreationDate` is only available for markup annotations instead of for all annotation types. Moreover, popup annotations are not markup annotations according to the specification, so the creation date inheritance from the parent annotation is also removed there (note that only the modification date is used in e.g., the viewer).	2019-05-25 15:31:06 +02:00
Tim van der Meij	cf07918ccb	Implement contents for every annotation type The specification states that `Contents` can be available for every annotation type instead of only for markup annotations.	2019-05-18 15:52:17 +02:00
Tim van der Meij	1421b2f205	Merge pull request #10827 from Snuffleupagus/network-streams-class Convert the (remaining) network streams to ES6 classes	2019-05-16 22:04:29 +02:00
Jonas Jenwald	f9769af365	Convert `network.js` to use ES6 classes	2019-05-16 10:08:51 +02:00
Jonas Jenwald	cc661a4d38	Update `fetch_stream.js` to use `const` in more places	2019-05-16 09:15:43 +02:00
Jonas Jenwald	737705264b	Convert `transport_stream.js` to use ES6 classes	2019-05-16 09:15:39 +02:00
Jonas Jenwald	0784c98172	Remove unused `ref` property from the `parameters` object used when creating annotations in `AnnotationFactory._create` The only use-cases for this property were removed in PRs 7570 and 7775, and it's been completely unused ever since the latter one.	2019-05-16 08:33:38 +02:00
Tim van der Meij	c8c937c257	Merge pull request #10794 from janpe2/cidtogidmap-zero Fix glyph at index zero in CIDFontType2 that has a CIDToGIDMap stream	2019-05-15 00:04:39 +02:00
Jonas Jenwald	173fbef05b	Enable the `consistent-return` ESLint rule This rule is already enabled in mozilla-central, and helps ensure more consistent functions/methods, see https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#119-120 Please see https://eslint.org/docs/rules/consistent-return for additional information.	2019-05-11 14:27:21 +02:00
Jani Pehkonen	05c527f035	Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream	2019-05-07 18:44:37 +03:00
Tim van der Meij	be1d6626a7	Implement creation/modification date for annotations This includes the information in the core and display layers. The date parsing logic from the document properties is rewritten according to the specification and now includes unit tests. Moreover, missing unit tests for the color of a popup annotation have been added. Finally the styling of the popup is changed slightly to make the text a bit smaller (it's currently quite large in comparison to other viewers) and to make the drop shadow a bit more subtle. The former is done to be able to easily include the modification date in the popup similar to how other viewers do this.	2019-05-05 14:51:03 +02:00
Jonas Jenwald	007fab6ab5	Change `PartialEvaluator.handleColorN` to throw when no valid pattern is found Currently `handleColorN` will fall back to adding a completely unparsed/unvalidated operator when no valid pattern was found. This is unfortunate, since it could very easily lead to a couple of different errors: - `DataCloneError`s when attempting to send the data to the main-thread, e.g. when `args` is `Dict`/`Stream`. - Errors in `getShadingPatternFromIR` on the main-thread, unless `args` just happens to have the expected format. - Errors when actually attempting to render the pattern on the main-thread, since the `args` will most likely not have the expected format. Hence it probably makes sense to error in `PartialEvaluator.handleColorN`, and having invalid patterns fail gracefully via the existing `ignoreErrors` code-paths instead.	2019-05-04 12:53:18 +02:00
Tim van der Meij	155304a0c1	Merge pull request #10756 from Snuffleupagus/issue-10542 Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542)	2019-05-02 22:29:24 +02:00
Jonas Jenwald	96942d4f7f	Ensure that the `OperatorList` constructor actually initializes a `NullOptimizer` when intended (PR 9089 follow-up) It appears that this has been broken ever since PR 9089, which also introduced this code, since the `QueueOptimizer`/`NullOptimizer` choice was made based on the still undefined `this.intent` property. Furthermore, fixing this also uncovered the fact that the `NullOptimizer.reset` method was missing.	2019-05-02 17:37:05 +02:00
Jonas Jenwald	5335285cda	Attempt to handle corrupt PDF documents that contains path operators inside of text object (issue 10542) First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to). Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit very small) to the parsing of path operators in PDF documents just for a handful of corrupt ones.	2019-04-30 23:35:33 +02:00
Tim van der Meij	762c58e0fc	Merge pull request #10738 from Snuffleupagus/ViewerPreferences-api [api-minor] Add support for ViewerPreferences in the API (issue 10736)	2019-04-20 18:39:32 +02:00
Jonas Jenwald	34952b732e	Add a `getDocId` method to the `idFactory`, in `Page` instances, to avoid passing around `PDFManager` instances unnecessarily (PR 7941 follow-up) This way we can avoid manually building a "document id" in multiple places in `evaluator.js`, and it also lets us avoid passing in an otherwise unnecessary `PDFManager` instance when creating a `PartialEvaluator`.	2019-04-20 13:11:17 +02:00
Tim van der Meij	55d9b35d37	Merge pull request #10727 from Snuffleupagus/type3-image-resources Support (rare) Type3 fonts which contains image resources (issue 10717)	2019-04-18 23:07:26 +02:00
Jonas Jenwald	5e9b606e7b	[Firefox] Avoid displaying the indeterminate loadingBar when `disableStream=true` is set (PR 10714 follow-up) While PR 10714 did address the `disableRange=true` case, it also managed to "break" the `disableStream=true` case instead since the indeterminate loadingBar is now displayed when it shouldn't; sorry about that! The solution is simple enough though, don't attempt to fall back to `_fullRequestReader.onProgress` when handling "incomplete" loading information.	2019-04-16 15:35:42 +02:00
Jonas Jenwald	311bac3ebb	[api-minor] Add support for ViewerPreferences in the API (issue 10736) Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences Furthermore, note that this patch only adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736). The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated? There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the entire viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales. Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc. Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)	2019-04-14 14:20:52 +02:00
Tim van der Meij	ae2a4dc3dd	Implement free text annotations	2019-04-13 18:45:22 +02:00
Jonas Jenwald	be604bd195	Support (rare) Type3 fonts which contains image resources (issue 10717) The Type3 font type is not commonly used in PDF documents, as can be seen from telemetry data such as: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-04-09&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F68&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F57&processType=&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2019-03-18&table=0&trim=1&use_submission_date=0 (see also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types). Type3 fonts containing image resources are *very* rare in practice; usually they only contain path rendering operators, but as the issue shows they unfortunately do exist. Currently these Type3-related image resources are not handled in any special way, and given that fonts are document-specific rather than page-specific, rendering breaks since the image resources are thus not available to the entire document. Fortunately fixing this isn't too difficult, but it does require adding a couple of Type3-specific code-paths to the `PartialEvaluator`. In order to keep the implementation simple, particularly on the main-thread, these Type3 image resources are completely decoded on the worker-thread to avoid adding too many special cases. This should not cause any issues, only marginally less efficient code, but given how rare this kind of Type3 font is adding premature optimizations didn't seem at all warranted at this point.	2019-04-13 18:27:50 +02:00
Tim van der Meij	17de90b88a	Merge pull request #10694 from Snuffleupagus/main-thread-progressiveDataLength Avoid dispatching range requests to fetch PDF data that's already loaded with streaming (PR 10675 follow-up)	2019-04-13 17:15:01 +02:00
Tim van der Meij	2d0c38d626	Merge pull request #10696 from Snuffleupagus/makeSubStream-ensureByte Update `ChunkedStream.makeSubStream` to actually check if (some) data exists when the `length` parameter is undefined	2019-04-13 17:12:20 +02:00
Jonas Jenwald	a7273c8efe	Avoid dispatching range requests to fetch PDF data that's already loaded with streaming (PR 10675 follow-up) Please note: This patch purposely ignores `src/display/network.js`, since its support for progressive reading depends on the non-standard `moz-chunked-arraybuffer` responseType which is currently in the process of being removed.	2019-04-13 00:26:13 +02:00
Vlastimil Máca	d96267c30c	Annotations - _preparePopup method replaced with MarkupAnnotation base class. This is just refactoring, so it shouldn't break anything. It should move annotation API closer to PDF spec and enable future expansion.	2019-04-12 11:24:21 +02:00
Tim van der Meij	4055d0a302	Implement caret annotations The file `test/pdfs/annotation-caret-ink.pdf` is already available in the repository as a reference test for this since I supplied it for another patch that implemented ink annotations.	2019-04-09 23:39:56 +02:00
Tim van der Meij	ce62373db3	Merge pull request #10674 from timvandermeij/svg-backend-es6 Convert `src/display/svg.js` to ES6 syntax and implement `setRenderingIntent` and `setFlatness` for the SVG backend	2019-04-06 17:15:14 +02:00
Tim van der Meij	5a03b1c0d7	Optimize `convertOpList` in `svg.js` by computing the operator ID mapping only once There is no need to recompute this for every operator list we encounter.	2019-04-06 16:57:31 +02:00
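The "compute the mapping only once" idea from commit 5a03b1c0d7 can be sketched with a lazily built, cached lookup table. Everything here is hypothetical: the tiny `OPS_SKETCH` table, the helper names, and the `mappingBuildCount` counter exist only to make the caching observable and do not reflect the actual `svg.js` code.

```javascript
// Hypothetical, tiny operator table (name -> numeric id).
const OPS_SKETCH = { save: 10, restore: 11, transform: 12 };

let opsByIdSketch = null; // cached reverse mapping (id -> name)
let mappingBuildCount = 0; // only here to show how often the map is built

function getOpsByIdSketch() {
  if (opsByIdSketch === null) {
    mappingBuildCount++;
    opsByIdSketch = Object.create(null);
    for (const name of Object.keys(OPS_SKETCH)) {
      opsByIdSketch[OPS_SKETCH[name]] = name;
    }
  }
  return opsByIdSketch;
}

// Converts a list of operator ids to names, re-using the cached mapping
// instead of rebuilding it for every operator list.
function convertOpListSketch(fnArray) {
  const opsById = getOpsByIdSketch();
  return fnArray.map((id) => opsById[id]);
}

const firstConversion = convertOpListSketch([10, 12]);
const secondConversion = convertOpListSketch([11]);
```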
Tim van der Meij	2b18e5a355	Implement `setRenderingIntent` and `setFlatness` for the SVG backend This mirrors the canvas implementation where we ignore these operators. This avoids console spam regarding unimplemented operators we're not interested in. For the Tracemonkey paper, we're now down to one warning about tiling patterns which is in fact a valid one.	2019-04-06 16:57:30 +02:00
Tim van der Meij	47d3620d5a	Convert `src/display/svg.js` to ES6 syntax In particular, this should reduce intermediate string creation by using template strings and reduce variable lookup times by removing unneeded variables and caching `this.current` in more places.	2019-04-06 16:57:30 +02:00
Jonas Jenwald	f0a28b3c0d	[Firefox] Ensure that loading progress is reported, and the loadingBar updated, when `disableRange=true` is set With PR 10675 having fixed the completely broken `disableRange=true` setting in the Firefox version of PDF.js, I couldn't help but notice that loading progress is never reported properly in that case. Currently loading progress is only reported for the `rangeProgress` chrome-event, which obviously isn't dispatched with `disableRange=true` set. However, the `progressiveRead` chrome-event includes loading progress as well, but this information isn't being used in any way. Furthermore, the `PDFDataRangeTransport.onDataProgress` method wasn't able to handle "complete" loading information, and neither was `PDFDataTransportStream._onProgress` since that method would only ever attempt to report it through a RangeReader (which won't exist when `disableRange=true` is set).	2019-04-06 12:53:33 +02:00
Tim van der Meij	b161050df4	Merge pull request #10709 from Snuffleupagus/pageLayout [api-minor] Add basic support for PageLayout in the API and the viewer	2019-04-05 23:07:32 +02:00
Tim van der Meij	8c8738ea47	Merge pull request #10678 from Snuffleupagus/rm-moz-chunked-arraybuffer Remove `moz-chunked-arraybuffer` support, and related code, from `src/display/network.js`	2019-04-05 22:52:28 +02:00
Jonas Jenwald	7a999d1d67	[api-minor] Add basic support for PageLayout in the API and the viewer Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2393749, and refer to the inline comments for additional details.	2019-04-05 11:32:01 +02:00
Tim van der Meij	57abddc9ca	Merge pull request #10713 from Snuffleupagus/rm-JSDoc-annotation Remove `src/core/annotation.js` from the `gulp jsdoc` build target	2019-04-04 23:15:02 +02:00
Tim van der Meij	072c5864fb	Merge pull request #10675 from Snuffleupagus/PDFDataTransportStream-disableRange [Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream`	2019-04-04 23:07:45 +02:00
Jonas Jenwald	f666395c24	Remove `src/core/annotation.js` from the `gulp jsdoc` build target Note how at https://mozilla.github.io/pdf.js/api/ it's being described as API docs, however `src/core/annotation.js` is not part of the public API. Furthermore, given that the code residing in the `src/core/` folder is run in a worker-thread, it's not even accessible on the main-thread (since `postMessage` is being used to transfer the data). Hence the different API methods simply return a "proxy" to the underlying data, but not actually the same objects and data structures as in the worker-thread itself; thus it doesn't make a whole lot of sense to expose this in API docs as far as I'm concerned. Finally, the patch fixes a small JSDoc related typo in `src/display/api.js` when referring to the `TextStyle` typedef.	2019-04-04 18:03:08 +02:00
Jonas Jenwald	b40e6723be	Remove `moz-chunked-arraybuffer` support, and related code, from `src/display/network.js` The `moz-chunked-arraybuffer` responseType is a non-standard property, which has been subsumed by the Fetch API, and it's in the process of being removed from Firefox; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1120171 and https://bugzilla.mozilla.org/show_bug.cgi?id=1411865 Please note: Rather than waiting for both `Fetch` and `ReadableStream` to be available in e.g. a Firefox ESR version (which is probably going to be 68 at the earliest), let's just decide that PDF.js release `2.1.266` will be the last one with `moz-chunked-arraybuffer` support and land this patch (since nothing should outright break without it anyway).	2019-04-01 20:48:51 +02:00
Jonas Jenwald	c6ddbd55e2	Add a `progressiveDataLength` fast-path to `ChunkedStream.ensureByte` This is similar to the existing check used in `ChunkedStream.ensureRange`.	2019-03-29 20:00:28 +01:00
Jonas Jenwald	49e8a270c4	Update `ChunkedStream.makeSubStream` to actually check if (some) data exists when the `length` parameter is undefined Note how `XRef.fetchUncompressed`, which is used a lot for most PDF documents, is calling the `makeSubStream` method without providing a `length` argument. In practice this results in the `makeSubStream` method, on the `ChunkedStream` instance, calling the `ensureRange` method with `NaN` as the end position, thus resulting in no data being requested despite it possibly being necessary. This may be quite bad, since in this particular case it will lead to a new `ChunkedStream` being created and also a new `Parser`/`Lexer` instance. Given that it's quite possible that even the very first `Parser.getObj` call could throw `MissingDataException`, this could thus lead to wasted time/resources (since re-parsing is necessary once the data finally arrives). You obviously need to be very careful to not have `ChunkedStream.makeSubStream` accidentally requesting the entire file, hence its `this.end` property is of no use here, but it should be possible to at least check that the `start` of the data is present before any potentially expensive parsing occurs.	2019-03-29 17:20:31 +01:00
Tim van der Meij	b4c3b94592	Merge pull request #6606 from Rob--W/pattern-scaling Improve performance and correctness of Tiling Patterns	2019-03-29 00:01:38 +01:00
Tim van der Meij	f9c58115fc	Merge pull request #10683 from janpe2/type0-noncid-cmap Use CMap in Type0 fonts when CFF is not a CID font	2019-03-28 00:07:08 +01:00
Rob Wu	5985d4069a	TilingPattern: Add comment to explain the implementation	2019-03-27 17:50:46 +01:00
Rob Wu	d3dc8f16b5	TilingPattern: Reverse transform after painting This transform resulted in an incorrectly positioned object when the bounding box's upper-left corner did not start at (0,0), because the translation was not reverted. This patch adds the missing transform. The test file (tiling-pattern-box.pdf) is based on the PDF from #2825. All but the first cube (including the PDF data) have been removed. To trigger the bug that is fixed by this commit, I changed the BBox of the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without this patch, the dashed vertical line that intersects the corners at A and E would disappear.	2019-03-27 17:50:35 +01:00
Rob Wu	a72a8e921f	Avoid extreme sizing / scaling in tiling pattern The new test file (tiling-pattern-large-steps.pdf) was manually created, to have the following characteristics: - Large xstep and ystep (90000) - Page width is 4000 (which is larger than MAX_PATTERN_SIZE) - Visually, the page consists of a red rectangle with a black border, surrounded by a 50 unit white padding. - Before patch: blurry; After patch: sharp Fixes #6496 Fixes #5698 Fixes #1434 Fixes #2825	2019-03-27 17:44:04 +01:00
Jonas Jenwald	9077abc263	Take the `FirstChar`/`LastChar` properties into account when computing the hash in `PartialEvaluator.preEvaluateFont` (issue 10665) Without this some fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.	2019-03-27 16:27:10 +01:00
Jonas Jenwald	a2a824ed01	Don't accidentally use an empty `hash` value when comparing `preEvaluatedFonts` in `PartialEvaluator.loadFont` Note that `PartialEvaluator.preEvaluateFont` will return an empty string when no hash was computed. This will completely short-circuit the `fontAlias` comparison in `PartialEvaluator.loadFont`, since fonts which are totally different will then match if their `hash`es are empty.	2019-03-27 00:54:39 +01:00
Jani Pehkonen	49c6233fbc	Use CMap in Type0 fonts when CFF is not a CID font	2019-03-26 19:38:44 +02:00
Rob Wu	60d4685c10	Refactor TilingPattern - Deduplicate size/scale calculation, by introducing `getSizeAndScale`. - Eliminate unnecessary calculations / variables.	2019-03-26 17:35:23 +01:00
Jonas Jenwald	bb384dd5ed	[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream` Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing. Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings: - In the case where `initialData` already contains the entire file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete. - (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) This part was removed, to reduce the scope/risk of the patch somewhat. In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).	2019-03-26 16:34:13 +01:00
wuhao.daraw	1472c10bab	fix: electron environment detection	2019-03-26 20:52:49 +08:00
Tim van der Meij	33bfbef6ba	Merge pull request #10635 from timvandermeij/lexer-parser Convert `src/core/parser.js` to ES6 syntax and write more unit tests for the lexer and the parser	2019-03-19 23:17:34 +01:00
Tim van der Meij	7d3cb19571	Convert the `Linearization` class in `src/core/parser.js` to ES6 syntax Moreover, disable `var` usage for this file.	2019-03-17 13:27:45 +01:00
Tim van der Meij	ee3cfb7986	Merge pull request #10646 from terurou/svg-fill Implement linear-gradient, radial-gradient and dummy-pattern in SVGGraphics.	2019-03-17 13:13:45 +01:00
terurou	9c70a3831c	Fix to use radialGradient.	2019-03-17 10:57:16 +09:00
Tim van der Meij	7c9f1cc518	Merge pull request #10644 from Snuffleupagus/revokeObjectURL Ensure that `blob:` URLs will be revoked when pages are cleaned-up/destroyed (JPEG memory usage)	2019-03-16 19:29:23 +01:00
terurou	c970a4b6ae	Fix copy-paste mistake.	2019-03-16 23:21:56 +09:00
Jonas Jenwald	56eeeea1dc	Re-factor the `getTransfers` helper function into a "private" getter method on the `OperatorList` This function is currently called with the `OperatorList` instance as its argument, hence I cannot think of any good reason for not just moving it into the `OperatorList` properly. (This will also help with other planned changes regarding the `ImageCache` functionality.)	2019-03-16 13:06:51 +01:00
Jonas Jenwald	7273795eb6	Actually transfer eligible ImageMask data, rather than always copying it By transferring `ArrayBuffer`s you can avoid having two copies of the same data, i.e. one copy on each of the worker/main-thread, for data that's used only once on the worker-thread. Note how the code in [`PDFImage.createMask`](`80135378ca/src/core/image.js (L284-L285)`) goes to great lengths to actually enable transferring of the image data. However in [`PartialEvaluator.buildPaintImageXObject`](`80135378ca/src/core/evaluator.js (L336)`) the `cached` property is always set to `true`, which disqualifies the image data from being transferred; see [`getTransfers`](`80135378ca/src/core/operator_list.js (L552-L554)`). For most ImageMask data this patch won't matter, since images found in the `/Resources -> /XObject` dictionary will always be indexed by name. However for inline images which contain ImageMask data, where only "small" images are cached (in both `parser.js` and `evaluator.js`), the current code will result in some unnecessary memory usage.	2019-03-16 13:06:32 +01:00
terurou	fc0f844539	Implement linear-gradient, radial-gradient and dummy-pattern in SVGGraphics.	2019-03-16 13:56:29 +09:00
Jonas Jenwald	88d5750030	Remove the `src` attribute from `Image` objects used with natively supported JPEG images, when pages are cleaned-up/destroyed This will further help reduce the amount of image data that's currently being held alive, by explicitly removing the `src` attribute. Please note that this is mostly relevant for browsers which do not support `URL.createObjectURL`, or where `disableCreateObjectURL` was manually set by the user, since `blob:` URLs will be revoked (see the previous patch). However, using `about:memory` (in Firefox) it does seem that this may also be generally helpful, given that calling `URL.revokeObjectURL` won't invalidate the image data itself (as far as I can tell).	2019-03-15 15:25:48 +01:00
Jonas Jenwald	983b25f863	Ensure that `blob:` URLs will be revoked when pages are cleaned-up/destroyed Natively supported JPEG images are sent as-is, using a `blob:` or possibly a `data` URL, to the main-thread for loading/decoding. However there's currently no attempt at releasing these resources, which are held alive by `blob:` URLs, which seems unfortunate given that images can be arbitrarily large. As mentioned in https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL the lifetime of these URLs is tied to the document, hence they are not being removed when a page is cleaned-up/destroyed (e.g. when being removed from the `PDFPageViewBuffer` in the viewer). This is easy to test with the help of `about:memory` (in Firefox), which clearly shows the number of `blob:` URLs becoming arbitrarily large without this patch. With this patch however the `blob:` URLs are immediately released upon clean-up as expected, and the memory consumption should thus be considerably reduced for long documents with (simple) JPEG images.	2019-03-15 10:40:58 +01:00
Tim van der Meij	80135378ca	Merge pull request #10636 from Snuffleupagus/PDFDocumentProxy-destroy Small clean-up of the `PDFDocumentProxy.destroy` method and related code	2019-03-13 23:46:41 +01:00
Jonas Jenwald	24fc4f83ca	Small clean-up of the `PDFDocumentProxy.destroy` method and related code Note how `PDFDocumentProxy.destroy` is nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here. The main changes in this patch are: - Stop providing a `PDFDocumentLoadingTask` instance separately when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance. - Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused. - Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying once, rather than twice, for each document.	2019-03-12 13:25:29 +01:00
Jonas Jenwald	88f9e633dd	Try to improve text-selection for Type3 fonts that utilize a non-default /FontMatrix (bug 1513120) For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simply extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least. The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, which is why this patch adds two different new `text` tests.	2019-03-12 10:32:08 +01:00
Tim van der Meij	8d4d7dbf58	Convert the `Lexer` class in `src/core/parser.js` to ES6 syntax	2019-03-10 19:04:36 +01:00
Tim van der Meij	7d0ecee771	Convert the `Parser` class in `src/core/parser.js` to ES6 syntax	2019-03-10 19:04:35 +01:00
Tim van der Meij	d587abbceb	Merge pull request #10633 from Snuffleupagus/murmurhash-class Convert `MurmurHash3_64` to an ES6 class	2019-03-09 21:07:12 +01:00
Jonas Jenwald	6b1ac44aea	Convert `MurmurHash3_64` to an ES6 class Notable changes: - Remove the `return this;` from the `MurmurHash3_64.update` method, since it's completely unused and doesn't make a lot of sense. - Remove the loop(s) from the `MurmurHash3_64.hexdigest` method, since creating a temporary array and then looping over it is wasteful given how simple this can be written with modern JavaScript.	2019-03-09 17:03:06 +01:00
Jonas Jenwald	2665502055	Move `NativeImageDecoder` into a separate file, and convert it to a `class` Given the size of the `src/core/evaluator.js` file, it cannot hurt to move some of its (image related) helper functionality into a separate file.	2019-03-09 15:59:04 +01:00
Tim van der Meij	e41c4aece4	Merge pull request #10621 from janpe2/svg-Tm-stroke Don't scale SVG stroke width by text matrix	2019-03-08 23:16:10 +01:00
Tim van der Meij	8b149b818e	Merge pull request #10615 from Snuffleupagus/corrupt-inline-ASCII85Decode Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)	2019-03-08 23:06:01 +01:00
Tim van der Meij	e1b01a601c	Merge pull request #10605 from timvandermeij/display-utils Convert `let` to `const` if possible in, and improve unit test coverage for, `src/display/display_utils.js`	2019-03-06 23:46:53 +01:00
Tim van der Meij	87a70f3359	Convert `let` to `const` if possible in `src/display/display_utils.js` Finally, `var` usage is removed.	2019-03-06 23:41:54 +01:00
Jani Pehkonen	d9e30b3452	Don't scale SVG stroke width by text matrix	2019-03-05 22:54:25 +02:00
Jonas Jenwald	3ce8fe7927	Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614) There's a number of things wrong with the PDF document, since its inline images are first of all a lot larger than the 4 KB limit (as mandated by the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1852045). Furthermore the actual ASCII85Decode data is interspersed with a lot of needless whitespace, in particular also "inside" of the EOD (end-of-data) marker which thus completely breaks the detection. Note that according to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130, this patch should be safe since it explicitly mentions that all whitespace should be ignored.	2019-03-04 23:41:36 +01:00
Jonas Jenwald	7caf769a66	Move the `deprecated` helper function to the `src/display/display_utils.js` file Given that the function is (purposely) independent of the verbosity level and that its message is worded to only apply on the main-thread, there's no reason to duplicate this across the built `pdf.js`/`pdf.worker.js` files.	2019-03-02 20:23:56 +01:00
Jonas Jenwald	4170c414fa	Reduce usage of `Date.now()` in `src/core/worker.js` Currently for every single parsed/rendered page there's no less than four `Date.now()` calls being made on the worker-side. This seems totally unnecessary, since the results of these calls are, by default, not used for anything unless the verbosity level is set to `INFO`.	2019-03-02 20:23:52 +01:00
Tim van der Meij	c43396c2b7	Merge pull request #10590 from janpe2/svg-missing-moveto Fix missing moveTos in SVG paths	2019-03-02 14:43:53 +01:00
Tim van der Meij	4f13eb00d0	Merge pull request #10604 from brendandahl/fix-type1-charset Put the string name of the glyph in the charset array.	2019-03-02 13:03:16 +01:00
Brendan Dahl	7d6ab081eb	Put the string name of the glyph in the charset array. Also, only warn once per font when missing a glyph name.	2019-03-01 18:03:51 -08:00
Jonas Jenwald	d7d1f23826	Zero the width/height of the temporary canvas used during `TextLayer` rendering The default size of these canvases seems to be `300 x 150` (two orders of magnitude larger than the ones in PR 10597), which probably is large enough to matter since there's one such canvas for each textLayer that's rendered in the viewer. Also fixes the incorrect rejection reason, i.e. one using a string rather than an `Error`, in the `TextLayerRenderTask.cancel` method.	2019-03-01 04:05:37 +01:00
Brendan Dahl	34022d2fd1	Merge pull request #10591 from brendandahl/fix-charset Add unique glyph names for CFF fonts.	2019-02-28 17:22:29 -08:00
Tim van der Meij	9559d57636	Merge pull request #10595 from Snuffleupagus/JpegDecode-zero-tmpCanvas Zero the width/height of the temporary canvas used during `JpegDecode` (issue 10594)	2019-02-28 23:41:22 +01:00
Tim van der Meij	39fa26ea33	Merge pull request #10597 from Snuffleupagus/isFontSubpixelAAEnabled-canvas-cleanup Ensure that the temporary canvas created in `CanvasGraphics.isFontSubpixelAAEnabled` will be cleared	2019-02-28 23:37:24 +01:00
Tim van der Meij	af5597b7e5	Merge pull request #10573 from Snuffleupagus/type3-avoid-truncation Avoid truncating/breaking some Type3 glyphs in `compileType3Glyph` (bug 1245391, issue 10568)	2019-02-28 23:25:45 +01:00
Jonas Jenwald	b61b4d3229	Ensure that the temporary canvas created in `CanvasGraphics.isFontSubpixelAAEnabled` will be cleared While this particular canvas may be small, there can still be an arbitrarily large number of them (one per page rendered), which can/will eventually add up memory wise. This can be easily avoided by using the `cachedCanvases` abstraction instead, which will ensure that the `isFontSubpixelAAEnabled` canvas is removed together with other temporary canvases in `CanvasGraphics.endDrawing`.	2019-02-28 14:18:38 +01:00
Jonas Jenwald	4687cc85ac	Zero the width/height of the temporary canvas used during `JpegDecode` (issue 10594)	2019-02-28 12:23:34 +01:00
Brendan Dahl	8a596ef5d5	Add unique glyph names for CFF fonts. Printing on MacOS was broken with the previous approach of just mapping all the glyphs to notdef.	2019-02-27 15:00:29 -08:00
Jonas Jenwald	f664e074c9	Avoid using the Fetch API, in `GENERIC` builds, for unsupported protocols (issue 10587)	2019-02-27 13:04:20 +01:00
Jonas Jenwald	cbc07f985b	Load built-in CMap files using the Fetch API when possible	2019-02-27 13:04:19 +01:00
Jani Pehkonen	52e8e9b059	Fix missing moveTos in SVG paths	2019-02-26 20:00:35 +02:00
Jonas Jenwald	3a09a2f7a5	Update the year in the `license_header` files	2019-02-24 00:35:42 +01:00
Jonas Jenwald	db5dc14158	Move worker-thread only functions from `src/shared/util.js` and into a new `src/core/core_utils.js` file The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated. Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used only on the worker-thread. Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus cannot possibly ever be used on the main-thread.	2019-02-24 00:35:39 +01:00
Jonas Jenwald	a1f7517996	Rename the `src/display/dom_utils.js` file to `src/display/display_utils.js` This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.	2019-02-23 16:30:16 +01:00
Jonas Jenwald	fb774a65b0	Avoid truncating/breaking some Type3 glyphs in `compileType3Glyph` (bug 1245391, issue 10568) Hopefully this patch makes sense, since I cannot claim to fully understand this function. With the changes made in PR 3354 some Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored. The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.	2019-02-21 23:29:43 +01:00
Jonas Jenwald	60f6d49ff7	[api-minor] Expose the existence of a `Collection` dictionary via the `getMetadata` API method (issue 10555) Given the complexity of this functionality, and the fact that it doesn't seem widely used, I highly doubt that it'd ever make sense to support Collections; see also https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.39646.2Heading.824.Collections	2019-02-15 15:40:31 +01:00
Jonas Jenwald	b6d090cc14	Fallback to the built-in font renderer when font loading fails After PR 9340 all glyphs are now re-mapped to a Private Use Area (PUA) which means that if a font fails to load, for whatever reason[1], all glyphs in the font will now render as Unicode glyph outlines. This obviously doesn't look good, to say the least, and might be seen as a "regression" since previously many glyphs were left in their original positions which provided a slightly better fallback[2]. Hence this patch, which implements a general fallback to the PDF.js built-in font renderer for fonts that fail to load (i.e. are rejected by the sanitizer). One caveat here is that this only works for the Font Loading API, since it's easy to handle errors in that case[3]. The solution implemented in this patch does not in any way delay the loading of valid fonts, which was the problem with my previous attempt at a solution, and will only require a bit of extra work/waiting for those fonts that actually fail to load. Please note: This patch doesn't fix any of the underlying PDF.js font conversion bugs that are responsible for creating corrupt font files, however it does improve rendering in a number of cases; refer to this possibly incomplete list: [Bug 1524888](https://bugzilla.mozilla.org/show_bug.cgi?id=1524888) Issue 10175 Issue 10232 --- [1] Usually because the PDF.js font conversion code wasn't able to parse the font file correctly. [2] Glyphs fell back to some default font, which while not accurate was more useful than the current state. [3] Furthermore I'm not sure how to implement this generally, assuming that's even possible, and don't really have time/interest to look into it either.	2019-02-11 10:27:08 +01:00

3813 Commits