pdf.js

Author	SHA1	Message	Date
Tim van der Meij	d9cd890228	Update packages	2019-08-24 20:08:09 +02:00
Tim van der Meij	ce1acff5f0	Update translations	2019-08-24 20:05:47 +02:00
Tim van der Meij	56ae7a6690	Merge pull request #11069 from Snuffleupagus/getoplist-stream Use streams for OperatorList chunking (issue 10023)	2019-08-24 19:31:00 +02:00
Jonas Jenwald	711040ecc5	Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in `src/core/worker.js` These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).	2019-08-24 15:56:41 +02:00
Yury Delendik	66e0dd1b06	Use streams for OperatorList chunking (issue 10023) Please note: The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`. By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources. With this patch worker-thread parsing will only be aborted when the document is destroyed, and not when rendering is cancelled. There's a couple of reasons for this: - The API currently expects the entire OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen. - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously not want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it again which will regress perceived performance. - This patch is already somewhat risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well). Time permitting, once this has landed and been in Nightly for awhile, I'll try to work on the remaining points outlined above. Co-Authored-By: Yury Delendik <ydelendik@mozilla.com> Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>	2019-08-24 15:56:40 +02:00
Tim van der Meij	ee75fc1298	Merge pull request #11092 from Snuffleupagus/textLayer-expandTextDivs-padding [TextLayer] Use an Array to build the total `padding`, rather than concatenating Strings, in `expandTextDivs`	2019-08-24 14:45:21 +02:00
Tim van der Meij	42213f6a2c	Merge pull request #11093 from Priestch/shorthand_after_print Shorthand afterPrint signature in app.js	2019-08-24 14:36:53 +02:00
Priestch	000780d27e	Use shorthand method signature for `afterPrint` in `web/app.js`	2019-08-24 18:26:25 +08:00
Jonas Jenwald	29a2516e4c	[TextLayer] Use an Array to build the total `padding`, rather than concatenating Strings, in `expandTextDivs` Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page and the resulting padding string also becomes a lot more compact. Please note that the `paddingLeft` branch was moved, since the padding values need to be ordered as `top, right, bottom, left`. Finally, with this re-factoring it's no longer necessary to cache the original `style` string for every `textDiv` when `enhanceTextSelection` is enabled.	2019-08-24 01:13:59 +02:00
Tim van der Meij	edbebb8bf7	Merge pull request #11090 from Snuffleupagus/textLayer-expandTextDivs-transform [TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs`	2019-08-23 23:12:42 +02:00
Tim van der Meij	d1ef08e147	Merge pull request #11091 from Snuffleupagus/textLayer-expandTextDivs-valid-padding [TextLayer] Only handle positive padding values in `expandTextDivs`	2019-08-23 23:08:39 +02:00
Jonas Jenwald	932fcacff8	[TextLayer] Only handle positive padding values in `expandTextDivs` Given that browsers will reject padding values smaller than zero (which may be caused by limited numerical precision during calculations in the `expand` code), it makes no sense to include those when expanding the `textDiv`s.	2019-08-23 13:16:20 +02:00
Jonas Jenwald	37e8a8189b	[TextLayer] Use an Array to build the total `transform`, rather than concatenating Strings, in `expandTextDivs` Furthermore, it's possible to re-use the same Array for all `textDiv`s on the page.	2019-08-23 12:17:12 +02:00
Tim van der Meij	490deb1b65	Merge pull request #11086 from Snuffleupagus/textLayer-originalTransform [TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled	2019-08-22 23:09:07 +02:00
Brendan Dahl	31f319301d	Merge pull request #11087 from brendandahl/disable-links Add a way to disable external links.	2019-08-22 11:13:11 -07:00
Jonas Jenwald	a519ceffee	[TextLayer] Use template strings when updating the font property in the `_layoutText` method	2019-08-22 14:47:44 +02:00
Jonas Jenwald	6afe3221b7	[TextLayer] Only cache the `originalTransform` when `enhanceTextSelection` is enabled Given that this is completely unused in "regular" text-selection mode, there's no reason to unconditionally store one string for every `textDiv`.	2019-08-22 14:47:18 +02:00
Brendan Dahl	98e989116c	Add a way to disable external links.	2019-08-21 11:20:41 -07:00
Tim van der Meij	52c6b3c138	Merge pull request #11079 from Snuffleupagus/textLayer-memory [TextLayer] Only cache the current `textDiv` style when `enhanceTextSelection` is enabled and use template strings in `expandTextDivs``	2019-08-20 22:48:10 +02:00
Tim van der Meij	78f9ab53fc	Merge pull request #11081 from dhuang612/document-pdfjs-dist/webpack added in information about pdfjs/webpack	2019-08-20 22:38:58 +02:00
dhuang612	d52d1e2d09	added in information about pdfjs/webpack updated readme with corrections	2019-08-20 10:20:32 -04:00
Jonas Jenwald	431a264126	[TextLayer] Reduce the amount of intermediary strings in `expandTextDivs` By using template strings, we can avoid some unnecessary string allocations (which is also helped by shortening a variable name).	2019-08-19 12:09:18 +02:00
Jonas Jenwald	45dfad8640	[TextLayer] Only cache the current `textDiv` style when `enhanceTextSelection` is enabled This will help save a little bit of memory, by not storing one unused string for each `textDiv` in regular text-selection mode.	2019-08-19 11:02:56 +02:00
Tim van der Meij	852fc955bd	Merge pull request #11076 from Snuffleupagus/XRef-fetch-isRef/cache Replace the `XRef.cache` Array with a Map instead	2019-08-18 14:44:31 +02:00
Jonas Jenwald	1cd9a28c81	Replace the `XRef.cache` Array with a Map instead Given that the different types of `Stream`s will never be cached, this thus implies that the `XRef.cache` Array will always be more-or-less sparse. Generally speaking, the longer the document the more sparse the `XRef.cache` will thus become. For example, looking at the `pdf.pdf` file from the test-suite: The length of the `XRef.cache` Array will be a few hundred thousand elements, with approximately 95% of them being empty. Hence it seems pretty clear that an Array isn't really the best data-structure for this kind of cache, and this patch thus changes it to a Map instead. This patch-series was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following results when comparing this patch-series against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 2736 \| 2736 \| 1 \| 0.02 \| Firefox \| Page Request \| 200 \| 2 \| 2 \| 0 \| -8.26 \| faster Firefox \| Rendering \| 200 \| 2733 \| 2734 \| 1 \| 0.03 \| ```	2019-08-18 12:07:18 +02:00
Jonas Jenwald	34a53b9f5d	Inline the `isRef` checks in the various `XRef.fetch` related methods The relevant methods are usually not hot enough for these changes to have an easily measurable effect, however there's been a lot of other cases where similiar inlining has helped performance. (And these changes may help offset the changes made in the next patch.)	2019-08-18 11:57:48 +02:00
Tim van der Meij	1565d1849d	Merge pull request #11073 from brendandahl/code-point Move polyfill for codePointAt to String prototype.	2019-08-17 13:26:35 +02:00
Brendan Dahl	c8129b8787	Move polyfill for codePointAt to String prototype. This method belongs on the prototype not the String object.	2019-08-16 14:32:43 -07:00
Tim van der Meij	20181b65d4	Merge pull request #11070 from Snuffleupagus/Parser-getObj-rm-isString Inline the `isString` check in the `Parser.getObj` method	2019-08-16 22:54:55 +02:00
Jonas Jenwald	40d3916f31	Reduce the number of temporary variables in the `Parser.getObj` method This avoids allocating approximately 1.7 million short-lived variables when loading the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, in the default viewer.	2019-08-16 13:51:41 +02:00
Jonas Jenwald	7728a6630c	Inline the `isString` check in the `Parser.getObj` method For very large and complex PDF files this will help performance slightly, since `Parser.getObj` is called a lot during parsing in the worker. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 200, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 200 \| 2847 \| 2830 \| -17 \| -0.60 \| faster Firefox \| Page Request \| 200 \| 2 \| 2 \| 0 \| -7.14 \| Firefox \| Rendering \| 200 \| 2844 \| 2827 \| -17 \| -0.60 \| faster ```	2019-08-16 10:34:24 +02:00
Tim van der Meij	02dcd20263	Merge pull request #11064 from Snuffleupagus/Util-class Convert the `src/shared/util.js` file to ES6 syntax	2019-08-11 20:57:11 +02:00
Jonas Jenwald	7f456b3e2e	Replace of all usages of `var` with `let`/`const` in the `src/shared/util.js` file Also removes a couple of unnecessary (temporary) variable assigments in `arraysToBytes` and uses template strings in a few spots.	2019-08-11 14:35:35 +02:00
Jonas Jenwald	f6c4a1f080	Convert `Util` to a class with static methods Also replaces `var` with `const` in all the relevant code.	2019-08-11 14:35:35 +02:00
Jonas Jenwald	7ee370a394	Remove the `skipEmpty` parameter from `Util.intersect` (PR 11059 follow-up) Looking at this again, it struck me that added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code! Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0, 0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter thus feels too tightly coupled to the `Page.view` getter.	2019-08-11 14:33:52 +02:00
Tim van der Meij	85acc9acac	Merge pull request #11062 from Snuffleupagus/misc-viewer-cleanup Miscellaneous small clean-up of code in the `web/` folder	2019-08-11 13:48:35 +02:00
Jonas Jenwald	446ce88f81	Remove the non-`PRODUCTION` only 'disablebcmaps' hash parameter This was added in PR 4470, but doesn't appear to have been used since. While it's certainly easy to understand how this was helpful during development of that PR, actually providing this hash parameter isn't going to work anymore given that the original CMap files were also removed from the repository. I suppose that the hash parameter could be useful if you'd attempt to update the BCMap files, however that hasn't been attempted even once in over five years time. Furthermore, at this point using the `AppOptions` directly in that situation should also work fine. All in all, this seems like a piece of old and unused code which we can simply remove now.	2019-08-10 15:40:33 +02:00
Jonas Jenwald	04a3dc65e4	Move the sidebar `toggleButton` event listener into `PDFSidebar` This is consistent with other functionality, such as e.g. `SecondaryToolbar` and `PDFFindBar`.	2019-08-10 15:38:33 +02:00
Tim van der Meij	fbe8c6127c	Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries	2019-08-09 22:39:01 +02:00
Tim van der Meij	85b570adfb	Merge pull request #11057 from Snuffleupagus/issue-11052 Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052)	2019-08-09 22:02:34 +02:00
Jonas Jenwald	d637b25e36	Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it. Note that different PDF viewers handle these sort of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly". The patch makes the following notable changes: - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes are added to the API.) - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer. - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`. - Add an optional parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty. - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seem strange. --- [1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.	2019-08-09 10:18:13 +02:00
Jonas Jenwald	0f78fdb229	Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052) Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.	2019-08-08 10:37:41 +02:00
Tim van der Meij	aaef00ce5d	Merge pull request #11051 from Snuffleupagus/view-array-compare Actually compare the `cropBox` and `mediaBox` correctly in the `Page.view` getter	2019-08-07 23:16:02 +02:00
Jonas Jenwald	e9b7996f2f	Actually compare the `cropBox` and `mediaBox` correctly in the `Page.view` getter The current code will only consider the `cropBox` and `mediaBox` as equal when they both point to the same underlying Array. In the case where a PDF file actually specifies both boxes independently, with the exact same values in each, the comparison will currently fail and lead to an unneeded intersection computation.	2019-08-07 17:15:57 +02:00
Tim van der Meij	bb5e98195d	Merge pull request #11047 from Snuffleupagus/issue-11045 Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045)	2019-08-06 22:50:36 +02:00
Brendan Dahl	e1aed05c44	Merge pull request #11049 from brendandahl/telem-add-time Add page rendered timestamp to telemetry.	2019-08-06 11:53:08 -07:00
Brendan Dahl	47077f8de9	Add page rendered timestamp to telemetry.	2019-08-06 09:46:33 -07:00
Jonas Jenwald	5ac9c7c384	Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045) The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.	2019-08-06 14:33:05 +02:00
Tim van der Meij	a666f1ef00	Merge pull request #11048 from Snuffleupagus/linkService-compact-pagesRefCache Use more compact keys in `PDFLinkService._pagesRefCache`	2019-08-05 22:44:00 +02:00
Jonas Jenwald	9acaaf5126	Use more compact keys in `PDFLinkService._pagesRefCache` By using the same internal formatting here as in the `Ref.toString` method, in `src/core/primitives.js`, all cache-keys will become at least two bytes shorter (and most three bytes shorter). Obviously this won't have a huge effect on memory since there's only one cache entry per page, but it nonetheless seems wasteful to use longer keys than strictly required.	2019-08-05 18:06:32 +02:00

1 2 3 4 5 ...

11778 Commits