pdf.js

Author	SHA1	Message	Date
Tim van der Meij	4a3a24b002	Merge pull request #11912 from Snuffleupagus/GlobalImageCache Attempt to cache repeated images at the document, rather than the page, level (issue 11878)	2020-05-21 23:54:28 +02:00
Jonas Jenwald	dda6626f40	Attempt to cache repeated images at the document, rather than the page, level (issue 11878) Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the same images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1] Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2] However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages. In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be twenty copies of the image data). While this obviously benefits both CPU and memory usage in this case, for very large image data this patch may possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will only cache a certain number of image resources at the document level and simply fall back to the default behaviour.
Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3] Please note: The patch will lead to small movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator. --- [1] There's e.g. PDF documents that use the same image as background on all pages. [2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer. [3] If the latter case were true, we could simply check for repeat images before parsing started and thus avoid handling any duplicate image resources.	2020-05-21 18:13:45 +02:00
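The page-count threshold and cache cap described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `GlobalImageCache` code: the class name, method names, and both constants are assumptions, since the message only says "a certain number" in each case.

```javascript
// Hypothetical limits; the commit message does not state the real values.
const NUM_PAGES_THRESHOLD = 2;
const MAX_IMAGES_TO_CACHE = 10;

class GlobalImageCacheSketch {
  constructor() {
    this._pagesByRef = new Map(); // image reference -> Set of page indexes
    this._dataByRef = new Map(); // image reference -> decoded image data
  }

  // Record that `ref` was seen on `pageIndex`. Returns true once the image
  // has appeared on more than NUM_PAGES_THRESHOLD distinct pages and there
  // is still room (or an existing entry) in the document-level cache.
  shouldCache(ref, pageIndex) {
    let pages = this._pagesByRef.get(ref);
    if (!pages) {
      pages = new Set();
      this._pagesByRef.set(ref, pages);
    }
    pages.add(pageIndex);
    if (pages.size <= NUM_PAGES_THRESHOLD) {
      return false; // Still treated as a page-specific resource.
    }
    return (
      this._dataByRef.has(ref) || this._dataByRef.size < MAX_IMAGES_TO_CACHE
    );
  }

  // Store decoded data at the document level, respecting the cap.
  setData(ref, data) {
    if (
      this._dataByRef.has(ref) ||
      this._dataByRef.size >= MAX_IMAGES_TO_CACHE
    ) {
      return;
    }
    this._dataByRef.set(ref, data);
  }

  // Returns the cached data, or null so the caller falls back to the
  // default page-level handling.
  getData(ref) {
    return this._dataByRef.get(ref) || null;
  }
}
```

A caller would check `getData` first, decode on a miss, and only call `setData` when `shouldCache` returns true; everything else keeps the per-page behaviour.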
Tim van der Meij	604a6f96aa	Merge pull request #11919 from Snuffleupagus/less-SystemJS Reduce usage of SystemJS, in the development viewer, even further	2020-05-20 14:50:26 +02:00
Jonas Jenwald	8d56a69e74	Reduce usage of SystemJS, in the development viewer, even further With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded natively by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill. Please note: As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.	2020-05-20 13:36:52 +02:00
Tim van der Meij	a5c60cdd31	Merge pull request #11914 from Snuffleupagus/less-require Convert the `src/pdf.js` and `src/pdf.worker.js` files to use standard `import`/`export` statements	2020-05-20 13:28:44 +02:00
Jonas Jenwald	e2c3312416	Convert the `src/pdf.js` and `src/pdf.worker.js` files to use standard `import`/`export` statements As part of reducing our reliance on SystemJS in the development viewer, this patch replaces usage of `require` statements with modern standard `import`/`export` statements instead. If we want to try and move forward with reducing usage of SystemJS, we don't have much choice but to make these kinds of changes (despite what prior test-results showed, however I'm no longer able to reproduce the issues locally).	2020-05-20 13:18:23 +02:00
Jonas Jenwald	d4d933538b	Re-factor `setPDFNetworkStreamFactory`, in src/display/api.js, to also accept an asynchronous function As part of trying to reduce the usage of SystemJS in the development viewer, this patch is a necessary step that will allow removal of some `require` statements. Currently this uses `SystemJS.import` in non-PRODUCTION mode, but it should be possible to replace those with standard dynamic `import` calls in the future.	2020-05-20 13:18:18 +02:00
Tim van der Meij	0960e6c0b5	Merge pull request #11917 from Snuffleupagus/bug-1632644 [Firefox] Allow PDF attachments to, once again, be opened directly in the browser (bug 1632644)	2020-05-20 12:55:20 +02:00
Jonas Jenwald	93e7f630c1	Remove unnecessary empty string fallback from the `getPDFFileNameFromURL` call in `web/pdf_document_properties.js` (PR 10114 follow-up) Given that the `getPDFFileNameFromURL` helper function has a specific code-path for handling non-string inputs, this empty string fallback really isn't necessary at the call-site in `web/pdf_document_properties.js`.	2020-05-20 12:09:04 +02:00
Jonas Jenwald	108258a8f8	[Firefox] Allow PDF attachments to, once again, be opened directly in the browser (bug 1632644) Apparently the old link format used in MOZCENTRAL-builds, with the blob URL separated from the filename with a `?` character, violates the specification; see https://bugzilla.mozilla.org/show_bug.cgi?id=1632644#c5 Obviously just removing the `?`-part of the URL would have worked, but that would also have meant that we'd no longer be able to provide the correct filename when the user attempts to download the opened PDF attachment. To fix this we'll instead append the filename in the hash-part of the URL, which however required using a custom hash-parameter to avoid triggering the fallback "named destination" code-paths in the viewer. Note that only changing the `web/pdf_attachment_viewer.js` file wasn't sufficient to fix the bug, and we also need to tweak the `webViewerInitialized` function in `web/app.js` since MOZCENTRAL-builds used to ignore everything in the URL hash. This particular code is very old, but changing it should be completely safe given that the `PDFViewerApplication.setTitleUsingUrl` method since some time now stores both the original URL (in `this.url`) as well as one without the hash (in `this.baseUrl`). The latter one is already used everywhere where it matters, so this change seems fine to me. This patch thus restores the original behaviour for PDF attachments in the MOZCENTRAL-build, by once again allowing them to be opened directly in the browser without downloading. (The fallback added in PR 11845 is obviously kept, since it seems generally useful to have.)	2020-05-20 12:08:59 +02:00
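The idea of carrying the filename in the hash-part of the URL, as described in the commit above, might look something like this. The parameter name `filename` and both helper functions are hypothetical; the message only mentions "a custom hash-parameter" without naming it.

```javascript
// Hypothetical parameter name, chosen for illustration only.
const FILENAME_PARAM = "filename";

// Append the filename as a key/value pair in the hash, so that the hash
// is not mistaken for a bare named destination.
function buildAttachmentUrl(blobUrl, filename) {
  return `${blobUrl}#${FILENAME_PARAM}=${encodeURIComponent(filename)}`;
}

// Split a URL into the part before the hash and the filename (if any).
function splitAttachmentUrl(url) {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return { baseUrl: url, filename: null };
  }
  const params = new URLSearchParams(url.substring(hashIndex + 1));
  return {
    baseUrl: url.substring(0, hashIndex),
    filename: params.get(FILENAME_PARAM),
  };
}
```

Keeping the base URL separate from the hash mirrors the `this.url` / `this.baseUrl` split that the commit message relies on.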
Tim van der Meij	6ffcedc24b	Merge pull request #11911 from Snuffleupagus/getDefaultPreferences-rm-SystemJS Remove the SystemJS dependency from the `web/preferences.js` file	2020-05-16 23:48:28 +02:00
Jonas Jenwald	8f24415a46	Remove the SystemJS dependency from the `web/preferences.js` file Originally the `default_preferences.json` file was checked into the repository, and we thus needed to load it in non-PRODUCTION mode (which was originally done asynchronously using `XMLHttpRequest`). Over the years a lot has changed and the `default_preferences.json` file is now built, by the `gulp default_preferences` task, from the `web/app_options.js` file. Hence it's no longer necessary, in non-PRODUCTION mode, to use SystemJS here since we can simply use a standard `import` statement instead. Note how e.g. `web/app.js` already imports from `web/app_options.js` in the same exact way that `web/preferences.js` now does, hence this patch will not result in any significant changes in the built/bundled viewer file. This is another (small) part in trying to reduce usage of SystemJS, with the goal of hopefully getting rid of it completely. (I've started working on this, and doing so has identified a number of problem areas; this patch addresses one of them.)	2020-05-16 16:22:15 +02:00
Tim van der Meij	34218ed192	Merge pull request #11910 from Snuffleupagus/update-packages Update packages and translations	2020-05-16 14:31:20 +02:00
Jonas Jenwald	c12c92e598	Update l10n files	2020-05-16 11:47:08 +02:00
Jonas Jenwald	4f6664f3f5	Update `npm` packages	2020-05-16 11:44:41 +02:00
Jonas Jenwald	887d2f2948	Update the `eslint-plugin-unicorn` package	2020-05-16 11:43:21 +02:00
Tim van der Meij	15087c35d1	Merge pull request #11905 from Snuffleupagus/less-require Reduce the usage of `require` statements in code-paths not protected by pre-processor and/or run-time checks	2020-05-15 11:28:10 +02:00
Jonas Jenwald	ec0ab91a2b	Reduce the usage of `require` statements in code-paths not protected by pre-processor and/or run-time checks This replaces some additional `require`/`exports` usage with standard `import`/`export` statements instead. Hence another, small, part in the effort to reduce the reliance on SystemJS-specific functionality in the development viewer.	2020-05-14 15:57:49 +02:00
Tim van der Meij	8b9492a5c4	Merge pull request #11892 from Snuffleupagus/minified-es5 Add a `minified-es5` gulp task (issue 11858)	2020-05-11 23:17:52 +02:00
Tim van der Meij	fd80bc8178	Merge pull request #11890 from Snuffleupagus/eslint-7 Update ESLint to version 7	2020-05-11 23:11:08 +02:00
Jonas Jenwald	9b71ccb13b	Add a `minified-es5` gulp task (issue 11858) By re-factoring the existing gulp tasks, most of the code can be re-used for both the existing `gulp minified` as well as the new `gulp minified-es5` task.	2020-05-10 13:41:42 +02:00
Jonas Jenwald	8440958bcf	Ensure that the `DEFINES` build target constants, in `gulpfile.js`, cannot be changed	2020-05-10 13:38:58 +02:00
Jonas Jenwald	9118cea9f7	Enable the ESLint `default-case-last` rule, and tweak the existing `use-isnan` rule These changes were made possible by ESLint version 7, and neither of these rules required any code changes. Please find additional details about the ESLint rules at https://eslint.org/docs/rules/default-case-last and https://eslint.org/docs/rules/use-isnan	2020-05-10 11:33:44 +02:00
Jonas Jenwald	f8bff283f3	Update ESLint to version 7 Please see https://eslint.org/blog/2020/05/eslint-v7.0.0-released for a list of notable changes.	2020-05-10 11:32:46 +02:00
Tim van der Meij	1ee63dc465	Merge pull request #11889 from Snuffleupagus/_parsedAnnotations-move-catch Handle errors individually for each annotation in the `_parsedAnnotations` getter	2020-05-10 00:22:13 +02:00
Jonas Jenwald	73636e052a	Handle errors individually for each annotation in the `_parsedAnnotations` getter While working on PR 11872, it occurred to me that it probably wouldn't be a bad idea to change the `_parsedAnnotations` getter to handle errors individually for each annotation. This way, one broken/corrupt annotation won't prevent the rest of them from being e.g. fetched through the API.	2020-05-09 12:33:39 +02:00
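Handling errors individually for each annotation, as the commit above describes, comes down to attaching a `catch` to every item rather than to the combined promise. A minimal sketch, where `parseAnnotation` is a hypothetical stand-in for whatever actually creates an annotation:

```javascript
// `refs` is a list of annotation references; `parseAnnotation` may return
// a value or a promise, and may throw or reject for corrupt input.
async function parseAllAnnotations(refs, parseAnnotation) {
  const results = await Promise.all(
    refs.map(ref =>
      Promise.resolve()
        .then(() => parseAnnotation(ref))
        .catch(reason => {
          // One broken annotation is skipped instead of failing the batch.
          console.warn(`Skipping annotation "${ref}": ${reason}`);
          return null;
        })
    )
  );
  return results.filter(annotation => annotation !== null);
}
```

With a single `catch` on the outer `Promise.all`, the first rejection would discard every annotation, including the valid ones.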
Tim van der Meij	7823d593f9	Merge pull request #11880 from Snuffleupagus/issue-11875 Attempt to respect the "zoom" hash parameter, even when the "nameddest" parameter is present (issue 11875)	2020-05-08 23:42:12 +02:00
Tim van der Meij	bf2ce760f0	Merge pull request #11873 from Snuffleupagus/eslint-assert Use the ESLint `no-restricted-syntax` rule to ensure that `assert` is always called with two arguments	2020-05-08 00:01:30 +02:00
Tim van der Meij	9c341cfec6	Merge pull request #11879 from Snuffleupagus/eslint-grouped-accessor-pairs Enable the ESLint `grouped-accessor-pairs` rule	2020-05-07 23:53:26 +02:00
Jonas Jenwald	af1bb04662	Attempt to respect the "zoom" hash parameter, even when the "nameddest" parameter is present (issue 11875) Given that the `PDFLinkService.setHash` method itself is completely synchronous, moving the handling of "nameddest" to occur last shouldn't cause any problems (famous last words). This way the destination will still override any previous parameter, such as e.g. the "page", as expected. Furthermore, given that the `PDFLinkService.navigateTo` method is asynchronous that should provide additional guarantees that the "nameddest" parameter is always respected. As sort-of expected, this fairly innocent looking change also required some tweaks in the `PDFHistory` to prevent dummy history entries upon document load (only an issue when both "page" and "nameddest" parameters are provided in the hash).	2020-05-07 13:53:07 +02:00
Jonas Jenwald	744af9eeb8	Enable the ESLint `grouped-accessor-pairs` rule This rule complements the existing `accessor-pairs` nicely, and ensures that a getter/setter pair is always consistently ordered. Please find additional details about this rule at https://eslint.org/docs/rules/grouped-accessor-pairs	2020-05-07 11:43:19 +02:00
Jonas Jenwald	e1f340a0c2	Use the ESLint `no-restricted-syntax` rule to ensure that `assert` is always called with two arguments Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites. In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax	2020-05-05 13:40:05 +02:00
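A `no-restricted-syntax` entry that enforces two-argument `assert` calls could be written along these lines. The selector and message here are a plausible sketch using standard ESLint AST selectors, not a copy of the project's configuration; the variable name is hypothetical.

```javascript
// Sketch of an ESLint configuration fragment (would normally live in an
// `.eslintrc` file rather than in a variable).
const eslintConfigSketch = {
  rules: {
    "no-restricted-syntax": [
      "error",
      {
        // Matches any call to `assert` whose argument count is not two.
        selector: "CallExpression[callee.name='assert'][arguments.length!=2]",
        message: "`assert()` must always be invoked with two arguments.",
      },
    ],
  },
};
```

With such a rule, `assert(cond)` is reported, while `assert(cond, "explanation")` passes.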
Tim van der Meij	491904d30a	Merge pull request #11872 from Snuffleupagus/issue-11871 Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871)	2020-05-04 22:19:27 +02:00
Tim van der Meij	c32f145c94	Merge pull request #11863 from brendandahl/unsupported-features Add more categories of unsupported features.	2020-05-04 22:09:13 +02:00
Brendan Dahl	b1be33c96f	Add more categories of unsupported features. Fixes #11815	2020-05-04 11:02:16 -07:00
Jonas Jenwald	4aabd063fc	Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871) This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing. Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.	2020-05-04 17:09:48 +02:00
Tim van der Meij	2711f4bc8c	Merge pull request #11869 from Snuffleupagus/gulpfile-cleanup Various smaller clean-up in `gulpfile.js`	2020-05-03 16:14:16 +02:00
Jonas Jenwald	a9e7798ac6	Split the `createBundle` helper function, in gulpfile.js, into separate ones for the main/worker-thread files All of the other similar helper functions only target one file per function, and there's no particular reason for this one to be different. This patch will simplify future changes, e.g. experimenting with using `gulp watch` instead of SystemJS for the development viewer.	2020-05-03 11:34:08 +02:00
Jonas Jenwald	21495c1dd1	Remove the `gulp bundle` task since it's unused and doesn't really make sense Not only is there no code depending on it now, the actual task itself doesn't even make sense as-is. Note that it uses the default `DEFINES` configuration unaltered, which is neither useful nor correct since the resulting build thus won't make sense without an actual build target set.	2020-05-03 11:34:02 +02:00
Tim van der Meij	d822578450	Merge pull request #11868 from Snuffleupagus/update-packages Update packages and translations	2020-05-02 14:52:12 +02:00
Jonas Jenwald	30bd3b24c2	Update l10n files	2020-05-02 13:25:28 +02:00
Jonas Jenwald	8fac59de96	Update `npm` packages	2020-05-02 13:23:41 +02:00
Tim van der Meij	822939cace	Merge pull request #11864 from Snuffleupagus/rm-react-example Remove the `create-react-app` example (issue 11729)	2020-05-01 23:52:41 +02:00
Jonas Jenwald	3dc0567a37	Remove the `create-react-app` example (issue 11729) Given that none of the PDF.js contributors know React, maintaining and/or providing support for the example isn't really feasible unfortunately. Even something as simple as running/testing the example becomes difficult for anyone completely unfamiliar with React, and furthermore: - It's very difficult to tell if the example demonstrates React best-practices, since the PDF.js contributors don't know React. - We also have no reasonable way of keeping the example up-to-date with changes in React. - The React example, in its current form, is even hard-coding the PDF.js version to a now unsupported version. - The example is currently triggering "fake worker" usage, see issue 11729, which is really really bad. Note that the "fake worker" functionality is only intended as a fallback, and it should absolutely not under any circumstances be advertised and certainly shouldn't be triggered in official PDF.js examples.	2020-05-01 12:42:35 +02:00
Tim van der Meij	b6f69d47b6	Merge pull request #11834 from xelan/feature/preserve-error-types Preserve error types during translation	2020-04-28 23:47:24 +02:00
Andreas Erhard	f5fd24a61f	Preserve error types during translation By preserving the exception type, more fine-grained error handling can be performed via client-side logic (e.g. redirect to a search page if a PDF is not found, or to a ticket system in case of invalid PDF files). The original exception is now re-thrown. Fixes #11658	2020-04-28 09:36:30 +02:00
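The difference between re-throwing the original exception and throwing a new generic one, as the commit above describes, can be illustrated as follows. The error classes, message keys, and function name are hypothetical stand-ins, not the viewer's actual identifiers.

```javascript
// Hypothetical error types standing in for the library's own exceptions.
class MissingPDFError extends Error {}
class InvalidPDFError extends Error {}

// Pick a user-facing message for the failure, display it, and then
// re-throw the ORIGINAL exception so callers can still inspect its type.
function reportLoadFailure(exception, showMessage) {
  let key = "loading_error";
  if (exception instanceof MissingPDFError) {
    key = "missing_file_error";
  } else if (exception instanceof InvalidPDFError) {
    key = "invalid_file_error";
  }
  showMessage(key);
  // Throwing `new Error(translatedText)` here would lose the type.
  throw exception;
}
```

Client code can then branch on `instanceof`, for example redirecting to a search page when the file is missing.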
Tim van der Meij	8fb82e939c	Merge pull request #11853 from timvandermeij/acroform-names Include the name for interactive form elements	2020-04-27 17:05:06 +02:00
roccobeno	371e699905	Include the name for interactive form elements We already rendered the name for radio buttons, but it was missing for all other interactive form elements. This commit adds that so that values entered in form elements can be read based on the element name.	2020-04-27 16:55:35 +02:00
Tim van der Meij	d469b420a7	Merge pull request #11807 from timvandermeij/puppeteer Introduce Puppeteer for handling browsers during tests	2020-04-27 13:39:30 +02:00
Tim van der Meij	9ebb18f505	Implement a command line flag to skip Chrome when running tests To save time or resources during development it can be useful to run tests only in Firefox. Previously this could be done by editing the browser manifest file, but since that file is no longer used for Puppeteer, this command line flag replaces it. For example, executing `gulp unittest --noChrome` will only run the unit tests in Firefox.	2020-04-27 13:03:12 +02:00

12548 Commits