Commit Graph

12856 Commits

Author SHA1 Message Date
Tim van der Meij
cd6d089489
Merge pull request #11926 from Snuffleupagus/GlobalImageCache-clear-onlyData
Allow `GlobalImageCache.clear` to, optionally, only remove the actual data (PR 11912 follow-up)
2020-05-23 12:21:38 +02:00
Jonas Jenwald
8af70d75aa Allow GlobalImageCache.clear to, optionally, only remove the actual data (PR 11912 follow-up)
When "Cleanup" is triggered, you obviously need to remove all globally cached data on *both* the main- and worker-threads.
However, the current the implementation of the `GlobalImageCache.clear` method also means that we lose *all* information about which images were cached and not just their data. This thus has the somewhat unfortunate side-effect of requiring images, which were previously known to be "global", to *again* having to reach `NUM_PAGES_THRESHOLD` before being cached again.

To avoid doing unnecessary parsing after "Cleanup", we can thus let `GlobalImageCache.clear` keep track of which images were cached while still removing their actual data. This should not have any significant impact on memory usage, since the only extra thing being kept is a `RefSetCache` (essentially an Object) with a couple of `Set`s containing only integers.
2020-05-23 11:30:24 +02:00
Tim van der Meij
973f39b558
Merge pull request #11924 from Snuffleupagus/issue-11922
Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922)
2020-05-23 00:32:12 +02:00
Jonas Jenwald
56ebf01ae0 Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922)
This patch was inspired by ad2b64f124/xpdf/CharCodeToUnicode.cc (L480-L484)
2020-05-22 15:23:17 +02:00
Jonas Jenwald
ebef67b354 Stop building any src/ files during the gulp default_preferences task
With the changes made in the previous patch, the `web/app_options.js` file no longer depends on anything *except* files residing in the `web/` folder. Hence the `gulp default_preferences` task can now be further simplified and thus becomes even faster than before; see also PR 11724.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
18e0b10d3c [api-minor] Remove the disableCreateObjectURL option from the getDocument parameters, since it's now unused in the API
With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code.

Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option.
Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
cc4cc8b11b Remove the, now unused, releaseImageResources helper function
With the changes in the previous patch, this is now dead code which should thus be removed.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seem unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification support JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser require a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
   In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, result in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-reponsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, lead to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` were unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:

 - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
   This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now looks *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
   Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
   Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
2020-05-22 00:22:48 +02:00
Tim van der Meij
4a3a24b002
Merge pull request #11912 from Snuffleupagus/GlobalImageCache
Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
2020-05-21 23:54:28 +02:00
Jonas Jenwald
dda6626f40 Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]

Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.

In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.

Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]

*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.

---
[1] There's e.g. PDF documents that use the same image as background on all pages.

[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.

[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-21 18:13:45 +02:00
Tim van der Meij
604a6f96aa
Merge pull request #11919 from Snuffleupagus/less-SystemJS
Reduce usage of SystemJS, in the development viewer, even further
2020-05-20 14:50:26 +02:00
Jonas Jenwald
8d56a69e74 Reduce usage of SystemJS, in the development viewer, even further
With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687

Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded *natively* by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill.

*Please note:* As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.
2020-05-20 13:36:52 +02:00
Tim van der Meij
a5c60cdd31
Merge pull request #11914 from Snuffleupagus/less-require
Convert the `src/pdf.js` and `src/pdf.worker.js` files to use standard `import`/`export` statements
2020-05-20 13:28:44 +02:00
Jonas Jenwald
e2c3312416 Convert the src/pdf.js and src/pdf.worker.js files to use standard import/export statements
As part of reducing our reliance on SystemJS in the development viewer, this patch replaces usage of `require` statements with modern standards `import`/`export` statements instead.

If we want to try and move forward with reducing usage of SystemJS, we don't have much choice but to make these kind changes (despite what prior test-results showed, however I'm no longer able to reproduce the issues locally).
2020-05-20 13:18:23 +02:00
Jonas Jenwald
d4d933538b Re-factor setPDFNetworkStreamFactory, in src/display/api.js, to also accept an asynchronous function
As part of trying to reduce the usage of SystemJS in the development viewer, this patch is a necessary step that will allow removal of some `require` statements.

Currently this uses `SystemJS.import` in non-PRODUCTION mode, but it should be possible to replace those with standard *dynamic* `import` calls in the future.
2020-05-20 13:18:18 +02:00
Tim van der Meij
0960e6c0b5
Merge pull request #11917 from Snuffleupagus/bug-1632644
[Firefox] Allow PDF attachments to, once again, be opened directly in the browser (bug 1632644)
2020-05-20 12:55:20 +02:00
Jonas Jenwald
93e7f630c1 Remove unnecessary empty string fallback from the getPDFFileNameFromURL call in web/pdf_document_properties.js (PR 10114 follow-up)
Given that the `getPDFFileNameFromURL` helper function has a specific code-path for handling non-string inputs, this empty string fallback really isn't necessary at the call-site in `web/pdf_document_properties.js`.
2020-05-20 12:09:04 +02:00
Jonas Jenwald
108258a8f8 [Firefox] Allow PDF attachments to, once again, be opened directly in the browser (bug 1632644)
Apparently the old link format used in MOZCENTRAL-builds, with the blob URL separated from the filename with a `?` character violates the specification; see https://bugzilla.mozilla.org/show_bug.cgi?id=1632644#c5

Obviously just removing the `?`-part of the URL would have worked, but that would also have meant that we'd no longer be able to provide the correct filename when the user attempts to download the opened PDF attachment.
To fix this we'll instead append the filename in the hash-part of the URL, which however required using a *custom* hash-parameter to avoid triggering the fallback "named destination" code-paths in the viewer.

Note that only changing the `web/pdf_attachment_viewer.js` file wasn't sufficient to fix the bug, and we also need to tweak the `webViewerInitialized` function in `web/app.js` since MOZCENTRAL-builds used to ignore *everything* in the URL hash.
This particular code is very old, but changing it *should* be completely safe given that the `PDFViewerApplication.setTitleUsingUrl` method since some time now stores both the original URL (in `this.url`) as well as one without the hash (in `this.baseUrl`). The latter one is already used everywhere where it matters, so this change seem fine to me.

This patch thus restores the original behaviour for PDF attachments in the MOZCENTRAL-build, by once again allowing them to be opened *directly* in the browser without downloading. (The fallback added in PR 11845 is obviously kept, since it seems generally useful to have.)
2020-05-20 12:08:59 +02:00
Tim van der Meij
6ffcedc24b
Merge pull request #11911 from Snuffleupagus/getDefaultPreferences-rm-SystemJS
Remove the SystemJS dependency from the `web/preferences.js` file
2020-05-16 23:48:28 +02:00
Jonas Jenwald
8f24415a46 Remove the SystemJS dependency from the web/preferences.js file
Originally the `default_preferences.json` file was checked into the repository, and we thus needed to load it in non-PRODUCTION mode (which was originally done asynchronously using `XMLHttpRequest`). Over the years a lot has changed and the `default_preferences.json` file is now built, by the `gulp default_preferences` task, from the `web/app_options.js` file. Hence it's no longer necessary, in non-PRODUCTION mode, to use SystemJS here since we can simply use a standard `import` statement instead.

Note how e.g. `web/app.js` already imports from `web/app_options.js` in the same exact way that `web/preferences.js` now does, hence this patch will *not* result in any significant changes in the built/bundled viewer file.

This is another (small) part in trying to reduce usage of SystemJS, with the goal of hopefully getting rid of it completely. (I've started working on this, and doing so has identified a number of problem areas; this patch addresses one of them.)
2020-05-16 16:22:15 +02:00
Tim van der Meij
34218ed192
Merge pull request #11910 from Snuffleupagus/update-packages
Update packages and translations
2020-05-16 14:31:20 +02:00
Jonas Jenwald
c12c92e598 Update l10n files 2020-05-16 11:47:08 +02:00
Jonas Jenwald
4f6664f3f5 Update npm packages 2020-05-16 11:44:41 +02:00
Jonas Jenwald
887d2f2948 Update the eslint-plugin-unicorn package 2020-05-16 11:43:21 +02:00
Tim van der Meij
15087c35d1
Merge pull request #11905 from Snuffleupagus/less-require
Reduce the usage of `require` statements in code-paths not protected by pre-processor and/or run-time checks
2020-05-15 11:28:10 +02:00
Jonas Jenwald
ec0ab91a2b Reduce the usage of require statements in code-paths not protected by pre-processor and/or run-time checks
This replaces some additional `require`/`exports` usage with standard `import`/`export` statements instead.
Hence another, small, part in the effort to reduce the reliance on SystemJS-specific functionality in the development viewer.
2020-05-14 15:57:49 +02:00
Tim van der Meij
8b9492a5c4
Merge pull request #11892 from Snuffleupagus/minified-es5
Add a `minified-es5` gulp task (issue 11858)
2020-05-11 23:17:52 +02:00
Tim van der Meij
fd80bc8178
Merge pull request #11890 from Snuffleupagus/eslint-7
Update ESLint to version 7
2020-05-11 23:11:08 +02:00
Jonas Jenwald
9b71ccb13b Add a minified-es5 gulp task (issue 11858)
By re-factoring the existing gulp tasks, most of the code can be re-used for both the existing `gulp minified` as well as the new `gulp minified-es5` task.
2020-05-10 13:41:42 +02:00
Jonas Jenwald
8440958bcf Ensure that the DEFINES build target constants, in gulpfile.js, cannot be changed 2020-05-10 13:38:58 +02:00
Jonas Jenwald
9118cea9f7 Enable the ESLint default-case-last rule, and tweak the existing use-isnan rule
These changes were made possible by ESLint version 7, and neither of these rules required any code changes.

Please find additional details about the ESLint rules at https://eslint.org/docs/rules/default-case-last and https://eslint.org/docs/rules/use-isnan
2020-05-10 11:33:44 +02:00
Jonas Jenwald
f8bff283f3 Update ESLint to version 7
Please see https://eslint.org/blog/2020/05/eslint-v7.0.0-released for a list of notable changes.
2020-05-10 11:32:46 +02:00
Tim van der Meij
1ee63dc465
Merge pull request #11889 from Snuffleupagus/_parsedAnnotations-move-catch
Handle errors individually for each annotation in the `_parsedAnnotations` getter
2020-05-10 00:22:13 +02:00
Jonas Jenwald
73636e052a Handle errors individually for each annotation in the _parsedAnnotations getter
While working on PR 11872, it occurred to me that it probably wouldn't be a bad idea to change the `_parsedAnnotations` getter to handle errors individually for each annotation. This way, one broken/corrupt annotation won't prevent the rest of them from being e.g. fetched through the API.
2020-05-09 12:33:39 +02:00
Tim van der Meij
7823d593f9
Merge pull request #11880 from Snuffleupagus/issue-11875
Attempt to respect the "zoom" hash parameter, even when the "nameddest" parameter is present (issue 11875)
2020-05-08 23:42:12 +02:00
Tim van der Meij
bf2ce760f0
Merge pull request #11873 from Snuffleupagus/eslint-assert
Use the ESLint `no-restricted-syntax` rule to ensure that `assert` is always called with two arguments
2020-05-08 00:01:30 +02:00
Tim van der Meij
9c341cfec6
Merge pull request #11879 from Snuffleupagus/eslint-grouped-accessor-pairs
Enable the ESLint `grouped-accessor-pairs` rule
2020-05-07 23:53:26 +02:00
Jonas Jenwald
af1bb04662 Attempt to respect the "zoom" hash parameter, even when the "nameddest" parameter is present (issue 11875)
Given that the `PDFLinkService.setHash` method itself if completely synchronous, moving the handling of "nameddest" to occur last *shouldn't* cause any problems (famous last words).
This way the destination will still override any previous parameter, such as e.g. the "page", as expected. Furthermore, given that the `PDFLinkService.navigateTo` method is asynchronous that should provide additional guarantees that the "nameddest" parameter is always respected.

As sort-of expected, this fairly innocent looking change also required some tweaks in the `PDFHistory` to prevent dummy history entires upon document load (only an issue when both "page" *and* "nameddest" parameters are provided in the hash).
2020-05-07 13:53:07 +02:00
Jonas Jenwald
744af9eeb8 Enable the ESLint grouped-accessor-pairs rule
This rule complements the existing `accessor-pairs` nicely, and ensures that a getter/setter pair is always consistently ordered.

Please find additional details about this rule at https://eslint.org/docs/rules/grouped-accessor-pairs
2020-05-07 11:43:19 +02:00
Jonas Jenwald
e1f340a0c2 Use the ESLint no-restricted-syntax rule to ensure that assert is always called with two arguments
Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites.
In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-05-05 13:40:05 +02:00
Tim van der Meij
491904d30a
Merge pull request #11872 from Snuffleupagus/issue-11871
Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871)
2020-05-04 22:19:27 +02:00
Tim van der Meij
c32f145c94
Merge pull request #11863 from brendandahl/unsupported-features
Add more categories of unsupported features.
2020-05-04 22:09:13 +02:00
Brendan Dahl
b1be33c96f Add more categories of unsupported features.
Fixes #11815
2020-05-04 11:02:16 -07:00
Jonas Jenwald
4aabd063fc Gracefully handle annotation parsing errors in Page.getOperatorList (issue 11871)
This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing.
Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.
2020-05-04 17:09:48 +02:00
Tim van der Meij
2711f4bc8c
Merge pull request #11869 from Snuffleupagus/gulpfile-cleanup
Various smaller clean-up in `gulpfile.js`
2020-05-03 16:14:16 +02:00
Jonas Jenwald
a9e7798ac6 Split the createBundle helper function, in gulpfile.js, into separate ones for the main/worker-thread files
All of the other *similar* helper functions only target one file per function, and there's no particular reason for this one to be different.
This patch will simplify future changes, e.g. experimenting with using `gulp watch` instead of SystemJS for the development viewer.
2020-05-03 11:34:08 +02:00
Jonas Jenwald
21495c1dd1 Remove the gulp bundle task since it's unused and doesn't really make sense
Not only is there no code depending on it now, the actual task itself doesn't even make sense as-is. Note that it uses the default `DEFINES` configuration *unaltered*, which is neither useful nor correct since the resulting build thus won't make sense without an actual built target set.
2020-05-03 11:34:02 +02:00
Tim van der Meij
d822578450
Merge pull request #11868 from Snuffleupagus/update-packages
Update packages and translations
2020-05-02 14:52:12 +02:00
Jonas Jenwald
30bd3b24c2 Update l10n files 2020-05-02 13:25:28 +02:00
Jonas Jenwald
8fac59de96 Update npm packages 2020-05-02 13:23:41 +02:00