There's a built-in ESLint rule, `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically; however, there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports
The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
* the goal is to execute actions like Open or OpenAction
* can be tested with issue6106.pdf (auto-print); see the sketch below
* once #12701 is merged, we can add page actions
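For context, a small sketch of how a consumer might act on this, assuming the `getOpenAction` API method this work exposes (the auto-print handling shown is purely illustrative, not the viewer's actual code):
```javascript
// `pdfDocument` is assumed to be a loaded PDFDocumentProxy instance.
const openAction = await pdfDocument.getOpenAction();
if (openAction && openAction.action === "Print") {
  // Matches the auto-print behaviour exhibited by e.g. issue6106.pdf.
  window.print();
}
```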
Note that a number of these cases are covered by existing unit-tests, and a few others only matter for the development/build scripts.
Furthermore, I've also tried to the best of my ability to test each case *manually* to hopefully further reduce the likelihood of this patch introducing any bugs.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
* it's faster to generate the color code by using a lookup table for the components
* it's very likely way faster to parse (when setting the color in the canvas); see the sketch below
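A minimal sketch of the table-based approach (the helper name is illustrative): precompute the two-digit hex string for every possible byte value, so that building a CSS color string becomes three array lookups.
```javascript
// Precomputed lookup table: byteToHex[i] is the two-digit hex string for i.
const byteToHex = [];
for (let i = 0; i < 256; i++) {
  byteToHex.push(i.toString(16).padStart(2, "0"));
}

function makeHexColor(r, g, b) {
  // e.g. makeHexColor(255, 0, 128) === "#ff0080", which the canvas can
  // parse faster than a corresponding "rgb(255, 0, 128)" string.
  return `#${byteToHex[r]}${byteToHex[g]}${byteToHex[b]}`;
}
```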
Rather than returning an *empty* Object[1] we should be returning `null` instead, since that's consistent with existing API-functionality.
To avoid having to *manually* track if the Object is empty, this patch also introduces a small helper function to check its size.
This provides a work-around to avoid having to conditionally try to initialize the `openAction`-object in multiple places.
Given that `Object.fromEntries` doesn't seem to *guarantee* that a `null` prototype is used, we thus hack around that by using `Object.assign` with `Object.create(null)`.
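A sketch of the pattern described above (the names are illustrative, not the exact code):
```javascript
// Hypothetical helper, checking the size of an Object.
function objectSize(obj) {
  return Object.keys(obj).length;
}

function buildOpenAction(openActionMap) {
  // Force a `null` prototype, since `Object.fromEntries` doesn't guarantee one.
  const openAction = Object.assign(
    Object.create(null),
    Object.fromEntries(openActionMap)
  );
  // Return `null`, rather than an empty Object, for consistency with the API.
  return objectSize(openAction) > 0 ? openAction : null;
}
```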
Previously this rule has been enabled in the `web/` folder, and in select files in the `src/` sub-folders.
In this case, enabling this rule didn't actually require any further code changes.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-var
* Move display/xml_parser.js into shared, to use it in the worker
* Save form data in XFA datasets when the PDF is a mix of AcroForms and XFA
Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.
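A sketch of the new API surface (the parameter name shown is assumed from the description above; consult the API documentation for the exact details):
```javascript
// `pdfDocument` is assumed to be a loaded PDFDocumentProxy instance, and
// `canvas` an HTMLCanvasElement to render into.
const pdfPage = await pdfDocument.getPage(1);
const viewport = pdfPage.getViewport({ scale: 1.0 });

const renderTask = pdfPage.render({
  canvasContext: canvas.getContext("2d"),
  viewport,
  // The new render task parameter, accepting the configuration fetched via
  // the new API method:
  optionalContentConfigPromise: pdfDocument.getOptionalContentConfig(),
});
await renderTask.promise;
```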
All of the test files added exhibit different uses of optional content.
Fixes #269.
Fix test to work with optional content.
- Change the stopAtErrors test to ensure that the operator list isn't empty, instead of asserting the exact number of operators.
This PR adds TypeScript definitions from the JSDoc already present.
It adds a new gulp-target 'types' that calls 'tsc', the TypeScript
compiler, to create the definitions.
To use the definitions, users can simply do the following:
```javascript
import {getDocument, GlobalWorkerOptions} from "pdfjs-dist";
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
GlobalWorkerOptions.workerSrc = pdfjsWorker;
const pdf = await getDocument("file:///some.pdf").promise;
```
Co-authored-by: @oBusk
Co-authored-by: @tamuratak
Since this helper function is no longer used anywhere in the main code-base, but only in a couple of unit-tests, it's being moved to a more appropriate spot.
Finally, the implementation of `isEmptyObj` is also tweaked slightly by removing the manual loop.
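A minimal sketch of the tweaked helper, with the manual loop replaced by an `Object.keys`-based check:
```javascript
function isEmptyObj(obj) {
  return Object.keys(obj).length === 0;
}
```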
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons:
- It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.
- The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image can be decoded natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.
- While some JPEG images may, for all intents and purposes, appear to be natively supported, there are still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there are any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.
- The native decoding, for anything except the *simplest* of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons.
- Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests.
- Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.
Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there are two kinds of failures with this patch:
- Changes which are basically imperceptible to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.
- Changes where the JPEG images now look *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
Hence I'm pretty satisfied that there are no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites.
In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
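For reference, a sketch of how the rule can be configured to catch such call-sites (the exact selector used may differ):
```javascript
// .eslintrc.js
module.exports = {
  rules: {
    "no-restricted-syntax": [
      "error",
      {
        // Flag any `assert` call that isn't invoked with exactly two
        // arguments, i.e. a condition *and* a message string.
        selector: "CallExpression[callee.name='assert'][arguments.length!=2]",
        message: "`assert` must always be invoked with two arguments.",
      },
    ],
  },
};
```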
Please note that these changes were done automatically, using `gulp lint --fix`.
Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html
In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there are a fair number of false positives when the `no-shadow` ESLint rule is enabled.
Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
Rather than duplicating the lookup and caching in multiple files, it seems easier to simply move all of this functionality into `src/shared/util.js` instead.
This will also help avoid a bunch of ESLint errors once the `no-shadow` rule is eventually enabled.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/prefer-const
With the recent introduction of Prettier this sort of mass enabling of ESLint rules becomes a lot easier, since the code will be automatically reformatted as necessary to account for e.g. changed line lengths.
Note that this patch is generated automatically, by using the ESLint `--fix` argument, and will thus require some additional clean-up (which is done separately).
Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change).
Prettier is being used for a couple of reasons:
- To be consistent with `mozilla-central`, where Prettier is already in use across the tree.
- To ensure a *consistent* coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters.
Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some).
Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that *comments* won't become too long.
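A hypothetical illustration of that hack (the actual option values may differ): effectively disable `max-len` for code, which Prettier re-formats anyway, while still enforcing a maximum length for comments, which Prettier leaves alone.
```javascript
// .eslintrc.js
module.exports = {
  rules: {
    "max-len": [
      "error",
      {
        code: 1000, // Large enough to never trigger for actual code.
        comments: 80, // Ensures that *comments* won't become too long.
      },
    ],
  },
};
```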
*Please note:* This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a *separate* commit.
(On a more personal note, I'll readily admit that some of the changes Prettier makes are *extremely* ugly. However, in the name of consistency we'll probably have to live with that.)
There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want to disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments.
*Please note:* It may be a good idea to look through these cases individually, and possibly re-write some of them (especially the `String` ones) to reduce the need for all of these ignore commands.
Note that most (reasonably) modern browsers have supported this for a while now, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#Browser_compatibility
By moving the polyfill into `src/shared/compatibility.js` we can thus get rid of the need to manually export/import `ReadableStream` and simply use it directly instead.
The only change here which *could* possibly lead to a difference in behavior is in the `isFetchSupported` function. Previously we attempted to check for the existence of a global `ReadableStream` implementation, a check which could now pass thanks to the polyfill (assuming, obviously, that the preceding checks also succeeded).
However I'm not sure if that's a problem, since the previous check only confirmed the existence of a native `ReadableStream` implementation and not that it actually worked correctly. Finally it *could* just as well have been a globally registered polyfill from an application embedding the PDF.js library.
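For reference, a sketch of the kind of feature-detection being discussed (hypothetical; the real `isFetchSupported` implementation may differ):
```javascript
function isFetchSupported() {
  return (
    typeof fetch !== "undefined" &&
    typeof Response !== "undefined" &&
    "body" in Response.prototype &&
    // With the polyfill registered globally, this last condition may now be
    // satisfied even without a *native* implementation.
    typeof ReadableStream !== "undefined"
  );
}
```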
The bug report seems to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we *actually* do as evident by both the code and a unit-test.
The issue at play here is rather that we previously only supported big-endian UTF-16 BOM, and the `Title` string in the PDF document is using a *little-endian* UTF-16 BOM instead.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902
Sometimes we also used `@return`, but `@returns` is what the JSDoc
documentation recommends. Even though `@return` works as an alias, it's
good to use the recommended syntax and to be consistent within the
project.
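For illustration, the recommended form looks as follows:
```javascript
/**
 * @param {number} a - The first addend.
 * @param {number} b - The second addend.
 * @returns {number} The sum of both addends.
 */
function add(a, b) {
  return a + b;
}
```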
By utilizing a base "class", things become significantly simpler. Unfortunately the new `BaseException` cannot be a proper ES6 class and just extend `Error`, since the SystemJS dependency doesn't seem to play well with that.
Note also that we (generally) need to keep the `name` property on the actual `...Exception` object, rather than on its prototype, since the property will otherwise be dropped during the structured cloning used with `postMessage`.
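A simplified sketch of the approach (the actual implementation differs in its details):
```javascript
// Cannot be a proper ES6 class extending `Error`, because of the SystemJS
// issue mentioned above; hence the closure-based work-around.
const BaseException = (function BaseExceptionClosure() {
  function BaseException(message) {
    this.message = message;
    // Keep `name` on the instance itself, rather than on the prototype,
    // so that it survives the structured cloning used with `postMessage`.
    this.name = this.constructor.name;
  }
  BaseException.prototype = new Error();
  BaseException.prototype.constructor = BaseException;
  return BaseException;
})();

class PasswordException extends BaseException {}
```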
Looking at this again, it struck me that the added functionality in `Util.intersect` is probably more confusing than helpful in general; sorry about the churn in this code!
Based on the parameter name you'd probably expect it to only match when the intersection is `[0, 0, 0, 0]` and not when only one component is zero, hence the `skipEmpty` parameter feels too tightly coupled to the `Page.view` getter.
This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it.
Note that different PDF viewers handle these sorts of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly".
The patch makes the following notable changes:
- Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes is added to the API.)
- Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer.
- Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`.
- Add an *optional* parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty.
- Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seems strange; see the sketch below.
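A simplified sketch of the described `Util.intersect` behaviour (not the exact code):
```javascript
// Returns the intersection of two rectangles, given as [x1, y1, x2, y2]
// Arrays, or `null` when they don't intersect. With `skipEmpty` set, an
// intersection with zero area is also treated as `null`.
function intersect(rect1, rect2, skipEmpty = false) {
  const xLow = Math.max(Math.min(rect1[0], rect1[2]), Math.min(rect2[0], rect2[2]));
  const xHigh = Math.min(Math.max(rect1[0], rect1[2]), Math.max(rect2[0], rect2[2]));
  if (xLow > xHigh) {
    return null;
  }
  const yLow = Math.max(Math.min(rect1[1], rect1[3]), Math.min(rect2[1], rect2[3]));
  const yHigh = Math.min(Math.max(rect1[1], rect1[3]), Math.max(rect2[1], rect2[3]));
  if (yLow > yHigh) {
    return null;
  }
  if (skipEmpty && (xLow === xHigh || yLow === yHigh)) {
    return null;
  }
  return [xLow, yLow, xHigh, yHigh];
}
```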
---
[1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.
Firefox telemetry now supports using string labels. Convert the integers
that we used for categories to simply use strings.
The upstream work will happen in:
https://bugzilla.mozilla.org/show_bug.cgi?id=1566882
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simply extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.
The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
Given that the function is (purposely) independent of the verbosity level and that its message is worded to only apply on the main-thread, there's no reason to duplicate this across the built `pdf.js`/`pdf.worker.js` files.
The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated.
Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used *only* on the worker-thread.
Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus *cannot* possibly ever be used on the main-thread.
For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].
Since there's generally a real effort being made in PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.
Here the choice is made to attempt to load the *first* page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object, without any context of what it's supposed to be, isn't very useful, which is why this particular choice was made.
Obviously, just because the first page can be loaded successfully that doesn't guarantee that the *entire* XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is *not* valid[2].
Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call.
Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:
```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
  pdfDocument.getPage(1).then((pdfPage) => {
    console.timeEnd('second');
  });
});
```
The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable.
---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated.
In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.
[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the *first* page.
[3] The only extra parsing is caused by, potentially, having to traverse *part* of the `Pages` tree to find the first page.