Sakurai/pdf.js - pdf.js - Gitea on kemo

Author	SHA1	Message	Date
Jonas Jenwald	298ee5cfbb	Replace some ternary operators with optional chaining, and nullish coalescing, in the `src/display/`-folder This way, we can further reduce unnecessary code-repetition in some cases.	2021-01-19 17:20:02 +01:00
Jonas Jenwald	13742eb82d	Include the JS `actions` for the page when dispatching the "pageopen"-event in the `BaseViewer` Note first of all how the `PDFDocumentProxy.getJSActions` method in the API caches the result, which makes repeated lookups cheap enough to not really be an issue. Secondly, with the previous patch, we're now only dispatching "pageopen"/"pageclose"-events when there's actually a sandbox that listens for them. All-in-all, with these changes we can thus simplify the default-viewer "pageopen"-event handler a fair bit.	2021-01-12 20:28:50 +01:00
Jonas Jenwald	81525fd446	Use ESLint to ensure that `export`s are sorted alphabetically There's a built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability. Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a consistent ordering of `export`-statements in the future. Note: To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.	2021-01-09 20:37:51 +01:00
Jonas Jenwald	941b65f683	Remove unnecessary `CanvasFactory`/`CMapReaderFactory`/`FileReaderFactory` duplication in unit-tests Given that the API will now, after PR 12039, automatically pick the correct factories to use depending on the environment (browser vs. Node.js), we can utilize that in the unit-tests as well. This way we don't have to manually repeat the same initialization code in multiple unit-tests. Note: The official PDF.js API is defined in `src/pdf.js`, hence the new exports in `src/display/api.js` will not affect that. Also, updates the unit-test `FileReaderFactory` helpers similarly. Drive-by change: Fix the `CMapReaderFactory` usage in the annotation unit-tests, since the cache should only contain raw data and not a Promise. While this obviously works as-is, having unit-tests that "abuse" the intended data format can easily lead to unnecessary failures if changes are made to the relevant `src/core/` code.	2021-01-08 17:33:59 +01:00
Calixte Denizet	6523f8880b	JS -- Plug PageOpen and PageClose actions	2021-01-06 13:31:15 +01:00
Jonas Jenwald	f9530e56da	Run `AnnotationStorage.resetModified` when destroying the `PDFDocumentLoadingTask`/`PDFDocumentProxy` This will, in a very simple way using the existing events, thus allow the viewer to remove the "beforeunload" `window` event listener when the document is closed. Generally speaking we want to avoid having global event listeners for the PDF document instance, which is why the `EventBus` exists, and instead reserve global events for the viewer itself. However, the `AnnotationStorage` "beforeunload" event unfortunately needs to be document-specific and we should thus ensure that it's correctly removed when the document is destroyed.	2020-12-19 14:05:31 +01:00
Calixte Denizet	1e2173f038	JS - Collect and execute actions at doc and pages level * the goal is to execute actions like Open or OpenAction * can be tested with issue6106.pdf (auto-print) * once #12701 is merged, we can add page actions	2020-12-18 20:03:59 +01:00
Jonas Jenwald	01d12b465c	[api-minor] Add "contentLength" to the information returned by the `getMetadata` method Given that we already include the "Content-Disposition"-header filename, when it exists, it shouldn't hurt to also include the information from the "Content-Length"-header. For PDF documents opened via a URL, which should be a very common way for the PDF.js library to be used, this will[1] thus provide a way of getting the PDF filesize without having to wait for the `getDownloadInfo`-promise to resolve[2]. With these API improvements, we can also simplify the filesize handling in the `PDFDocumentProperties` class. --- [1] Assuming that the server is correctly configured, of course. [2] Since that's not guaranteed to happen in general, with e.g. `disableAutoFetch = true` set.	2020-11-20 15:30:36 +01:00
Jonas Jenwald	de628cec59	Some `hasJSActions`, and general annotation-code, related cleanup in the viewer and API - Add support for logical assignment operators, i.e. `&&=`, `||=`, and `??=`, with a Babel-plugin. Given that these required incrementing the ECMAScript version in the ESLint and Acorn configurations, and that platform/browser support is still fairly limited, always transpiling them seems appropriate for now. - Cache the `hasJSActions` promise in the API, similar to the existing `getAnnotations` caching. With this implemented, the lookup should now be cheap enough that it can be called unconditionally in the viewer. - Slightly improve cleanup of resources when destroying the `WorkerTransport`. - Remove the `annotationStorage`-property from the `PDFPageView` constructor, since it's not necessary and also brings it more inline with the `BaseViewer`. - Update the `BaseViewer.createAnnotationLayerBuilder` method to actually agree with the `IPDFAnnotationLayerFactory` interface.[1] - Slightly tweak a couple of JSDoc comments. --- [1] We probably ought to re-factor both the `IPDFTextLayerFactory` and `IPDFAnnotationLayerFactory` interfaces to take parameter objects instead, since especially the `IPDFAnnotationLayerFactory` one is becoming quite unwieldy. Given that that would likely be a breaking change for any custom viewer-components implementation, this probably requires careful deprecation.	2020-11-14 13:58:35 +01:00
Calixte Denizet	a5279897a7	JS -- Add listener for sandbox events only if there are some actions * When no actions then set it to null instead of empty object * Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.	2020-11-09 18:37:59 +01:00
Jonas Jenwald	1dad255784	Convert files in the `src/display/`-folder to use optional chaining where possible By using optional chaining, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Optional_chaining, it's possible to reduce unnecessary code-repetition in many cases. Note that these changes also reduce the size of the built `pdf.js` file, when `SKIP_BABEL == true` is set, and for the `MOZCENTRAL` build-target that results in a `0.1%` filesize reduction from a simple and mostly mechanical code change.	2020-11-07 13:22:06 +01:00
Tim van der Meij	e341e6e542	Merge pull request #12525 from brendandahl/mark-info [api-minor] Implement API to get MarkInfo from the catalog.	2020-10-31 00:05:19 +01:00
Brendan Dahl	f5c821e9c3	[api-minor] Implement API to get MarkInfo from the catalog.	2020-10-30 10:59:45 -07:00
Jonas Jenwald	c293fc2b8f	Add (some) optional chaining usage in `src/display/api.js` Since we no longer use SystemJS to load the unit-tests, there's now nothing that prevents us from using optional chaining and nullish coalescing in the `src/display/` directory.	2020-10-26 11:11:48 +01:00
Jonas Jenwald	d9084c0be2	Load the fake worker, in non-`PRODUCTION` mode, with native async `import` This removes the last SystemJS usage from both the API and the default viewer.	2020-10-26 11:11:48 +01:00
Calixte Denizet	c30a3a94f0	JS - Add a function in api to get the fields ids in AcroForm::CO	2020-10-17 12:56:40 +02:00
Jonas Jenwald	3351d3476d	Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead This patch is based on a couple of smaller things that I noticed when working on PR 12479. - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead. This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data. With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents don't even have any /AcroForm data anyway. - Determine the `hasFields` property first, to ensure that it's always correct even if there are errors when checking e.g. the /XFA or /SigFlags entries, since the `fieldObjects` getter depends on it. - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support. - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker. - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.	2020-10-16 12:47:27 +02:00
Calixte Denizet	71ecc3129b	Add the possibility to collect Javascript actions	2020-10-14 10:44:16 +02:00
Jonas Jenwald	2a8983d76b	Enable the ESLint `no-var` rule in the `src/display/` folder Previously this rule has been enabled in the `web/` folder, and in select files in the `src/` sub-folders. Note that a number of the files in the `src/display/` folder were already enforcing the `no-var` rule, and thanks to Prettier the necessary re-writing will be (mostly) handled automatically. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-var	2020-10-02 16:16:23 +02:00
Jonas Jenwald	2393443e73	Include the `/Order` array, if available, when parsing the Optional Content configuration The `/Order` array is used to improve the display of Optional Content groups in PDF viewers, and it allows a PDF document to e.g. specify that Optional Content groups should be displayed as a (collapsable) tree-structure rather than as just a list. Note that not all available Optional Content groups must be present in the `/Order` array, and PDF viewers will often (by default) hide those toggles in the UI. To allow us to improve the UX around toggling of Optional Content groups, in the default viewer, these hidden-by-default groups are thus appended to the parsed `/Order` array under a custom nesting level (with `name == null`). Finally, the patch also slightly tweaks an `OptionalContentConfig` related JSDoc-comment in the API.	2020-08-30 16:28:40 +02:00
Jonas Jenwald	1f5021d76a	Prevent errors if `PDFDocumentProxy.saveDocument` is called without the `annotationStorage` parameter (PR 12241 follow-up) Obviously it doesn't make sense to call that method without providing an `AnnotationStorage`-instance, however we should ensure that doing so won't cause errors. Hence we need to check that `annotationStorage` is actually defined, before attempting to call its `resetModified` method.	2020-08-22 18:09:17 +02:00
Brendan Dahl	8023175103	Support file save triggered from the Firefox integrated version. Related to https://bugzilla.mozilla.org/show_bug.cgi?id=1659753 This allows Firefox to trigger a "save" event from ctrl/cmd+s or the "Save Page As" context menu, which in turn lets pdf.js generate a new PDF if there is form data to save. I also now use `sourceEventType` on downloads so Firefox can determine if it should launch the "open with" dialog or "save as" dialog.	2020-08-20 18:05:08 -07:00
Aki Sasaki	83365a3756	confirm if leaving a modified form without saving	2020-08-20 17:23:06 -07:00
Jonas Jenwald	b26d736809	Ensure that the "DocException" message handler, in the API, will always either error or warn (depending on the build) if a valid `Error` isn't found Having this present would have made debugging issues 11941 and 12209 so much quicker and easier.	2020-08-13 13:17:30 +02:00
Calixte Denizet	1a6816ba98	Add support for saving forms	2020-08-12 10:32:59 +02:00
Jonas Jenwald	4d351eab93	A couple of (small) tweaks of the `AnnotationStorage` (PR 12173 follow-up) - Initialize the `AnnotationStorage`-instance, on `PDFDocumentProxy`, lazily. - Change the `AnnotationStorage` to use a `Map` internally, rather than a regular Object (simplifies the following points). - Let `AnnotationStorage.getAll` return `null` when there's no data stored, to avoid unnecessary parsing on the worker-thread. This ought to "just work", since the worker-thread code should already handle the `!annotationStorage` case everywhere. - Add a new `AnnotationStorage.size` getter, to be able to easily tell if there's any data stored.	2020-08-10 17:07:24 +02:00
Jonathan Grimes	ac723a1760	Allow loading pdf fonts into another document.	2020-08-08 02:52:32 +00:00
Takashi Tamura	4ac62d8787	Fix the type of PDFDocumentLoadingTask.destroy.	2020-08-07 16:10:19 +09:00
Jonas Jenwald	5e44b241b2	[api-minor] Fix the `annotationStorage` parameter in `PDFPageProxy.render` While the parameter name (clearly) suggests that an `AnnotationStorage`-instance is expected, looking at the only call-sites that include the parameter (i.e. the `PDFPrintServiceFactory` instances) it actually contains just a normal Object. Hence it seems much more reasonable to actually pass a valid `AnnotationStorage`-instance, as the name suggests, and simply have `PDFPageProxy.render` do the `annotationStorage.getAll()` call. (Since we cannot send an `AnnotationStorage`-instance as-is to the worker-thread, given the "structured clone algorithm".)	2020-08-05 23:02:30 +02:00
Takashi Tamura	a0f0ab78f3	Fix the type definition of TypedArray.	2020-08-05 17:01:08 +09:00
Tim van der Meij	56ca027c08	Improve consistency for the API documentation comments Over time we used multiple different formats for JSDoc comments. This commit standardizes those formats to the one we used most often. Moreover, this removes the example in the outline endpoint documentation since it now has a proper type definition and it didn't render correctly in JSDoc.	2020-08-04 23:27:22 +02:00
Tim van der Meij	ba4a07ce07	Fix incorrect types in the API documentation	2020-08-04 23:19:59 +02:00
Tim van der Meij	3116216e1d	Improve the API documentation for `PDFDocumentLoadingTask` This commit: - formats the documentation block according to the standards; - replaces the callback definitions with the `function` type (we have that for other definitions already and the callback type was not rendered correctly by JSDoc); - synchronizes the type documentation and the class documentation; - fixes the documentation by making it easier to read and making sure that all optional properties are indicated as such; - uses the `@link` tag to indicate links to other code. The `typestest` still passes and JSDoc now renders this class correctly.	2020-08-04 23:17:24 +02:00
Brendan Dahl	ac494a2278	Add support for optional marked content. Add a new method to the API to get the optional content configuration. Add a new render task param that accepts the above configuration. For now, the optional content is not controllable by the user in the viewer, but renders with the default configuration in the PDF. All of the test files added exhibit different uses of optional content. Fixes #269. Fix test to work with optional content. - Change the stopAtErrors test to ensure the operator list has something, instead of asserting the exact number of operators.	2020-08-04 09:26:55 -07:00
Tim van der Meij	00a8b42e67	Merge pull request #12102 from ineiti/add_types_annotations Add types annotations	2020-08-02 16:45:37 +02:00
Jonas Jenwald	05baa4c89f	Revert "[api-minor] Allow loading pdf fonts into another document."	2020-08-01 12:52:39 +02:00
Tim van der Meij	173b92a873	Merge pull request #12131 from jsg2021/issue-8271 [api-minor] Allow loading pdf fonts into another document.	2020-08-01 01:13:41 +02:00
Jonas Jenwald	6d192f987e	Prevent `Uncaught (in promise) AbortException` when running the unit-tests These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file. While this patch prevents these kinds of problems, and thus allows us to update Jasmine again, I cannot help but think that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't think that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).	2020-07-31 23:29:05 +02:00
Jonathan Grimes	9b16b8ef71	Allow loading pdf fonts into another document.	2020-07-31 11:41:48 -05:00
Linus Gasser	f1bbfdc16d	Add typescript definitions This PR adds typescript definitions from the JSDoc already present. It adds a new gulp-target 'types' that calls 'tsc', the typescript compiler, to create the definitions. To use the definitions, users can simply do the following: ``` import {getDocument, GlobalWorkerOptions} from "pdfjs-dist"; import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry"; GlobalWorkerOptions.workerSrc = pdfjsWorker; const pdf = await getDocument("file:///some.pdf").promise; ``` Co-authored-by: @oBusk Co-authored-by: @tamuratak	2020-07-30 11:10:37 +02:00
Jonas Jenwald	f3ff526019	Send/receive Type3 images the same way as other globally-cached images There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.	2020-07-27 13:20:15 +02:00
Calixte Denizet	584902dbf8	Add an annotation storage in order to save annotation data in acroforms	2020-07-24 10:50:11 +02:00
Jonas Jenwald	4a7e29865d	[api-minor] Use the `NodeCanvasFactory`/`NodeCMapReaderFactory` classes as defaults in Node.js environments (issue 11900) This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code. As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.	2020-07-02 04:44:23 +02:00
Jonas Jenwald	4cb0c032f3	Convert the `PDFPageProxy.intentStates` property from an `Object` to a `Map` As can be seen in the code there's a handful of places where this structure needs to be iterated, something that becomes cumbersome when dealing with `Object`s. Hence, by changing this to a `Map` instead we can both simplify the code and avoid creating unnecessary closures. Particularly the `PDFPageProxy._tryCleanup` method becomes a lot more readable, at least in my opinion. Finally, since this property is intended to be "private" the name is adjusted to reflect that.	2020-06-21 17:02:42 +02:00
Jonas Jenwald	cabc2cc4fc	Add an `InternalRenderTask.completed` getter and use it to simplify `PDFPageProxy._destroy` This patch aims to simplify the `PDFPageProxy._destroy` method, by: - Replacing the unnecessary `forEach` with a "regular" `for`-loop instead. - Using a more appropriate variable name, since `intentState.renderTasks` contains instances of `InternalRenderTask`. - Moving the "is rendering completed"-handling to a new `InternalRenderTask.completed` getter, to abstract away some (mostly) internal `InternalRenderTask` state.	2020-06-21 15:56:14 +02:00
Jonas Jenwald	64378fc366	[api-minor] Remove the deprecated `PDFDocumentProxy.getOpenActionDestination` method (PR 11644 follow-up) This method has been printing a `deprecated` warning in two releases, hence it should hopefully be safe to remove now.	2020-06-02 12:28:00 +02:00
Jonas Jenwald	18e0b10d3c	[api-minor] Remove the `disableCreateObjectURL` option from the `getDocument` parameters, since it's now unused in the API With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code. Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option. Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	cc4cc8b11b	Remove the, now unused, `releaseImageResources` helper function With the changes in the previous patch, this is now dead code which should thus be removed.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	0351852d74	[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js` Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons: - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library. - The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image. - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first. In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should always be the case. - The native decoding, for anything except the simplest of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707). Furthermore this also leads to data being parsed on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons. - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests. - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used. Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues. At this point in time, there's two kinds of failure with this patch: - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder. This type of "failure" accounts for the vast majority of the total number of changes in the reference tests. - Changes where the JPEG images now look ever so slightly blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough). Basically if you disable [this downscaling in canvas.js](`8fb82e939c/src/display/canvas.js (L2356-L2395)`), which is what happens when zooming in, the differences simply vanish! Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that all images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	dda6626f40	Attempt to cache repeated images at the document, rather than the page, level (issue 11878) Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the same images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1] Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2] However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages. In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be twenty copies of the image data). While this obviously benefits both CPU and memory usage in this case, for very large image data this patch may possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will only cache a certain number of image resources at the document level and simply fall back to the default behaviour. Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3] Please note: The patch will lead to small movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator. --- [1] There's e.g. PDF documents that use the same image as background on all pages. [2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer. [3] If the latter case were true, we could simply check for repeat images before parsing started and thus avoid handling any duplicate image resources.	2020-05-21 18:13:45 +02:00