pdf.js

Author	SHA1	Message	Date
Jonathan Grimes	ac723a1760	Allow loading pdf fonts into another document.	2020-08-08 02:52:32 +00:00
Takashi Tamura	4ac62d8787	Fix the type of PDFDocumentLoadingTask.destroy.	2020-08-07 16:10:19 +09:00
Tim van der Meij	8c162f57f7	Merge pull request #12175 from calixteman/textfield Support textfield and choice widgets for printing	2020-08-07 00:20:29 +02:00
Calixte Denizet	1747d259f9	Support textfield and choice widgets for printing	2020-08-06 14:45:23 +02:00
Jonas Jenwald	16fa9dc4ea	Add support for `Object.fromEntries` This provides a simpler way of creating an `Object` from e.g. a `Map`, without having to manually iterate over it. Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/fromEntries	2020-08-06 14:39:51 +02:00
Jonas Jenwald	5e44b241b2	[api-minor] Fix the `annotationStorage` parameter in `PDFPageProxy.render` While the parameter name (clearly) suggests that an `AnnotationStorage`-instance is expected, looking at the only call-sites that include the parameter (i.e. the `PDFPrintServiceFactory` instances) it actually contains just a normal Object. Hence it seems much more reasonable to actually pass a valid `AnnotationStorage`-instance, as the name suggests, and simply have `PDFPageProxy.render` do the `annotationStorage.getAll()` call. (Since we cannot send an `AnnotationStorage`-instance as-is to the worker-thread, given the "structured clone algorithm".)	2020-08-05 23:02:30 +02:00
Takashi Tamura	a0f0ab78f3	Fix the type definition of TypedArray.	2020-08-05 17:01:08 +09:00
Tim van der Meij	56ca027c08	Improve consistency for the API documentation comments Over time we used multiple different formats for JSDoc comments. This commit standardizes those formats to the one we used most often. Moreover, this removes the example in the outline endpoint documentation since it now has a proper type definition and it didn't render correctly in JSDoc.	2020-08-04 23:27:22 +02:00
Tim van der Meij	ba4a07ce07	Fix incorrect types in the API documentation	2020-08-04 23:19:59 +02:00
Tim van der Meij	3116216e1d	Improve the API documentation for `PDFDocumentLoadingTask` This commit: - formats the documentation block according to the standards; - replaces the callback definitions with the `function` type (we have that for other definitions already and the callback type was not rendered correctly by JSDoc); - synchronizes the type documentation and the class documentation; - fixes the documentation by making it easier to read and making sure that all optional properties are indicated as such; - uses the `@link` tag to indicate links to other code. The `typestest` still passes and JSDoc now renders this class correctly.	2020-08-04 23:17:24 +02:00
Brendan Dahl	ac494a2278	Add support for optional marked content. Add a new method to the API to get the optional content configuration. Add a new render task param that accepts the above configuration. For now, the optional content is not controllable by the user in the viewer, but renders with the default configuration in the PDF. All of the test files added exhibit different uses of optional content. Fixes #269. Fix test to work with optional content. - Change the stopAtErrors test to ensure the operator list has something, instead of asserting the exact number of operators.	2020-08-04 09:26:55 -07:00
Tim van der Meij	e68ac05f18	Merge pull request #12160 from tamuratak/worker_options Use typedef to define the type of GlobalWorkerOptions (PR 12102 follow-up)	2020-08-03 22:55:49 +02:00
Tim van der Meij	0b75701012	Merge pull request #12157 from tamuratak/fix_svggraphics Fix the type of SVGGraphics (PR 12102 follow-up)	2020-08-03 22:52:10 +02:00
Tim van der Meij	adc7645a44	Merge pull request #12161 from tamuratak/exported_func Add types to functions exported as API in src/pdf.js (PR 12102 follow-up)	2020-08-03 22:43:50 +02:00
Takashi Tamura	923ba27f1f	Tweak for the type of PageViewportParameters.viewBox	2020-08-03 20:42:42 +09:00
Takashi Tamura	bc4648c0a6	Add types to functions exported as API in src/pdf.js.	2020-08-03 19:19:48 +09:00
Takashi Tamura	f6fd8e9e7f	Use typedef to define the type of GlobalWorkerOptions.	2020-08-03 19:06:28 +09:00
Takashi Tamura	d72bbecee2	Fix the type of SVGGraphics.	2020-08-03 09:58:19 +09:00
Tim van der Meij	00a8b42e67	Merge pull request #12102 from ineiti/add_types_annotations Add types annotations	2020-08-02 16:45:37 +02:00
Tim van der Meij	5a66c56eca	Merge pull request #12108 from calixteman/radio Add support for radios printing	2020-08-02 14:47:46 +02:00
Jonas Jenwald	05baa4c89f	Revert "[api-minor] Allow loading pdf fonts into another document."	2020-08-01 12:52:39 +02:00
Tim van der Meij	173b92a873	Merge pull request #12131 from jsg2021/issue-8271 [api-minor] Allow loading pdf fonts into another document.	2020-08-01 01:13:41 +02:00
Tim van der Meij	0d20a2b7b4	Merge pull request #12144 from Snuffleupagus/uncaught-promise-AbortException Prevent `Uncaught (in promise) AbortException` when running the unit-tests	2020-08-01 00:22:10 +02:00
Jonas Jenwald	6d192f987e	Prevent `Uncaught (in promise) AbortException` when running the unit-tests These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file. While this patch prevents these kind of problems, and thus allows us to update Jasmine again, I cannot help but thinking that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't think that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).	2020-07-31 23:29:05 +02:00
Jonathan Grimes	9b16b8ef71	Allow loading pdf fonts into another document.	2020-07-31 11:41:48 -05:00
Jonas Jenwald	346afd1e1c	[api-minor] Fix the `AnnotationStorage` usage properly in the viewer/tests (PR 12107 and 12143 follow-up) The [api-minor] label probably ought to have been added to the original PR, given the changes to the `createAnnotationLayerBuilder` signature (if nothing else). This patch fixes the following things: - Let the `AnnotationLayer.render` method create an `AnnotationStorage`-instance if none was provided, thus making the parameter properly optional. This not only fixes the reference tests, it also prevents issues when the viewer components are used. - Stop exporting `AnnotationStorage` in the official API, i.e. the `src/pdf.js` file, since it's no longer necessary given the change above. Generally speaking, unless absolutely necessary we probably shouldn't export unused things in the API. - Fix a number of JSDocs `typedef`s, in `src/display/` and `web/` code, to actually account for the new `annotationStorage` parameter. - Update `web/interfaces.js` to account for the changes in `createAnnotationLayerBuilder`. - Initialize the storage, in `AnnotationStorage`, using `Object.create(null)` rather than `{}` (which is the PDF.js default).	2020-07-31 16:32:46 +02:00
Calixte Denizet	538017f7a7	Add support for radios printing	2020-07-31 14:31:49 +02:00
Linus Gasser	f1bbfdc16d	Add typescript definitions This PR adds typescript definitions from the JSDoc already present. It adds a new gulp-target 'types' that calls 'tsc', the typescript compiler, to create the definitions. To use the definitions, users can simply do the following: ``` import {getDocument, GlobalWorkerOptions} from "pdfjs-dist"; import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry"; GlobalWorkerOptions.workerSrc = pdfjsWorker; const pdf = await getDocument("file:///some.pdf").promise; ``` Co-authored-by: @oBusk Co-authored-by: @tamuratak	2020-07-30 11:10:37 +02:00
Tim van der Meij	eb4d6a0652	Merge pull request #12107 from calixteman/checkbox Add support for checkboxes printing	2020-07-30 00:11:41 +02:00
Calixte Denizet	cb60523a15	Add support for checkboxes printing	2020-07-29 16:42:57 +02:00
Jonas Jenwald	1b720a4b23	Ignore `fetch()` errors, in `PDFFetchStreamRangeReader`, once the request has been aborted Besides making general sense, as far as I can tell, this patch should also prevent one source of `Uncaught (in promise) ...` exceptions. Unfortunately `reason instanceof AbortError` doesn't work here, since `AbortError` isn't actually defined in browsers; note how even the DOM specification contains an example using the `name` property: https://dom.spec.whatwg.org/#aborting-ongoing-activities This patch prevents the following errors from being logged in the console, when the unit-tests are running: - Firefox: `Uncaught (in promise) DOMException: The operation was aborted.` - Chrome: `Uncaught (in promise) DOMException: The user aborted a request.`	2020-07-28 17:18:49 +02:00
Jonas Jenwald	f3ff526019	Send/receive Type3 images the same way as other globally-cached images There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.	2020-07-27 13:20:15 +02:00
Calixte Denizet	584902dbf8	Add an annotation storage in order to save annotation data in acroforms	2020-07-24 10:50:11 +02:00
Jonas Jenwald	d4d7ac1b88	Stop special-casing the (very unlikely) "no `/XObject` found"-scenario, when parsing `OPS.paintXObject` operators, in `PartialEvaluator.{getOperatorList, getTextContent}` Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fall back to the existing `ignoreErrors` functionality instead. Also, please note that the "no `/XObject` found"-scenario should be extremely unlikely in practice and would only occur in corrupt/broken documents. Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data could contain anything and e.g. be unclonable (which would cause breaking errors). Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definitely seems like the correct approach.	2020-07-12 21:59:59 +02:00
Jonas Jenwald	4a7e29865d	[api-minor] Use the `NodeCanvasFactory`/`NodeCMapReaderFactory` classes as defaults in Node.js environments (issue 11900) This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code. As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.	2020-07-02 04:44:23 +02:00
Jonas Jenwald	fef24658e7	Adjust the heuristics used when dealing with rectangles, i.e. `re` operators, with zero width/height (issue 12010)	2020-07-02 00:02:49 +02:00
Tim van der Meij	c1cb9ee9fc	Merge pull request #12016 from Snuffleupagus/issue-8078 Tweak the `QueueOptimizer` to recognize `OPS.paintImageMaskXObject` operators as repeated when the "skew" transformation matrix elements are non-zero (issue 8078)	2020-06-21 19:38:27 +02:00
Jonas Jenwald	4cb0c032f3	Convert the `PDFPageProxy.intentStates` property from an `Object` to a `Map` As can be seen in the code there's a handful of places where this structure needs to be iterated, something that becomes cumbersome when dealing with `Object`s. Hence, by changing this to a `Map` instead we can both simplify the code and avoid creating unnecessary closures. Particularly the `PDFPageProxy._tryCleanup` method becomes a lot more readable, at least in my opinion. Finally, since this property is intended to be "private" the name is adjusted to reflect that.	2020-06-21 17:02:42 +02:00
Jonas Jenwald	cabc2cc4fc	Add a `InternalRenderTask.completed` getter and use it to simplify `PDFPageProxy._destroy` This patch aims to simplify the `PDFPageProxy._destroy` method, by: - Replacing the unnecessary `forEach` with a "regular" `for`-loop instead. - Use a more appropriate variable name, since `intentState.renderTasks` contain instances of `InternalRenderTask`. - Move the "is rendering completed"-handling to a new `InternalRenderTask.completed` getter, to abstract away some (mostly) internal `InternalRenderTask` state.	2020-06-21 15:56:14 +02:00
Jonas Jenwald	e18fa3fc45	Tweak the `QueueOptimizer` to recognize `OPS.paintImageMaskXObject` operators as repeated when the "skew" transformation matrix elements are non-zero (issue 8078) First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited. Hence I'm not sure if there's actually a very good reason for only considering ImageMasks where the "skew" transformation matrix elements are zero as repeated, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are all identical for the ImageMasks. Furthermore, looking at the group case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements. While this patch obviously isn't enough to completely fix the issue, since there should be a visible Pattern rendered as well[1], it seems (at least to me) like enough of an improvement that submitting this is justified. With these changes the referenced PDF document will no longer hang the entire browser, and rendering also finishes in a reasonable time (< 10 seconds for me) which seems fine given the huge number of identical inline images present.[2] --- [1] Temporarily changing the Pattern to a solid color does render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality. [2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.	2020-06-20 12:18:48 +02:00
Jonas Jenwald	00d45fce33	Update `SVGGraphics` to account for globally cached images (PR 11912 follow-up) Since there are (essentially) no tests for the SVG-backend, these changes didn't make it into PR 11912 when the code in the `src/display/canvas.js` file was modified.	2020-06-10 15:31:26 +02:00
Jonas Jenwald	466d10f6fc	Remove unused methods from `NetworkManager`, in `src/display/network.js` Both of the removed methods were added in PR 2719, however they are no longer used: - It appears that `hasPendingRequests` was never used at all, even from the beginning. - The only general PDF.js library usage of `abortAllRequests` was removed in PR 6879, which is now four years ago. (Originally the Firefox-specific network implementation, see https://searchfox.org/mozilla-central/source/browser/extensions/pdfjs/content/PdfJsNetwork.jsm, was shared with the `src/display/network.js` file and there this method is used. However, since all of the Firefox-specific code now lives directly in mozilla-central, that's not relevant for the removal in this patch.)	2020-06-07 16:03:32 +02:00
Jonas Jenwald	64378fc366	[api-minor] Remove the deprecated `PDFDocumentProxy.getOpenActionDestination` method (PR 11644 follow-up) This method has been printing a `deprecated` warning in two releases, hence it should hopefully be safe to remove now.	2020-06-02 12:28:00 +02:00
Tim van der Meij	f14215da37	Implement fill opacity for shading patterns in the SVG back-end In the PDF file from the issue below, the fill alpha (`ca`) is set before drawing the circles using the `setGState` operator. Doing so causes the global alpha to be set on the canvas' context for the canvas back-end, but this was not handled in the SVG back-end. This patch fixes that by taking the fill opacity into account when drawing shading patterns in the same way as done elsewhere so it is only included if the value is non-default. Fixes #11812.	2020-05-24 14:25:40 +02:00
Jonas Jenwald	18e0b10d3c	[api-minor] Remove the `disableCreateObjectURL` option from the `getDocument` parameters, since it's now unused in the API With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code. Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option. Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	cc4cc8b11b	Remove the, now unused, `releaseImageResources` helper function With the changes in the previous patch, this is now dead code which should thus be removed.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	0351852d74	[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js` Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons: - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library. - The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image. - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first. In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should always be the case. - The native decoding, for anything except the simplest of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707). Furthermore this also leads to data being parsed on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons. - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests. - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used. Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues. At this point in time, there's two kinds of failure with this patch: - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder. This type of "failure" accounts for the vast majority of the total number of changes in the reference tests. - Changes where the JPEG images now look ever so slightly blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough). Basically if you disable [this downscaling in canvas.js](`8fb82e939c/src/display/canvas.js (L2356-L2395)`), which is what happens when zooming in, the differences simply vanish! Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that all images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.	2020-05-22 00:22:48 +02:00
Jonas Jenwald	dda6626f40	Attempt to cache repeated images at the document, rather than the page, level (issue 11878) Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the same images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1] Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2] However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages. In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be twenty copies of the image data). While this obviously benefits both CPU and memory usage in this case, for very large image data this patch may possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will only cache a certain number of image resources at the document level and simply fall back to the default behaviour. Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3] Please note: The patch will lead to small movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator. --- [1] There's e.g. PDF documents that use the same image as background on all pages. [2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer. [3] If the latter case were true, we could simply check for repeat images before parsing started and thus avoid handling any duplicate image resources.	2020-05-21 18:13:45 +02:00
Jonas Jenwald	8d56a69e74	Reduce usage of SystemJS, in the development viewer, even further With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded natively by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill. Please note: As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.	2020-05-20 13:36:52 +02:00
Jonas Jenwald	d4d933538b	Re-factor `setPDFNetworkStreamFactory`, in src/display/api.js, to also accept an asynchronous function As part of trying to reduce the usage of SystemJS in the development viewer, this patch is a necessary step that will allow removal of some `require` statements. Currently this uses `SystemJS.import` in non-PRODUCTION mode, but it should be possible to replace those with standard dynamic `import` calls in the future.	2020-05-20 13:18:18 +02:00
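
Note on commit 16fa9dc4ea (`Object.fromEntries`): the message says this gives a simpler way to build an `Object` from e.g. a `Map` without iterating manually. The sketch below only illustrates that difference; the `fieldValues` map and its keys are made up for the example and are not taken from the PDF.js source.

```javascript
// Hypothetical data: a Map from field ids to values (not from PDF.js itself).
const fieldValues = new Map([
  ["field1", "hello"],
  ["field2", true],
]);

// Manual conversion: iterate the Map and copy each entry by hand.
const manual = {};
for (const [key, value] of fieldValues) {
  manual[key] = value;
}

// The same conversion in a single call.
const converted = Object.fromEntries(fieldValues);

console.log(manual, converted);
```

Both objects end up with the same own properties; the single call simply removes the loop.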
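
Note on commit 1b720a4b23 (ignoring `fetch()` errors after abort): the message explains that `reason instanceof AbortError` cannot be used because `AbortError` is not defined in browsers, and that the DOM specification example checks the `name` property instead. A minimal sketch of that check follows. The helper names `isAbortError` and `fetchIgnoringAbort` are hypothetical and this is not the code in `PDFFetchStreamRangeReader`.

```javascript
// Hypothetical helper: detect an aborted request by its `name`, since there
// is no global `AbortError` constructor to use with `instanceof`.
function isAbortError(reason) {
  return Boolean(reason) && reason.name === "AbortError";
}

// Hypothetical wrapper: resolve with `null` when the request was aborted and
// re-throw every other failure. `fetchFn` is injectable so that the sketch
// can be exercised without network access.
async function fetchIgnoringAbort(url, signal, fetchFn = fetch) {
  try {
    return await fetchFn(url, { signal });
  } catch (reason) {
    if (isAbortError(reason)) {
      return null; // Aborted on purpose, nothing to report.
    }
    throw reason;
  }
}
```

A caller would typically pass the `signal` of an `AbortController` and treat a `null` result as "request cancelled".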

1018 Commits