Commit Graph

1018 Commits

Author SHA1 Message Date
Jonathan Grimes
ac723a1760 Allow loading pdf fonts into another document. 2020-08-08 02:52:32 +00:00
Takashi Tamura
4ac62d8787 Fix the type of PDFDocumentLoadingTask.destroy. 2020-08-07 16:10:19 +09:00
Tim van der Meij
8c162f57f7
Merge pull request #12175 from calixteman/textfield
Support textfield and choice widgets for printing
2020-08-07 00:20:29 +02:00
Calixte Denizet
1747d259f9 Support textfield and choice widgets for printing 2020-08-06 14:45:23 +02:00
Jonas Jenwald
16fa9dc4ea Add support for Object.fromEntries
This provides a simpler way of creating an `Object` from e.g. a `Map`, without having to manually iterate over it.
Please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/fromEntries
2020-08-06 14:39:51 +02:00
Jonas Jenwald
5e44b241b2 [api-minor] Fix the annotationStorage parameter in PDFPageProxy.render
While the parameter name (clearly) suggests that an `AnnotationStorage`-instance is expected, looking at the only call-sites that include the parameter (i.e. the `PDFPrintServiceFactory` instances) it actually contains just a normal Object.

Hence it seems much more reasonable to actually pass a valid `AnnotationStorage`-instance, as the name suggests, and simply have `PDFPageProxy.render` do the `annotationStorage.getAll()` call. (Since we cannot send an `AnnotationStorage`-instance as-is to the worker-thread, given the "structured clone algorithm".)
2020-08-05 23:02:30 +02:00
Takashi Tamura
a0f0ab78f3 Fix the type definition of TypedArray. 2020-08-05 17:01:08 +09:00
Tim van der Meij
56ca027c08
Improve consistency for the API documentation comments
Over time we used multiple different formats for JSDoc comments. This
commit standardizes those formats to the one we used most often.

Moreover, this removes the example in the outline endpoint documentation
since it now has a proper type definition and it didn't render correctly
in JSDoc.
2020-08-04 23:27:22 +02:00
Tim van der Meij
ba4a07ce07
Fix incorrect types in the API documentation 2020-08-04 23:19:59 +02:00
Tim van der Meij
3116216e1d
Improve the API documentation for PDFDocumentLoadingTask
This commit:
- formats the documentation block according to the standards;
- replaces the callback definitions with the `function` type (we have
  that for other definitions already and the callback type was not
  rendered correctly by JSDoc);
- synchronizes the type documentation and the class documentation;
- fixes the documentation by making it easier to read and making sure
  that all optional properties are indicated as such;
- uses the `@link` tag to indicate links to other code.

The `typestest` still passes and JSDoc now renders this class correctly.
2020-08-04 23:17:24 +02:00
Brendan Dahl
ac494a2278 Add support for optional marked content.
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.

All of the test files added exhibit different uses of optional content.

Fixes #269.

Fix test to work with optional content.

- Change the stopAtErrors test to ensure the operator list has something,
  instead of asserting the exact number of operators.
2020-08-04 09:26:55 -07:00
Tim van der Meij
e68ac05f18
Merge pull request #12160 from tamuratak/worker_options
Use typedef to define the type of GlobalWorkerOptions (PR 12102 follow-up)
2020-08-03 22:55:49 +02:00
Tim van der Meij
0b75701012
Merge pull request #12157 from tamuratak/fix_svggraphics
Fix the type of SVGGraphics (PR 12102 follow-up)
2020-08-03 22:52:10 +02:00
Tim van der Meij
adc7645a44
Merge pull request #12161 from tamuratak/exported_func
Add types to functions exported as API in src/pdf.js (PR 12102 follow-up)
2020-08-03 22:43:50 +02:00
Takashi Tamura
923ba27f1f Tweak for the type of PageViewportParameters.viewBox 2020-08-03 20:42:42 +09:00
Takashi Tamura
bc4648c0a6 Add types to functions exported as API in src/pdf.js. 2020-08-03 19:19:48 +09:00
Takashi Tamura
f6fd8e9e7f Use typedef to define the type of GlobalWorkerOptions. 2020-08-03 19:06:28 +09:00
Takashi Tamura
d72bbecee2 Fix the type of SVGGraphics. 2020-08-03 09:58:19 +09:00
Tim van der Meij
00a8b42e67
Merge pull request #12102 from ineiti/add_types_annotations
Add types annotations
2020-08-02 16:45:37 +02:00
Tim van der Meij
5a66c56eca
Merge pull request #12108 from calixteman/radio
Add support for radios printing
2020-08-02 14:47:46 +02:00
Jonas Jenwald
05baa4c89f
Revert "[api-minor] Allow loading pdf fonts into another document." 2020-08-01 12:52:39 +02:00
Tim van der Meij
173b92a873
Merge pull request #12131 from jsg2021/issue-8271
[api-minor] Allow loading pdf fonts into another document.
2020-08-01 01:13:41 +02:00
Tim van der Meij
0d20a2b7b4
Merge pull request #12144 from Snuffleupagus/uncaught-promise-AbortException
Prevent `Uncaught (in promise) AbortException` when running the unit-tests
2020-08-01 00:22:10 +02:00
Jonas Jenwald
6d192f987e Prevent Uncaught (in promise) AbortException when running the unit-tests
These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file.
While this patch prevents these kind of problems, and thus allows us to update Jasmine again, I cannot help but thinking that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't *think* that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).
2020-07-31 23:29:05 +02:00
Jonathan Grimes
9b16b8ef71
Allow loading pdf fonts into another document. 2020-07-31 11:41:48 -05:00
Jonas Jenwald
346afd1e1c [api-minor] Fix the AnnotationStorage usage properly in the viewer/tests (PR 12107 and 12143 follow-up)
*The [api-minor] label probably ought to have been added to the original PR, given the changes to the `createAnnotationLayerBuilder` signature (if nothing else).*

This patch fixes the following things:
 - Let the `AnnotationLayer.render` method create an `AnnotationStorage`-instance if none was provided, thus making the parameter *properly* optional. This not only fixes the reference tests, it also prevents issues when the viewer components are used.
 - Stop exporting `AnnotationStorage` in the official API, i.e. the `src/pdf.js` file, since it's no longer necessary given the change above. Generally speaking, unless absolutely necessary we probably shouldn't export unused things in the API.
 - Fix a number of JSDocs `typedef`s, in `src/display/` and `web/` code, to actually account for the new `annotationStorage` parameter.
 - Update `web/interfaces.js` to account for the changes in `createAnnotationLayerBuilder`.
 - Initialize the storage, in `AnnotationStorage`, using `Object.create(null)` rather than `{}` (which is the PDF.js default).
2020-07-31 16:32:46 +02:00
Calixte Denizet
538017f7a7 Add support for radios printing 2020-07-31 14:31:49 +02:00
Linus Gasser
f1bbfdc16d Add typescript definitions
This PR adds typescript definitions from the JSDoc already present.
It adds a new gulp-target 'types' that calls 'tsc', the typescript
compiler, to create the definitions.

To use the definitions, users can simply do the following:

```
import {getDocument, GlobalWorkerOptions} from "pdfjs-dist";
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
GlobalWorkerOptions.workerSrc = pdfjsWorker;

const pdf = await getDocument("file:///some.pdf").promise;
```

Co-authored-by: @oBusk
Co-authored-by: @tamuratak
2020-07-30 11:10:37 +02:00
Tim van der Meij
eb4d6a0652
Merge pull request #12107 from calixteman/checkbox
Add support for checkboxes printing
2020-07-30 00:11:41 +02:00
Calixte Denizet
cb60523a15 Add support for checkboxes printing 2020-07-29 16:42:57 +02:00
Jonas Jenwald
1b720a4b23 Ignore fetch() errors, in PDFFetchStreamRangeReader, once the request has been aborted
Besides making general sense, as far as I can tell, this patch should also prevent *one* source of `Uncaught (in promise) ...` exceptions.
Unfortunately `reason instanceof AbortError` doesn't work here, since `AbortError` isn't actually defined in browsers; note how even the DOM specification contains an example using the `name` property: https://dom.spec.whatwg.org/#aborting-ongoing-activities

This patch prevents the following errors from being logged in the console, when the unit-tests are running:
 - Firefox: `Uncaught (in promise) DOMException: The operation was aborted.`
 - Chrome: `Uncaught (in promise) DOMException: The user aborted a request.`
2020-07-28 17:18:49 +02:00
Jonas Jenwald
f3ff526019 Send/receive Type3 images the same way as other globally-cached images
There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.
2020-07-27 13:20:15 +02:00
Calixte Denizet
584902dbf8 Add an annotation storage in order to save annotation data in acroforms 2020-07-24 10:50:11 +02:00
Jonas Jenwald
d4d7ac1b88 Stop special-casing the (very unlikely) "no /XObject found"-scenario, when parsing OPS.paintXObject operators, in PartialEvaluator.{getOperatorList, getTextContent}
Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fallback to the existing `ignoreErrors` functionality instead.
Also, please note that the "no `/XObject` found"-scenario should be *extremely* unlikely in practice and would only occur in corrupt/broken documents.

Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data *could* contain anything and e.g. be unclonable (which would cause breaking errors).
Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definately seems like the correct approach.
2020-07-12 21:59:59 +02:00
Jonas Jenwald
4a7e29865d [api-minor] Use the NodeCanvasFactory/NodeCMapReaderFactory classes as defaults in Node.js environments (issue 11900)
This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code.

As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.
2020-07-02 04:44:23 +02:00
Jonas Jenwald
fef24658e7 Adjust the heuristics used when dealing with rectangles, i.e. re operators, with zero width/height (issue 12010) 2020-07-02 00:02:49 +02:00
Tim van der Meij
c1cb9ee9fc
Merge pull request #12016 from Snuffleupagus/issue-8078
Tweak the `QueueOptimizer` to recognize `OPS.paintImageMaskXObject` operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
2020-06-21 19:38:27 +02:00
Jonas Jenwald
4cb0c032f3 Convert the PDFPageProxy.intentStates property from an Object to a Map
As can be seen in the code there's a handful of places where this structure needs to be iterated, something that becomes cumbersome when dealing with `Object`s. Hence, by changing this to a `Map` instead we can both simplify the code and avoid creating unnecessary closures.

Particularily the `PDFPageProxy._tryCleanup` method becomes a lot more readable, at least in my opinion.

Finally, since this property is intended to be "private" the name is adjusted to reflect that.
2020-06-21 17:02:42 +02:00
Jonas Jenwald
cabc2cc4fc Add a InternalRenderTask.completed getter and use it to simplify PDFPageProxy._destroy
This patch aims to simplify the `PDFPageProxy._destroy` method, by:
 - Replacing the unnecessary `forEach` with a "regular" `for`-loop instead.
 - Use a more appropriate variable name, since `intentState.renderTasks` contain instances of `InternalRenderTask`.
 - Move the "is rendering completed"-handling to a new `InternalRenderTask.completed` getter, to abstract away some (mostly) internal `InternalRenderTask` state.
2020-06-21 15:56:14 +02:00
Jonas Jenwald
e18fa3fc45 Tweak the QueueOptimizer to recognize OPS.paintImageMaskXObject operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
*First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited.*
Hence I'm not sure if there's actually a very good reason for *only* considering ImageMasks where the "skew" transformation matrix elements are zero as *repeated*, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are *all identical* for the ImageMasks.
Furthermore, looking at the *group* case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements.

While this patch obviously isn't enough to *completely* fix the issue, since there should be a visible Pattern rendered as well[1], it seem (at least to me) like enough of an improvement that submitting this is justified.
With these changes the referenced PDF document will no longer hang the *entire* browser, and rendering also finishes in a *reasonable* time (< 10 seconds for me) which seem fine given the *huge* number of identical inline images present.[2]

---
[1] Temporarily changing the Pattern to a solid color *does* render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality.

[2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.
2020-06-20 12:18:48 +02:00
Jonas Jenwald
00d45fce33 Update SVGGraphics to account for globally cached images (PR 11912 follow-up)
Since there's (essentially) no tests for the SVG-backend, these changes didn't make in into PR 11912 when the code in the `src/display/canvas.js` file was modified.
2020-06-10 15:31:26 +02:00
Jonas Jenwald
466d10f6fc Remove unused methods from NetworkManager, in src/display/network.js
Both of the removed methods were added in PR 2719, however they are no longer used:
 - It appears that `hasPendingRequests` was never used at all, even from the beginning.
 - The only general PDF.js library usage of `abortAllRequests` was removed in PR 6879, which is now four years ago. (Originally the Firefox-specific network implementation, see https://searchfox.org/mozilla-central/source/browser/extensions/pdfjs/content/PdfJsNetwork.jsm, was shared with the `src/display/network.js` file and *there* this method is used. However, since all of the Firefox-specific code now lives directly in mozilla-central, that's not relevant for the removal in this patch.)
2020-06-07 16:03:32 +02:00
Jonas Jenwald
64378fc366 [api-minor] Remove the deprecated PDFDocumentProxy.getOpenActionDestination method (PR 11644 follow-up)
This method has been printing a `deprecated` warning in two releases, hence it should hopefully be safe to remove now.
2020-06-02 12:28:00 +02:00
Tim van der Meij
f14215da37
Implement fill opacity for shading patterns in the SVG back-end
In the PDF file from the issue below, the fill alpha (`ca`) is set
before drawing the circles using the `setGState` operator. Doing so
causes the global alpha to be set on the canvas' context for the canvas
back-end, but this was not handled in the SVG back-end. This patch fixes
that by taking the fill opacity into account when drawing shading
patterns in the same way as done elsewhere so it is only included if the
value is non-default.

Fixes #11812.
2020-05-24 14:25:40 +02:00
Jonas Jenwald
18e0b10d3c [api-minor] Remove the disableCreateObjectURL option from the getDocument parameters, since it's now unused in the API
With the changes in previous patches, the `disableCreateObjectURL` option/functionality is no longer used for anything in the API and/or in the Worker code.

Note however that there's some functionality, mainly related to file loading/downloading, in the GENERIC version of the default viewer which still depends on this option.
Hence the `disableCreateObjectURL` option (and related compatibility code) is moved into the viewer, see e.g. `web/app_options.js`, such that it's still available in the default viewer.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
cc4cc8b11b Remove the, now unused, releaseImageResources helper function
With the changes in the previous patch, this is now dead code which should thus be removed.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seem unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification support JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser require a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
   In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, result in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-reponsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, lead to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` were unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:

 - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
   This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now looks *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
   Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
   Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
dda6626f40 Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]

Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.

In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.

Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]

*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.

---
[1] There's e.g. PDF documents that use the same image as background on all pages.

[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.

[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-21 18:13:45 +02:00
Jonas Jenwald
8d56a69e74 Reduce usage of SystemJS, in the development viewer, even further
With these changes SystemJS is now only used, during development, on the worker-thread and in the unit/font-tests, since Firefox is currently missing support for worker modules; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1247687

Hence all the JavaScript files in the `web/` and `src/display/` folders are now loaded *natively* by the browser (during development) using standard `import` statements/calls, thanks to a nice `import-maps` polyfill.

*Please note:* As soon as https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed in Firefox, we should be able to remove all traces of SystemJS and thus finally be able to use every possible modern JavaScript feature.
2020-05-20 13:36:52 +02:00
Jonas Jenwald
d4d933538b Re-factor setPDFNetworkStreamFactory, in src/display/api.js, to also accept an asynchronous function
As part of trying to reduce the usage of SystemJS in the development viewer, this patch is a necessary step that will allow removal of some `require` statements.

Currently this uses `SystemJS.import` in non-PRODUCTION mode, but it should be possible to replace those with standard *dynamic* `import` calls in the future.
2020-05-20 13:18:18 +02:00