pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	c42029489e	Run `gulp lint --fix`, to account for changes in Prettier version `2.2.1`. Please refer to https://github.com/prettier/prettier/blob/master/CHANGELOG.md#221 for additional details.	2020-11-29 10:01:46 +01:00
Tim van der Meij	256068556d	Merge pull request #12662 from Snuffleupagus/issue-12402 Check the top-level /Pages dictionary when finding the trailer in `XRef.indexObjects` (issue 12402)	2020-11-25 21:54:41 +01:00
Jonas Jenwald	8a132f584d	Check the top-level /Pages dictionary when finding the trailer in `XRef.indexObjects` (issue 12402). In addition to the existing /Root and /Pages validation, also check that the /Pages-entry actually is a dictionary and that it has a valid /Count-entry. This way we can avoid picking a trailer candidate which e.g. the `Catalog.numPages` getter will just end up rejecting, thus breaking PDF document loading completely.	2020-11-25 15:14:53 +01:00
Calixte Denizet	18b525de2e	Parentheses in names are not escaped when saving	2020-11-25 12:28:12 +01:00
Calixte Denizet	b11592a756	JS -- hidden annotations must be built in case a script shows them * in some PDFs, there are actions with "event.source.hidden = ..." * in order to handle visibility when printing, annotationStorage is extended to store multiple properties (value, hidden, editable, ...)	2020-11-10 12:48:34 +01:00
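A minimal sketch of what storing several properties per annotation could look like, as described in commit b11592a756. The `MultiPropertyStorage` class and its method names are hypothetical and are not the actual `annotationStorage` API; new properties are merged into the stored ones, so a script setting `hidden` does not discard a previously stored `value`.

```javascript
// Hypothetical storage keeping several properties (value, hidden, ...) per
// annotation id, so visibility changed by a script can be honoured when printing.
class MultiPropertyStorage {
  constructor() {
    this._storage = new Map();
  }

  // Return the stored property object, or `defaultValue` if nothing is stored.
  getValue(key, defaultValue) {
    const obj = this._storage.get(key);
    return obj !== undefined ? obj : defaultValue;
  }

  // Merge the given properties into any previously stored ones.
  setValue(key, properties) {
    const previous = this._storage.get(key) || {};
    this._storage.set(key, { ...previous, ...properties });
  }
}
```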
Calixte Denizet	a5279897a7	JS -- Add listener for sandbox events only if there are some actions * When there are no actions, set it to null instead of an empty object * Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.	2020-11-09 18:37:59 +01:00
Jonas Jenwald	a03b383edb	Fail early, in modern `GENERIC` builds, if `globalThis` isn't available (PR 11799 follow-up, issue 12596). It probably doesn't hurt to explicitly check for `globalThis` as well, in addition to the existing checks.	2020-11-07 19:00:33 +01:00
Tim van der Meij	99ac2d1036	Merge pull request #12583 from Snuffleupagus/nonBlendModesSet Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`	2020-11-05 23:53:39 +01:00
Tim van der Meij	646f895d35	Merge pull request #12568 from calixteman/defaultvalue [api-minor] JS -- Add default value in annotation data	2020-11-05 22:53:21 +01:00
Jonas Jenwald	082cd8fc6c	Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes`. The `PartialEvaluator.hasBlendModes` method is necessary to determine if there are any blend modes on a page, which unfortunately requires synchronous parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message). In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`. To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain any blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient. This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf:
```
[
  {
    "id": "issue6961",
    "file": "../web/pdfs/issue6961.pdf",
    "md5": "a80e4357a8fda758d96c2c76f2980b03",
    "rounds": 100,
    "type": "eq"
  }
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, page, stat --
browser | page | stat         | Count | Baseline(ms) | Current(ms) | +/-  | %      | Result(P<.05)
------- | ---- | ------------ | ----- | ------------ | ----------- | ---- | ------ | -------------
firefox | 0    | Overall      | 100   | 1034         | 555         | -480 | -46.39 | faster
firefox | 0    | Page Request | 100   | 489          | 7           | -482 | -98.67 | faster
firefox | 0    | Rendering    | 100   | 545          | 548         | 2    | 0.45   |
firefox | 1    | Overall      | 100   | 912          | 428         | -484 | -53.06 | faster
firefox | 1    | Page Request | 100   | 487          | 1           | -486 | -99.77 | faster
firefox | 1    | Rendering    | 100   | 425          | 427         | 2    | 0.51   |
```
---
[1] In the case where blend modes are found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case; however, note that most documents/pages do not utilize blend modes anyway.	2020-11-05 16:59:08 +01:00
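The caching strategy in commit 082cd8fc6c can be sketched as follows. This is a simplified, hypothetical model: /Resources are plain objects, the reference is a string such as `"12R"`, and only /ExtGState /BM entries are inspected (the real `PartialEvaluator.hasBlendModes` also has to look at nested XObjects). As in footnote [1], only blend-mode-free entries are cached.

```javascript
// Hypothetical global cache of references whose /Resources have no blend modes.
const nonBlendModesSet = new Set();

// Simplified check: `resources` is a plain object, `ref` a string like "12R",
// and `stats.parsed` counts how often the /Resources were actually parsed.
function hasBlendModes(resources, ref, stats) {
  // Skip /Resources that an earlier call already proved blend-mode free.
  if (ref && nonBlendModesSet.has(ref)) {
    return false;
  }
  stats.parsed++;
  const extGState = resources.ExtGState || {};
  for (const name of Object.keys(extGState)) {
    const bm = extGState[name].BM;
    if (bm && bm !== "Normal") {
      // Blend modes found: cache nothing, since skipping is harder to justify.
      return true;
    }
  }
  if (ref) {
    nonBlendModesSet.add(ref);
  }
  return false;
}
```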
Calixte Denizet	39f5954729	JS -- Add default value in annotation data * these values are used when a form is reset	2020-11-05 13:44:23 +01:00
Brendan Dahl	1de2bc4816	Merge pull request #12505 from calixteman/12504 Split highlight annotation div into multiple divs	2020-11-04 10:41:28 -08:00
Tim van der Meij	3e52098e29	Merge pull request #12555 from calixteman/color Replace css color rgb(...) by #...	2020-11-02 23:55:39 +01:00
Calixte Denizet	9d11b51a3e	Replace css color rgb(...) by #... * it's faster to generate the color code using a lookup table for the components * it's very likely much faster to parse (when setting the color on the canvas)	2020-11-02 10:25:04 +01:00
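A minimal sketch of the lookup-table approach from commit 9d11b51a3e, assuming integer components in the range 0..255. The `hexNumbers` and `makeHexColor` names are illustrative here.

```javascript
// Precompute the two-digit hex string for every possible component value once.
const hexNumbers = Array.from({ length: 256 }, (_, n) =>
  n.toString(16).padStart(2, "0")
);

// Build a "#rrggbb" string from three integer components in 0..255,
// using three table lookups instead of per-call number formatting.
function makeHexColor(r, g, b) {
  return `#${hexNumbers[r]}${hexNumbers[g]}${hexNumbers[b]}`;
}
```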
Tim van der Meij	46e60a266c	Merge pull request #12552 from Snuffleupagus/annotation-fixes Miscellaneous (small) improvements in `src/core/annotation.js`	2020-10-31 00:41:39 +01:00
Tim van der Meij	e341e6e542	Merge pull request #12525 from brendandahl/mark-info [api-minor] Implement API to get MarkInfo from the catalog.	2020-10-31 00:05:19 +01:00
Brendan Dahl	f5c821e9c3	[api-minor] Implement API to get MarkInfo from the catalog.	2020-10-30 10:59:45 -07:00
Jonas Jenwald	fdb6520012	Change the `Catalog.openAction` getter back to using an Object internally (PR 12543 follow-up). Given that the `Map`-pattern apparently has undesirable performance characteristics, change this getter back to using an Object instead and check its size before returning it.	2020-10-30 13:27:05 +01:00
Jonas Jenwald	a1e5581a0b	Let `Annotation._collectActions` return `null` when no actions are present. Rather than returning an empty Object[1], we should be returning `null` instead, since that's consistent with existing API-functionality. To avoid having to manually track if the Object is empty, this patch also introduces a small helper function to check its size.	2020-10-30 13:23:05 +01:00
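The change in commit a1e5581a0b can be illustrated with a small, hypothetical `collectActions` function and an `objectSize` helper. The names and the input shape (event names mapped to arrays of script strings) are assumptions, not the actual `Annotation._collectActions` code.

```javascript
// Number of own enumerable keys on a plain Object.
function objectSize(obj) {
  return Object.keys(obj).length;
}

// Hypothetical stand-in: `rawActions` maps event names (e.g. "Mouse Up")
// to arrays of JavaScript source strings; empty entries are dropped.
function collectActions(rawActions) {
  const actions = Object.create(null);
  for (const [eventName, scripts] of Object.entries(rawActions)) {
    if (Array.isArray(scripts) && scripts.length > 0) {
      actions[eventName] = scripts;
    }
  }
  // Return `null`, rather than an empty Object, when nothing was collected.
  return objectSize(actions) > 0 ? actions : null;
}
```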
Jonas Jenwald	8540b4cc76	Stop calling `Font.charsToGlyphs`, in `src/core/annotation.js`, with unused arguments. As can be seen in `src/core/fonts.js`, this method only accepts one parameter, hence it's somewhat difficult to understand what the Annotation-code is actually attempting to do here. The only possible explanation that I can imagine is that the intention was initially to call `Font.charToGlyph` directly instead. However, note that that would not actually have been correct, since that'd ignore one level of font-caching (see `this.charsCache`). Hence the unused arguments are removed, in `src/core/annotation.js`, and the `Font.charToGlyph` method is now marked as "private" as intended.	2020-10-30 13:17:52 +01:00
Jonas Jenwald	46e94cad17	Fix some errors reported by the ESLint `no-useless-escape` rule. This patch removes unnecessary escape sequences in (mostly) strings, as a first step, since the ones in regular expressions probably require more careful testing (just in case). The only exception is a regular expression in `src/core/annotation.js`, since we should have both unit- and reference-tests for this code and given [this information on MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#Types): > Inside a character set, the dot loses its special meaning and matches a literal dot. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape	2020-10-29 15:40:40 +01:00
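To illustrate the MDN note quoted in commit 46e94cad17: inside a character class the dot is already literal, so escaping it changes nothing, which is exactly the kind of escape that `no-useless-escape` reports.

```javascript
// Inside a character class, "." is a literal dot, so both patterns are equivalent.
const literalDot = /[.]/;
const escapedDot = /[\.]/; // eslint-disable-line no-useless-escape

// Outside a character class, "." matches any character except line terminators.
const anyChar = /./;
```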
Jonas Jenwald	9fc7cdcc9d	Use a `Map`, rather than an `Object`, internally in the `Catalog.openAction` getter (PR 11644 follow-up). This provides a work-around to avoid having to conditionally try to initialize the `openAction`-object in multiple places. Given that `Object.fromEntries` doesn't seem to guarantee that a `null` prototype is used, we thus hack around that by using `Object.assign` with `Object.create(null)`.	2020-10-28 14:43:28 +01:00
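A sketch of the work-around described in commit 9fc7cdcc9d. Per the ECMAScript specification, `Object.fromEntries` creates an ordinary object inheriting from `Object.prototype`, so copying its result into `Object.create(null)` is one way to get a prototype-free dictionary. The helper name is hypothetical.

```javascript
// Convert a Map into a dictionary-style object with a `null` prototype,
// so inherited keys such as "toString" can never be mistaken for entries.
function mapToNullProtoObject(map) {
  return Object.assign(Object.create(null), Object.fromEntries(map));
}
```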
Tim van der Meij	ea4d88a330	Merge pull request #12395 from calixteman/checks Render non-displayed annotations using the normal appearance when printing	2020-10-28 00:11:10 +01:00
Calixte Denizet	6be2f84b4e	Render non-displayed annotations using the normal appearance when printing	2020-10-27 19:00:31 +01:00
Tim van der Meij	71a14be8e7	Merge pull request #12534 from Snuffleupagus/murmurhash-slice Ensure that `MurmurHash3_64.update` handles `ArrayBuffer` input correctly, to avoid hash-collisions (issue 12533)	2020-10-26 23:34:03 +01:00
Jonas Jenwald	f2fa053c51	Ensure that `MurmurHash3_64.update` handles `ArrayBuffer` input correctly, to avoid hash-collisions (issue 12533). Different fonts incorrectly end up with identical hashes, despite having different /ToUnicode data. The issue, and it's very interesting that we've apparently not seen it before, appears to be caused by the fact that different /ToUnicode entries share the same underlying `ArrayBuffer`, which thus becomes problematic at the `const dataUint32 = new Uint32Array(data.buffer, 0, blockCounts);` line. The simplest solution thus seems to be to just copy the input, when it's an `ArrayBuffer`, rather than using it as-is. (Note that if we'd stringified the input, when calling `MurmurHash3_64.update`, the issue would also have been fixed. In that case, we're already creating a unique TypedArray.)	2020-10-26 16:27:33 +01:00
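A minimal reproduction of the failure mode described in commit f2fa053c51, assuming the inputs are `Uint8Array` views that share one `ArrayBuffer`. The two helper functions are hypothetical; they only isolate the `new Uint32Array(data.buffer, 0, ...)` pattern quoted in the commit message.

```javascript
// Two small views sharing one underlying ArrayBuffer.
const backing = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);
const first = backing.subarray(0, 4); // byteOffset 0
const second = backing.subarray(4, 8); // byteOffset 4

// Buggy pattern: reads from offset 0 of the *buffer*, ignoring the view's
// own byteOffset, so both views produce the same 32-bit words.
function wordsIgnoringOffset(data) {
  return Array.from(new Uint32Array(data.buffer, 0, data.length >> 2));
}

// Fixed pattern: copy first; `slice` allocates a fresh buffer that starts
// exactly at the view's first byte.
function wordsFromCopy(data) {
  const copy = data.slice();
  return Array.from(new Uint32Array(copy.buffer, 0, copy.length >> 2));
}
```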
Jonas Jenwald	56fa6d414c	Add a `getArrayLookupTableFactory` helper function and use it to re-format `src/core/{glyphlist, unicode}.js`. Please note: Once https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is implemented, and we've removed SystemJS completely, this entire patch can (and even should) be reverted. This is similar to the existing `getLookupTableFactory` helper function, but is implemented as outlined in issue 6774. The re-formatting of the tables was done automatically, by using find-and-replace with regular expressions. For reasons that I don't even pretend to understand, using this particular structure for these very long lookup tables allows SystemJS to process the files correctly/quickly and the development viewer thus works as intended.	2020-10-26 11:08:00 +01:00
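A sketch of a flat-array lookup-table factory in the spirit of commit 56fa6d414c, assuming the table is written as one array of alternating keys and values. This is an illustration and not necessarily identical to the `getArrayLookupTableFactory` that landed.

```javascript
// Returns a getter that builds the lookup object lazily, on first use,
// from a flat array of alternating keys and values.
function getArrayLookupTableFactory(initializer) {
  let lookup;
  return function () {
    if (initializer) {
      const arr = initializer();
      initializer = null; // Run the (potentially large) initializer only once.
      lookup = Object.create(null);
      for (let i = 0, ii = arr.length; i < ii; i += 2) {
        lookup[arr[i]] = arr[i + 1];
      }
    }
    return lookup;
  };
}
```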
Jonas Jenwald	441d9c8cc0	Change `src/core/{glyphlist, unicode}.js` to use standard `import`/`export` statements. While the built `pdf.worker.js` file still works correctly with these changes, despite these two files being excluded by Babel[1], the development viewer does not because of issues with SystemJS[2] and/or its Babel-plugin (both of which are old). Furthermore, note also that excluding these two files from Babel-processing isn't generally necessary since e.g. the `gulp mozcentral` command works anyway. The explanation is rather that it's actually the source-map generation which fails for these huge sequences when building the `pdf.worker.js` file. However, not using standard `import`/`export` statements in all files means we also need to use SystemJS when e.g. running the unit-tests. This is very unfortunate, since SystemJS (or its old Babel-version) doesn't support modern ECMAScript features such as e.g. optional chaining and nullish coalescing. Unfortunately it also seems that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687, which tracks the implementation of worker-modules in Firefox, has stalled since there haven't been any updates for six months now. To hopefully address all of the above, this patch is the first in a series that attempts to further reduce our reliance on SystemJS. --- [1] The only difference being how the dependencies are handled, in the Webpack-bundled file. [2] Parsing takes way too long and consumes too much memory, thus rendering the development viewer essentially unusable.	2020-10-26 11:08:00 +01:00
Tim van der Meij	b4ca3d55b8	Merge pull request #12508 from calixteman/button_fallback_font Fallback font for buttons must be ZapfDingbats.	2020-10-24 18:56:12 +02:00
Tim van der Meij	180f35ee91	Merge pull request #12526 from Snuffleupagus/TilingPattern-args Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up)	2020-10-24 15:47:57 +02:00
Tim van der Meij	c493dc96fa	Merge pull request #12516 from Snuffleupagus/fieldObjects-annotation-undefined Prevent issues, in `PDFDocument.fieldObjects`, for invalid Annotations	2020-10-24 15:42:33 +02:00
Jonas Jenwald	b478d3e7b9	Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up). - Handle the arguments correctly in `PartialEvaluator.handleColorN`. For TilingPatterns with a base-ColorSpace, we're currently using the `args` when computing the color. However, as can be seen we're passing the Array as-is to the `ColorSpace.getRgb` method, which means that the `Name` is included as well.[1] Thankfully this hasn't, as far as I know, caused any actual bugs, but that may be more luck than anything else given how the `ColorSpace` code is implemented. This can be easily fixed though, simply by popping the `Name`-object off of the `args` Array. - Cache TilingPatterns using the `Name`-string, rather than the object directly. This is not only consistent with other caches in `PartialEvaluator`, but importantly it also ensures that the cache lookup always works correctly. Note that since `Name`-objects, similar to other primitives, use a cache themselves, a manually triggered `cleanup`-call could thus (theoretically) cause the `LocalTilingPatternCache` to not find an existing entry. While the likelihood of this happening is extremely small, it's still something that we should fix. --- [1] The `args` Array can e.g. look like this: `[0.043, 0.09, 0.188, 0.004, /P1]`, which means that we're passing in the `Name`-object to the `ColorSpace` method.	2020-10-24 13:49:46 +02:00
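The argument handling from commit b478d3e7b9 can be sketched with a hypothetical stand-in for the `Name` primitive. `splitPatternArgs` is an illustrative name, and the example operands in the test are the ones from footnote [1].

```javascript
// Hypothetical stand-in for the `Name` primitive.
class Name {
  constructor(name) {
    this.name = name;
  }
}

// For a TilingPattern with a base ColorSpace, the trailing operand is the
// pattern `Name`; only the preceding numbers are color components.
function splitPatternArgs(args) {
  const operands = args.slice(); // Avoid mutating the caller's Array.
  const last = operands[operands.length - 1];
  const patternName = last instanceof Name ? operands.pop() : null;
  return {
    // Cache key as a plain string, not the `Name` object itself.
    cacheKey: patternName ? patternName.name : null,
    components: operands,
  };
}
```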
Calixte Denizet	37c86b2daa	Fallback font for buttons must be ZapfDingbats. Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.	2020-10-24 12:00:03 +02:00
Calixte Denizet	85e6c67cf3	Split highlight annotation div into multiple divs. Fix for issue #12504. A highlight annotation may have several rectangles, so we must have several divs to add mouse event handlers.	2020-10-23 15:26:16 +02:00
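A sketch of deriving one rectangle per highlighted area, as in commit 85e6c67cf3, assuming /QuadPoints is a flat array of 8 * n numbers (four x, y points per area). The function name is hypothetical; each resulting rectangle could then get its own div.

```javascript
// Convert a flat /QuadPoints array into one axis-aligned [x1, y1, x2, y2]
// rectangle per group of four points, ignoring any incomplete trailing group.
function quadPointsToRects(quadPoints) {
  const rects = [];
  for (let i = 0; i + 8 <= quadPoints.length; i += 8) {
    const xs = [quadPoints[i], quadPoints[i + 2], quadPoints[i + 4], quadPoints[i + 6]];
    const ys = [quadPoints[i + 1], quadPoints[i + 3], quadPoints[i + 5], quadPoints[i + 7]];
    rects.push([Math.min(...xs), Math.min(...ys), Math.max(...xs), Math.max(...ys)]);
  }
  return rects;
}
```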
Jonas Jenwald	b44a975d7c	Prevent issues, in `PDFDocument.fieldObjects`, for invalid Annotations. For an invalid Annotation, there's one code-path where `undefined` is returned from `AnnotationFactory._create`. That'd currently, incorrectly, trigger an error during the `PDFDocument._collectFieldObjects` parsing, which thus seems good to avoid. Along these lines, the filtering in `PDFDocument.fieldObjects` is also updated to handle both `null` and `undefined` the same way.	2020-10-22 13:24:43 +02:00
Calixte Denizet	d2ef878702	Invalidate an annotation with no quadPoints (when it's required). Some PDF software doesn't remove highlight annotations but makes the QuadPoints array empty. The Rect for the annotation can also be [-32768, -32768, 32768, 32768], which leads to a giant div that catches all the mouse events and makes the PDF unusable when there are form elements.	2020-10-21 13:53:19 +02:00
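A sketch of the validation described in commit d2ef878702. `getValidQuadPoints` is a hypothetical helper; treating a `null` result as "invalid annotation", instead of falling back to a possibly huge /Rect, is left to the caller.

```javascript
// Return the QuadPoints array when usable, or `null` when it is missing,
// empty, not a multiple of eight numbers, or contains non-finite values.
function getValidQuadPoints(quadPoints) {
  if (!Array.isArray(quadPoints) || quadPoints.length === 0 || quadPoints.length % 8 !== 0) {
    return null;
  }
  if (!quadPoints.every(n => typeof n === "number" && Number.isFinite(n))) {
    return null;
  }
  return quadPoints;
}
```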
Calixte Denizet	c30a3a94f0	JS -- Add a function in the API to get the field IDs in AcroForm::CO	2020-10-17 12:56:40 +02:00
Tim van der Meij	ff2631493e	Merge pull request #12481 from calixteman/issue_12475 Get urls if any in AA::D dictionary for pushbuttons	2020-10-16 22:55:43 +02:00
Tim van der Meij	32bceae732	Merge pull request #12483 from Snuffleupagus/formInfo-hasFields Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead	2020-10-16 22:40:40 +02:00
Jonas Jenwald	f956d0a96a	Stop caching the parsed Font data on its `Dict` object (PR 7347 follow-up). Given that all fonts are, ever since PR 7347, now cached in the "normal" `fontCache`, there's actually no reason for the special `font.translated` construction. (Given how Objects in JavaScript are references, rather than raw values, the old code shouldn't have caused any significant memory overhead.) Instead we can simply store the `cacheKey`, which is a simple string, on only the Font `Dict`s where it's needed and thus look up all fonts using the `fontCache` instead.	2020-10-16 17:45:01 +02:00
Jonas Jenwald	29af15f37e	Add more validation in the `PDFDocument._hasOnlyDocumentSignatures` method. If this method is ever passed invalid/unexpected data, or if during the course of parsing (since it's used recursively) such data is found, it will fail in a non-graceful way. Hence this patch, which ensures that we don't attempt to access non-existent properties and also that errors such as the one fixed in PR 12479 wouldn't have occurred.	2020-10-16 13:03:47 +02:00
Jonas Jenwald	3351d3476d	Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead. This patch is based on a couple of smaller things that I noticed when working on PR 12479. - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead. This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data. With these changes the `fieldObjects` getter instead has to look up the /Fields manually; however, that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents don't even have any /AcroForm data anyway. - Determine the `hasFields` property first, to ensure that it's always correct even if there are errors when checking e.g. the /XFA or /SigFlags entries, since the `fieldObjects` getter depends on it. - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support. - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker. - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.	2020-10-16 12:47:27 +02:00
Calixte Denizet	ce3d3a6ff8	Get URLs, if any, in the AA::D dictionary for pushbuttons	2020-10-15 19:42:36 +02:00
Jonas Jenwald	bc6b47a50e	Convert `PartialEvaluator.translateFont` to an `async` method. This allows us to make a slight simplification in `PartialEvaluator.loadFont`, which thus removes an old TODO-comment from the method. Furthermore, in `PartialEvaluator.translateFont`, the CMap-handling is now limited to only composite fonts to avoid having to wait for a "dummy"-Promise for most fonts.	2020-10-15 09:42:58 +02:00
Tim van der Meij	a373137304	Merge pull request #12429 from calixteman/collect_js [api-minor] Add the possibility to collect Javascript actions	2020-10-14 23:27:47 +02:00
Calixte Denizet	71ecc3129b	Add the possibility to collect JavaScript actions	2020-10-14 10:44:16 +02:00
Tim van der Meij	1034769ca1	Merge pull request #12477 from Snuffleupagus/SaveDocument-WorkerTask Handle `WorkerTask`s, and various PDF document properties, correctly in the "SaveDocument" handler in `src/core/worker.js`	2020-10-13 21:11:54 +02:00
Jonas Jenwald	65132ba5d8	Handle `WorkerTask`s, and various PDF document properties, correctly in the "SaveDocument" handler in `src/core/worker.js`. - Actually register/unregister the `WorkerTask`s, used when saving each page, correctly. To prevent issues when terminating the Worker, we purposely wait for all running `WorkerTask`s to complete first. Hence we need to actually handle `WorkerTask`s the same way in "SaveDocument" as in the rest of this file, see e.g. "GetOperatorList" and "GetTextContent". - Access `PDFDocument` properties in a generally safe/consistent way. While the current code works fine, given how the PDF document is being loaded, it still seems like a very good idea to be consistent in how we access these kinds of properties (since in general you need to avoid `MissingDataException` everywhere in this file). - Change a variable name, since there's essentially no precedent in the code-base for local variable names to start with an underscore.	2020-10-13 19:30:43 +02:00
Jonas Jenwald	38629c345d	Remove the `scope` parameter from the "GetOperatorList" handler in `src/core/worker.js` (PR 11110 follow-up). Support for the `scope` parameter, in `MessageHandler.on`, was removed in PR 11110; however, this particular case was unused/unnecessary for years prior to that change. (From a quick look through the history, I'm not even sure if it was actually needed in the first place.)	2020-10-13 15:58:38 +02:00
Jonas Jenwald	30e8d5dea1	Add local caching of TilingPatterns in `PartialEvaluator.getOperatorList` (issues 2765 and 8473). In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing a (small) page since a `getOperatorList` call is required. By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also less asynchronous parsing required for repeated TilingPatterns. Initially I had intended to include (standard) benchmark results with this patch; however, it's not entirely clear that this is actually necessary here given the preliminary results. When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reductions in rendering times were observed when comparing `master` against this patch: - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`. - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms`. - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms`. As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance. Hence it's not clear to me that we want to regard any of the referenced issues as fixed; however, the improvements are significant enough to warrant caching of TilingPatterns in my opinion.	2020-10-08 18:43:21 +02:00
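The local caching in commit 30e8d5dea1 boils down to a cache keyed by the pattern's resource name. The class below is a hypothetical, synchronous simplification; in an asynchronous `getOperatorList`-style code path, storing a Promise instead of the parsed value would be the natural equivalent.

```javascript
// Hypothetical per-page cache keyed by the pattern's name string, so that
// repeated uses of the same TilingPattern skip re-parsing.
class LocalPatternCache {
  constructor() {
    this._byName = new Map();
  }

  // Return the cached value for `name`, calling the expensive `parse`
  // callback only on the first request.
  getOrCreate(name, parse) {
    if (this._byName.has(name)) {
      return this._byName.get(name);
    }
    const value = parse();
    this._byName.set(name, value);
    return value;
  }
}
```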

1840 Commits