pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	cc3a6563ee	Move the Metadata parsing to the worker-thread The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers. Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons). Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the built `pdf.js`/`pdf.worker.js` files. Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation. --- [1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.	2021-02-17 13:12:01 +01:00
Calixte Denizet	ccef734ebb	Remove Promise.all and async+done from unit/scripting_spec	2021-02-17 11:19:39 +01:00
Calixte Denizet	82f75a8ac2	JS -- Fix doc.getField and add missing field methods - getField("foo") was wrongly returning a field named "foobar"; - field object had few missing unimplemented methods	2021-02-17 10:42:52 +01:00
Tim van der Meij	bab059d8fd	Merge pull request #12964 from calixteman/12963 Avoid infinite loop when getting annotation field name	2021-02-16 22:36:24 +01:00
Calixte Denizet	0fc8267576	Avoid infinite loop when getting annotation field name - aims to fix issue #12963; - use a Set to track already visited objects; - remove the loop limit in getInheritableProperty and use a RefSet too.	2021-02-14 19:58:19 +01:00
Jonas Jenwald	b26c7974fe	[api-minor] Change the `dc:subject` Metadata field to an Array This patch simply extends the existing handling of the `dc:creator` field, which should hopefully suffice here; please refer to https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=34	2021-02-14 17:16:40 +01:00
Calixte Denizet	ea06bb0e36	[api-minor] Annotation -- Don't compute appearance when nothing has changed * don't set a value in annotationStorage by default: - having an undefined when the annotation is rendered for saving/printing means nothing has changed so use normal appearance - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1681687 * change the way to compute font size when this one is null in DA: - make fontSize proportional to line height - in multiline case, take into account the number of lines for text entered to adapt the font size	2021-02-12 19:27:21 +01:00
calixteman	a8021208ea	Restore window.alert after use in scripting test (#12987)	2021-02-12 14:19:58 +01:00
dhufnagel	fc925827b2	fix initial state of checkboxes in display layer (#12904) consider the export value when multiple checkboxes have the same name	2021-02-12 11:22:54 +01:00
Jonas Jenwald	4733f163e8	Replace a few `new Date().getTime()` instances with `Date.now()` The former format is not only more verbose, but it's also slightly less efficient since it creates a new `Date` object.	2021-02-11 23:00:42 +01:00
calixteman	0479deef4e	XFA -- Add other objects (#12949) - connectionSet: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=969 - datasets: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1038 - signature: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1040 - stylesheet: the same - xhtml: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1187	2021-02-11 12:30:37 +01:00
Jonas Jenwald	0068dba009	[api-minor] Rename `-es5` to `-legacy`, to reduce confusion over what's actually supported (issue 12976) Please note that this will also require some edits of the Wiki.	2021-02-10 16:01:59 +01:00
Jonas Jenwald	d3e65f24e3	Request all data, rather than throwing, when encountering general errors in `ObjectLoader._walk` (issue 9462, PR 3289 follow-up) As far as I can tell, this has been broken ever since PR 3289 (back in 2013) without anyone noticing. For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately which thus has the potential to completely break rendering of an entire page. In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse these errors are intermittent, since they can only occur if the document is still loading when the `ObjectLoader`-code runs (note the early return in `ObjectLoader.load`). Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then. All-in-all, the only way to make absolutely sure that we won't cause unexpected `MissingDataException`s somewhere else in the code-base is to fallback to fetching the entire document in this edge-case.	2021-02-06 14:33:50 +01:00
Calixte Denizet	652ff57897	XFA -- Add template object - Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=596	2021-02-03 21:05:10 +01:00
Jonas Jenwald	cacb1cc7ba	Re-enable the `issue6961` test-case (issue 7112)	2021-02-02 10:31:16 +01:00
Calixte Denizet	0ff5cd7eb5	XFA - Add a parser for XFA files - the parser is based on a class extending XMLParserBase - it handles xml namespaces: * each namespace is associated with a builder * builder builds nodes belonging to the namespace * when a node is inserted in the parent namespace compatibility is checked (if required) - to avoid name collision between xml names and object properties, use Symbol.	2021-02-01 13:45:31 +01:00
Tim van der Meij	286271152f	Merge pull request #12910 from calixteman/bidi Add back dir property in spans in text layer	2021-01-27 22:09:00 +01:00
Tim van der Meij	639437d287	Merge pull request #12911 from calixteman/reg_test Fix text layer regression tests in using the correct line-height property	2021-01-26 23:45:45 +01:00
Calixte Denizet	539256c351	Add back dir property in spans in text layer - aims to fix #12909	2021-01-26 12:00:05 +01:00
calixteman	a3f6882b06	JS -- add support for choice widget (#12826)	2021-01-25 23:40:57 +01:00
Calixte Denizet	52641e8643	Fix text layer regression tests in using the correct line-height property	2021-01-25 23:01:07 +01:00
Tim van der Meij	f2c7338b02	Merge pull request #12897 from calixteman/12895 JS - Fix mouse event names	2021-01-24 12:28:24 +01:00
Calixte Denizet	34d2e72df2	JS - Fix mouse event names - fix issue #12895	2021-01-23 20:26:22 +01:00
Tim van der Meij	d4c4f5d4e5	Merge pull request #12870 from Snuffleupagus/page-advance Add previous/next-page functionality that takes scroll/spread-modes into account (issue 11946)	2021-01-23 19:35:08 +01:00
Tim van der Meij	25b84ce84c	Merge pull request #12828 from dhufnagel/feature/annotation_layer_display_fontsize [api-minor] Set font size and color for text widget annotations	2021-01-23 16:08:07 +01:00
Jonas Jenwald	ef1d33a29e	Use slightly less verbose font-names in the "Default appearance" unit-tests The new names are not only less verbose, but also use a very common PDF font-naming convention.	2021-01-23 15:34:22 +01:00
Jonas Jenwald	6bcb4e3ad9	Ensure that `parseDefaultAppearance` won't attempt to access a not yet defined variable (PR 12831 follow-up) Note how, in the `if (this.stateManager.stateStack.length !== 0) {` branch, we're attempting to access the not yet defined variable[1] `args`. If this code-path is ever hit, an Error will be thrown and parsing will thus be aborted immediately (likely leading to e.g. rendering bugs). Note that I found this purely by accident, since I happened to glance at the LGTM report. However, I've since found that the error is also present during the unit-test[2] and with this patch we're actually testing the intended thing here. As part of fixing this, and to avoid re-introducing a similar bug in the future, we'll now instead always reset `args.length` before attempting to read the next operator. Also, we can use the existing `EvaluatorPreprocessor.savedStatesDepth` getter to simplify the save/restore detection a tiny bit. --- [1] The ESLint rule `no-use-before-define` would have helped catch this problem, but unfortunately we cannot enable that without quite a bit of refactoring all over the code-base. [2] The unit-test was updated such that it would fail in the `master`-branch.	2021-01-23 15:33:28 +01:00
Dominik Hufnagel	c5083cda02	set font size and color on annotation layer use the default appearance to set the font size and color of a text annotation widget	2021-01-22 23:12:14 +01:00
Jonas Jenwald	a2b592f4a2	Add previous/next-page functionality that takes scroll/spread-modes into account (issue 11946) - For wrapped scrolling, we unfortunately need to do a fair bit of parsing of the current page layout. Compared to e.g. the spread-modes, where we can easily tell how the pages are laid out, with wrapped scrolling we cannot tell without actually checking. In particular documents with varying page sizes require some care, since we need to check all pages on the "row" of the current page are visible and that there aren't any "holes" present. Otherwise, in the general case, there's a risk that we'd skip over pages if we'd simply always advance to the previous/next "row" in wrapped scrolling. - For horizontal scrolling, this patch simply maintains the current behaviour of advancing one page at a time. The reason for this is to prevent inconsistent behaviour for the next and previous cases, since those cannot be handled identically. For the next-case, it'd obviously be simple to advance to the first not completely visible page. However for the previous-case, we'd only be able to go back one page since it's not possible to (easily) determine the page layout of non-visible pages (documents with varying page sizes being a particular issue). - For vertical scrolling, this patch maintains the current behaviour by default. When spread-modes are being used, we'll now attempt to advance to the next spread, rather than just the next page, whenever possible. To prevent skipping over a page, this two-page advance will only apply when both pages of the current spread are visible (to avoid breaking documents with varying page sizes) and when the second page in the current spread is fully visible horizontally (to handle larger zoom values). In order to reduce the performance impact of these changes, note that the previous/next-functionality will only call `getVisibleElements` for the scroll/spread-modes where that's necessary and that "normal" vertical scrolling is thus unaffected by these changes. To support these changes, the `getVisibleElements` helper function will now also include the `widthPercent` in addition to the existing `percent` property. The `PDFViewer._updateHelper` method is changed slightly w.r.t. updating the `currentPageNumber` for the non-vertical/spread modes, i.e. won't affect "normal" vertical scrolling, since that helped simplify the overall calculation of the page advance. Finally, these new `BaseViewer` methods also allow (some) simplification of previous/next-page functionality in various viewer components. Please note: There's one thing that this patch does not attempt to change, namely disabling of the previous/next toolbarButtons respectively the firstPage/lastPage secondaryToolbarButtons. The reason for this is that doing so would add quite a bit of complexity in general, and if for some reason `BaseViewer._getPageAdvance` would get things wrong we could end up incorrectly disabling the buttons. Hence it seemed overall safer to not touch this, and accept that the buttons won't be `disabled` despite in some edge-cases no further scrolling being possible.	2021-01-22 21:38:15 +01:00
Jonas Jenwald	4db7330677	Enable ESLint rules that no longer need to be disabled on a directory/file-basis Given that browsers/environments without native support for both arrow functions and object shorthand properties are no longer supported in PDF.js, please refer to the compatibility information below, we can now enable a fair number of ESLint rules and also simplify/remove some `.eslintrc` files. With the exception of the `no-alert` cases, all code changes were made automatically by using `gulp lint --fix`. - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions#browser_compatibility - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer#browser_compatibility	2021-01-22 17:47:03 +01:00
Brendan Dahl	2cba290361	Merge pull request #12836 from calixteman/update_buttons JS -- update radio/checkbox values even if there are no actions	2021-01-21 14:00:26 -08:00
calixteman	1039698697	Add a parser to get font data from the default appearance (#12831) * Add a parser to get font data from the default appearance - pdfium & poppler use a special parser too to get these info. * Update src/core/default_appearance.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-01-21 20:15:31 +01:00
Brendan Dahl	f45ba02fd3	Merge pull request #12850 from calixteman/missing_cstes JS -- Add few missing constants in global scope	2021-01-20 11:33:02 -08:00
Calixte Denizet	0d1b19632d	Enforce linewidth to 1px when at least one of scale factor is lower than 1	2021-01-15 13:18:24 +01:00
Jonas Jenwald	cf7eb87934	Remove a duplicated reference test (PR 12812 follow-up) - Remove a duplicated reference test, see "issue12810", from the manifest. - Improve the spelling in a couple of comments in `src/core/canvas.js`, most notably of the word "parallelogram". - Update a comment, also in `src/core/canvas.js`, to actually agree with the value used to reduce confusion when reading the code.	2021-01-15 10:57:15 +01:00
Brendan Dahl	6619f1f3f2	Merge pull request #12812 from calixteman/too_thin Enforce line width to be at least 1px after applied transform	2021-01-14 15:21:44 -08:00
Jonas Jenwald	2600e59acb	Always re-measure non-embedded ArialNarrow fonts (bug 1671312, PR 12725 follow-up) While PR 12725 fixed bug 1671312 as reported, i.e. the "In the upper right corner "Purposes' has bad kerning."-part, it however broke other parts of the text rendering. Note in particular the tables, e.g. on page 2 and beyond, where the glyphs are now rendered too close together. The reason for this is that the fonts in question are non-embedded ArialNarrow, which we just replace with Helvetica which obviously is not narrow. Given that the font replacement isn't a perfect fit for non-embedded ArialNarrow, we still need to re-measure the glyph widths in this case.	2021-01-14 15:51:48 +01:00
Ross Johnson	6dae2677d5	[api-minor] Highlight search results correctly for normalized text (PR 9448) This patch is a rebased and refactored version of PR 9448, such that it applies cleanly given that `PDFFindController` has changed since that PR was opened; obviously keeping the original author information intact. This patch will thus ensure that e.g. fractions, and other things that we normalize before searching, will still be highlighted correctly in the textLayer. Furthermore, this patch also adds basic unit-tests for this functionality. Note: The `[api-minor]` tag is added, since third-party implementations of the `PDFFindController` must now always use the `pageMatchesLength` property to get accurate length information (see the `web/text_layer_builder.js` changes). Co-authored-by: Ross Johnson <ross@mazira.com> Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-01-12 18:08:08 +01:00
calixteman	1de1ae0be6	Merge pull request #12838 from calixteman/authors [api-minor] Change the "dc:creator" Metadata field to an Array	2021-01-12 02:44:58 -08:00
Calixte Denizet	43d5512f5c	[api-minor] Change the "dc:creator" Metadata field to an Array - add scripting support for doc.info.authors - doc.info.metadata is the raw string with xml code	2021-01-11 21:34:07 +01:00
Calixte Denizet	8e6bec6e2e	JS -- Add few missing constants in global scope - these constants are available in pdfium implementation too - fix error code in aform.js	2021-01-11 17:19:28 +01:00
Calixte Denizet	b3dccd66ab	Enforce line width to be at least 1px after applied transform * add a comment to explain how minimal linewidth is computed. * when context.linewidth < 1 after transform, firefox and chrome don't render in the same way (issue #12810). * set lineWidth to 1 after transform and before stroking - aims to fix issue #12295 - a pixel can be transformed into a rectangle with both heights < 1. A single rescale leads to a rectangle with one dimension equal to 1 and the other greater than 1. * change the way to render rectangle with null dimensions: - right now we rely on the lineWidth set before "re" but it can be set after "re" and before "S" and in this case the rendering will be wrong. - render such rectangles as a single line.	2021-01-10 18:02:12 +01:00
Jonas Jenwald	246a6f9d13	Enable the Stylelint `length-zero-no-unit` rule Note that these changes were done automatically, using `gulp lint --fix`. With this rule, we'll thus enforce a consistent formatting of zero-lengths in our CSS files. Please find additional details about the Stylelint rule at https://stylelint.io/user-guide/rules/length-zero-no-unit	2021-01-10 14:09:36 +01:00
Tim van der Meij	f85b8721d1	Merge pull request #12842 from Snuffleupagus/issue-12841 Improve handling of JPEG images without an EOI marker (issue 12841)	2021-01-10 13:21:28 +01:00
Tim van der Meij	699d65eb1c	Merge pull request #12840 from Snuffleupagus/sort-exports Use ESLint to ensure that `export`s are sorted alphabetically	2021-01-10 13:17:28 +01:00
Jonas Jenwald	66b2c19368	Fix broken "issue12394" test-case This test-case is currently broken, with the reference image being completely empty, since it uses the old "annotationStorage" format in the manifest.	2021-01-09 21:42:56 +01:00
Jonas Jenwald	81525fd446	Use ESLint to ensure that `export`s are sorted alphabetically There's a built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability. Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a consistent ordering of `export`-statements in the future. Note: To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.	2021-01-09 20:37:51 +01:00
Jonas Jenwald	cd9422a075	Improve handling of JPEG images without an EOI marker (issue 12841) Given that the PDF document in the issue contains the same very large JPEG image three times, this patch includes a test-case where only the first page has been extracted from it.	2021-01-09 20:19:39 +01:00
Tim van der Meij	c0a6d6cd21	Merge pull request #12394 from calixteman/appearance In a text widget, Font resources can be in the appearance	2021-01-08 21:03:41 +01:00
Jonas Jenwald	941b65f683	Remove unnecessary `CanvasFactory`/`CMapReaderFactory`/`FileReaderFactory` duplication in unit-tests Given that the API will now, after PR 12039, automatically pick the correct factories to use depending on the environment (browser vs. Node.js), we can utilize that in the unit-tests as well. This way we don't have to manually repeat the same initialization code in multiple unit-tests. Note: The official PDF.js API is defined in `src/pdf.js`, hence the new exports in `src/display/api.js` will not affect that. Also, updates the unit-test `FileReaderFactory` helpers similarly. Drive-by change: Fix the `CMapReaderFactory` usage in the annotation unit-tests, since the cache should only contain raw data and not a Promise. While this obviously works as-is, having unit-tests that "abuse" the intended data format can easily lead to unnecessary failures if changes are made to the relevant `src/core/` code.	2021-01-08 17:33:59 +01:00
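
Note on commit 0fc8267576 ("Avoid infinite loop when getting annotation field name"): the message mentions using a Set to track already visited objects. The snippet below is only a rough sketch of that general idea, not the code from the patch. The function name `getFieldName` and the `name`/`parent` object shape are hypothetical and chosen purely for illustration; the real code works on PDF dictionaries and, per the message, also uses a `RefSet`.

```javascript
// Hypothetical helper, NOT the actual PDF.js implementation: build a
// dotted field name by following `parent` links, and stop as soon as
// an object is seen a second time (i.e. the parent chain is cyclic).
function getFieldName(node) {
  const visited = new Set();
  const parts = [];
  let current = node;
  while (current && !visited.has(current)) {
    visited.add(current);
    if (typeof current.name === "string") {
      // Parents come first in the final name, so prepend.
      parts.unshift(current.name);
    }
    current = current.parent;
  }
  return parts.join(".");
}

// A corrupt document could contain a cycle such as: a -> b -> a.
const a = { name: "form" };
const b = { name: "field", parent: a };
a.parent = b;

console.log(getFieldName(b)); // "form.field"
```

Without the `visited` check, the loop above would never terminate for the cyclic `a`/`b` pair; with it, each object contributes to the name at most once.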

2293 Commits