pdf.js

Author	SHA1	Message	Date
Calixte Denizet	46369e4aa5	Fix some issues with lineWidth < 1 after transform (bug 1753075, bug 1743245, bug 1710019) - it aims to fix: - https://bugzilla.mozilla.org/show_bug.cgi?id=1753075; - https://bugzilla.mozilla.org/show_bug.cgi?id=1743245; - https://bugzilla.mozilla.org/show_bug.cgi?id=1710019; - issue #13211; - issue #14521. - previously we were trying to adjust lineWidth to have something correct after the current transform is applied but this approach was not correct because finally the pixel is rescaled with the same factors in both directions. And sometimes those factors must be different (see bug 1753075). - So the idea of this patch is to apply a scale matrix to the current transform just before setting lineWidth and stroking. This scale matrix is computed in order to ensure that after transform, a pixel will have its two thickness greater than 1.	2022-02-25 18:37:34 +01:00
Jonas Jenwald	530af48b8e	Merge pull request #14569 from brendandahl/smask-state Fix canvas state getting out of sync from smasks. (bug 1755507)	2022-02-18 19:35:58 +01:00
Brendan Dahl	7def6d12c8	Fix canvas state getting out of sync from smasks. (bug 1755507) Soft masks can be enabled/disabled at any time and at different points in the save/restore stack. This can lead to the number of save/restores becoming unbalanced across the two canvases. Instead of save/restoring on the temporary canvas, change it so we only track state on the main (suspended canvas). I was also getting an unbalanced stack from patterns, so I've also fixed that and added a warning that will at least show up on Chrome. It would be nice to add this to Firefox at some point too. Fixes #11328, #14297 and bug 1755507	2022-02-17 17:38:32 -08:00
Jonas Jenwald	fd319e94b3	Add a missing string-check in the `_collectJS` helper function Unfortunately I don't have a test-case that breaks without this change, however the `stringToPDFString` helper function will fail if anything other than a string is passed to it. The changes in this patch thus make this code more-or-less identical to that found in the `Catalog.{_collectJavaScript, parseDestDictionary}` methods.	2022-02-16 13:43:42 +01:00
Calixte Denizet	18e3a98c2b	[api-minor] Don't add in the text content the chars which are out-of-page (bug 1755201) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1755201; - if the glyph position is not within the view then skip it.	2022-02-13 21:07:11 +01:00
Tim van der Meij	c37d785b2a	Merge pull request #14560 from Snuffleupagus/Node-ReadableStream-polyfill [api-minor] Remove the, in `legacy` builds, bundled `ReadableStream` polyfill	2022-02-13 14:08:22 +01:00
Jonas Jenwald	b89595fd20	[api-minor] Remove the, in `legacy` builds, bundled `ReadableStream` polyfill According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility, all browsers that we support have native `ReadableStream` implementations (since quite some time too). Hence only Node.js is now lagging behind w.r.t. `ReadableStream` support, and its experimental implementation doesn't really help us given the life-span of the LTS releases (see https://en.wikipedia.org/wiki/Node.js#Releases). It seems quite unfortunate to bundle a `ReadableStream` polyfill in the `legacy` builds when it's unnecessary in browsers, given its overall size, but fortunately we can avoid that by simply listing `web-streams-polyfill` as a dependency for the `pdfjs-dist` library.	2022-02-13 10:15:58 +01:00
Jonas Jenwald	d642d34500	Remove the UTF-8 fallback, when `TextDecoder` is missing, from the Content-Disposition parser Given that `TextDecoder` is now supported by all modern browsers/environments, please see https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder#browser_compatibility, there's no longer any good reason to keep a UTF-8 fallback in the Content-Disposition parser.	2022-02-12 10:30:25 +01:00
Jonas Jenwald	b87a243222	[api-minor] Stop exposing the `createObjectURL` helper function in the API With recent changes, specifically PR 14515 and the previous patch, the `createObjectURL` helper function is now only used with the SVG back-end. All other call-sites, throughout the code-base, are now using `URL.createObjectURL(...)` directly and it no longer seems necessary to keep exposing the helper function in the API. Finally, the `createObjectURL` helper function is moved into the `src/display/svg.js` file to avoid unnecessarily duplicating this code on both the main- and worker-threads.	2022-02-10 12:01:35 +01:00
Brendan Dahl	f8b2a99ddc	Merge pull request #14543 from Snuffleupagus/bug-1753983 Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983)	2022-02-09 14:06:35 -08:00
Jonas Jenwald	1f0fb270b1	[api-minor] Ensure that the `PDFDocumentLoadingTask`-promise is rejected when cancelling the PasswordPrompt (bug 1754421) This is essentially a continuation of PR 7926, where we added support for rejecting the current `PDFDocumentLoadingTask`-promise by throwing inside of the `onPassword`-callback. Hence the naive way to address [bug 1754421](https://bugzilla.mozilla.org/show_bug.cgi?id=1754421) would be to simply throw in the `onPassword`-callback used in the default viewer. However it unfortunately turns out to not work, since the password input/validation is asynchronous, and we thus need another approach. The simplest solution that I can come up with here, is thus to extend the `onPassword`-callback to also reject the current `PDFDocumentLoadingTask`-instance if an `Error` is explicitly passed as the input to the callback function. (This doesn't feel great, but I cannot see a better solution that isn't really complicated.)	2022-02-09 15:09:20 +01:00
Jonas Jenwald	64f3dbeb48	Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.	2022-02-07 17:09:47 +01:00
Jonas Jenwald	03f5f6a421	[api-minor] Update the minimum supported browser versions Please note that while we "support" some (by now) fairly old browsers, that essentially means that the library (and viewer) will load and that the basic functionality will work as intended.[1] However, in older browsers, some functionality may not be available and generally we'll ask users to update to a modern browser when bugs (specific to old browsers) are reported.[2] There's always a question of just how old browsers the PDF.js contributors can realistically support, and here I'm suggesting that we place the cut-off point at approximately three years. With that in mind, this patch updates the minimum supported browsers (and environments) as follows: - Chrome 73, which was released on 2019-03-12; see https://en.wikipedia.org/wiki/Google_Chrome_version_history - Firefox ESR (as before); see https://wiki.mozilla.org/Release_Management/Calendar - Safari 12.1, which was released on 2019-03-25; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_12 - Node.js 12, which was released on 2019-04-23 (and will soon reach EOL); see https://en.wikipedia.org/wiki/Node.js#Releases --- [1] Assuming a `legacy`-build is being used, of course. [2] In general it's never a good idea to use an old/outdated browser, since those may contain known security vulnerabilities.	2022-02-06 13:06:43 +01:00
Jonas Jenwald	403baa7bba	[api-minor] Remove the `normalizeWhitespace` option in the `PDFPageProxy.{getTextContent, streamTextContent}` methods (issue 14519, PR 14428 follow-up) With these changes, we'll now always replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.	2022-02-03 09:17:22 +01:00
calixteman	7a034706ba	Merge pull request #14510 from calixteman/14502 [api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335)	2022-01-30 15:58:51 +01:00
Calixte Denizet	ae842e1c3a	[api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335) - it aims to fix #14502 and bug 1721335; - Acrobat and Pdfium do the same; - it'll avoid to have truncated data when printed; - change the factor to compute font size in using field height: lineHeight = 1.35*fontSize - this is the value used by Acrobat. - in order to not have truncated strings on the bottom, add few basic metrics for standard fonts.	2022-01-30 15:53:31 +01:00
Jonas Jenwald	7cc761a8c0	Polyfill `structuredClone` with core-js (PR 13948 follow-up) This allows us to remove the manually implemented `structuredClone` polyfill, thus reducing the maintenance burden for the `LoopbackPort` class; refer to https://github.com/zloirock/core-js#structuredclone Please note: While `structuredClone` support landed already in Firefox 94, Google Chrome only added it in version 98 (currently in Beta). However, given that the `LoopbackPort` will only be used together with fake workers in browsers this shouldn't be too much of a problem.[1] For Node.js environments, where fake workers are unfortunately necessary, using a `legacy/`-build is already required which thus guarantees that the `structuredClone` polyfill is available. Also, the patch updates core-js to the latest version since that one includes `structuredClone` improvements; please see https://github.com/zloirock/core-js/releases/tag/v3.20.3 --- [1] Given that we only support browsers with proper worker support, if fake workers are being used that essentially indicates a configuration problem/error.	2022-01-27 21:11:42 +01:00
Jonas Jenwald	8f6965b197	Merge pull request #14506 from Snuffleupagus/license_header_2022 Update the year in the `license_header` files	2022-01-27 19:34:56 +01:00
Jonas Jenwald	00bd549e82	Update the year in the `license_header` files This also includes a couple of files that are included as-is in the `pdfjs-dist` library.	2022-01-27 19:24:31 +01:00
calixteman	838909f8c1	Merge pull request #14491 from quaoaris/lines-rendered-too-thick fix for lines (stroke) are rendered too thick (Bug 1743245)	2022-01-27 18:46:26 +01:00
Calixte Denizet	3a7004ca25	Take into account all rotations before comparing glyph positions - it aims to fix #14497; - previously, only rotations with an angle 0, 90, 180 or 270 were taken into account; - so generalize to any angle but keep the fast path for 0, 90, ... because they're likely more common than anything else.	2022-01-26 17:19:00 +01:00
quaoaris	3f77d80f31	fix for lines (stroke) are rendered too thick (Bug 1743245) This commit fixes Bug 1743245 (Grided PDF file lines rendered too thick), which was introduced by a fix for #12868. The lineWidth was set to round(1 * this._combinedScaleFactor) when the pixel is drawn as a parallelogram with a height < 1. This fix changes this to floor(1 * this._combinedScaleFactor). This change gives a visual result comparable to Chrome and Acrobat. Regarding the last PR, 3 statements in canvas.js are affected and will change with this commit (stroke and paintChar). Also rename the reference files to follow the naming convention.	2022-01-25 10:27:30 +01:00
Jonas Jenwald	8836593b9e	Add a (global) cache to the `getCharUnicodeCategory` function Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing. Obviously the `Glyph`-instances are being cached per font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1]. Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularly large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` unique unicode-chars being checked. Please note: In practice I suppose that this won't have a huge effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review. --- [1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.	2022-01-25 09:59:34 +01:00
Calixte Denizet	e1d3a3b414	Remove the invisible format marks from the text chunks - it aims to fix issue #9186.	2022-01-24 13:47:24 +01:00
calixteman	88236e1163	Merge pull request #14430 from calixteman/beforeinput [JS] Use beforeinput event to trigger a keystroke event in the sandbox	2022-01-23 20:42:33 +01:00
Calixte Denizet	6ac296e48e	[JS] Use beforeinput event to trigger a keystroke event in the sandbox - it aims to fix issue #14307; - this event has been added recently in Firefox and we can now use it; - fix few bugs in aform.js or in annotation_layer.js; - add some integration tests to test keystroke events (see `AFSpecial_Keystroke`); - make dispatchEvent in the quickjs sandbox async.	2022-01-23 19:53:01 +01:00
Tim van der Meij	23b6fde9fc	Merge pull request #14464 from Snuffleupagus/issue-14462 Support Type1 font files with incomplete /CharStrings definitions (issue 14462)	2022-01-19 20:38:46 +01:00
calixteman	b0231cc887	Merge pull request #14456 from calixteman/1749563 Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563)	2022-01-19 01:20:49 -08:00
Calixte Denizet	74f25d2755	Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1749563; - use some helper functions to get (u\|i)int** values from the buffer: it helps to make the code clearer; - in composite glyphs the translation values of a transformation are signed, so read them as int8 instead of uint8; - add a few TODOs.	2022-01-18 22:06:23 +01:00
Jonas Jenwald	a13ae5d97d	Support Type1 font files with incomplete /CharStrings definitions (issue 14462) Please refer to https://www.pdfa.org/norm-refs/Type1Fonts.pdf#page=15 for the expected format for the /CharStrings entries. In the referenced PDF document the /CharStrings are missing the expected end-token, which causes us to swallow the start of the next glyph name.	2022-01-17 18:55:22 +01:00
Jonas Jenwald	ba37d600d7	Make the `normalizeWhitespace` handling, in the `PartialEvaluator`, more efficient (PR 14428 follow-up) After the changes in PR 14428 we can directly, and more efficiently, handle whitespace conversion in `PartialEvaluator.getTextContent` when the `normalizeWhitespace` option is being used. This way we no longer need a separate helper function for this, and can avoid having to (again) iterate through the text and checking each character. Finally, this also removes the need for using a regular expression on e.g. all non-ASCII text.	2022-01-16 08:29:21 +01:00
calixteman	da953f4b64	Merge pull request #14428 from calixteman/typo Use the correct dimension to know if we have to add an EOL in vertical mode	2022-01-15 12:47:10 -08:00
Calixte Denizet	9dae421a0d	Handle all the whitespaces the same way when creating text chunks	2022-01-15 21:44:00 +01:00
Tim van der Meij	922dac035c	Merge pull request #14448 from Snuffleupagus/Type3-circular-refs Prevent circular references in Type3 fonts	2022-01-15 14:11:47 +01:00
Tim van der Meij	a72d188599	Merge pull request #14439 from Snuffleupagus/issue-14438 Ignore Annotations with empty /Rect-entries in the display-layer (issue 14438)	2022-01-15 14:11:25 +01:00
Tim van der Meij	c0d2932faf	Merge pull request #14454 from Snuffleupagus/util-more-unreachable Replace some `assert` usage with `unreachable` in the `src/shared/util.js` file	2022-01-15 13:52:10 +01:00
Tim van der Meij	625f829842	Merge pull request #14446 from Snuffleupagus/issue-14435 Expose even more API-functionality in the TypeScript definitions (issue 14435, PR 14013 follow-up)	2022-01-15 13:46:11 +01:00
Jonas Jenwald	0e1b93bf20	Replace some `assert` usage with `unreachable` in the `src/shared/util.js` file Inlining the checks should be a tiny bit more efficient, since it avoids having to make unconditional function calls in these fairly commonly used helper functions.	2022-01-15 13:01:25 +01:00
Jonas Jenwald	12d8f0b64d	Re-factor the `stringToPDFString` helper function for UTF-16 strings This patch changes the function to instead utilize the `TextDecoder` for both kinds of UTF-16 BOM strings.	2022-01-14 20:38:40 +01:00
Jonas Jenwald	76444888fb	Add (basic) UTF-8 support in the `stringToPDFString` helper function (issue 14449) This patch implements this by looking for the UTF-8 BOM, i.e. `\xEF\xBB\xBF`, in order to determine the encoding.[1] The actual conversion is done using the `TextDecoder` interface, which should be available in all environments/browsers that we support; please see https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder#browser_compatibility --- [1] Assuming that everything lacking a UTF-16 BOM would have to be UTF-8 encoded really doesn't seem correct.	2022-01-14 18:57:07 +01:00
Jonas Jenwald	53d4ee7990	Prevent circular references in Type3 fonts In corrupt PDF documents Type3 fonts may introduce circular dependencies, thus resulting in the affected font(s) never loading and parsing/rendering never completing. Note that I've not seen any real-world examples of this kind of font corruption, but the attached PDF document was rather found in https://github.com/pdf-association/safedocs/tree/main/Miscellaneous%20Targeted%20Test%20PDFs Please note: That repository contains a number of reduced test-cases that are specifically intended to test interoperability (between PDF viewers) and parsing/rendering for various kinds of strange/corrupt PDF documents. Some of the test-cases found there may thus not make sense to try and "fix" upfront, in my opinion, unless the problems are also found in real-world PDF documents.	2022-01-13 17:58:37 +01:00
Jonas Jenwald	b9849e38b8	Expose even more API-functionality in the TypeScript definitions (issue 14435, PR 14013 follow-up) While `PageViewport` apparently makes sense in TypeScript environments, given that it's being returned by the `PDFPageProxy.getViewport`-method in the API, we really don't want to extend the public API by simply exporting the class directly in `src/pdf.js` since it should never be called/initialized manually. Hence we follow the same pattern as in PR 14013, and also extend the API unit-tests to ensure that `PDFPageProxy.getViewport` always returns a `PageViewport`-instance as expected.	2022-01-13 12:05:40 +01:00
Jonas Jenwald	08d88a0235	Ignore Annotations with empty /Rect-entries in the display-layer (issue 14438) This prevents the `BaseSVGFactory.create`-method from throwing, and thus preventing any remaining Annotations (on the page) from rendering in corrupt documents.	2022-01-11 13:54:35 +01:00
Tim van der Meij	8ac0ccc227	Merge pull request #14424 from Snuffleupagus/mv-addLinkAttributes [api-minor] Move `addLinkAttributes`, `LinkTarget`, and `removeNullCharacters` into the viewer (PR 14092 follow-up)	2022-01-08 13:19:11 +01:00
Calixte Denizet	6369617e6f	[JS] Fix few errors around AFSpecial_Keystroke - @cincodenada found some errors which are fixed in this patch; - it partially fixes issue #14306; - add some tests.	2022-01-08 12:34:56 +01:00
Calixte Denizet	9bb636402a	Use the correct dimension to know if we have to add an EOL in vertical mode	2022-01-07 15:19:03 +01:00
Jonas Jenwald	7b8794b37e	[api-minor] Move `removeNullCharacters` into the viewer This helper function has never been used in e.g. the worker-thread, hence its placement in `src/shared/util.js` led to a small amount of unnecessary duplication. After the previous patches this helper function is now only used in the viewer, hence it no longer seems necessary to expose it through the official API. Please note: It seems somewhat unlikely that third-party users were relying directly on this helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)	2022-01-06 12:25:33 +01:00
Jonas Jenwald	2d2b6463b8	[api-minor] Move `addLinkAttributes` and `LinkTarget` into the viewer As part of the changes/improvement in PR 14092, we're no longer using the `addLinkAttributes` directly in e.g. the AnnotationLayer-code. Given that the helper function is now only used in the viewer, it no longer seems necessary to expose it through the official API. Please note: It seems somewhat unlikely that third-party users were relying directly on the helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)	2022-01-06 12:25:33 +01:00
Calixte Denizet	6cdae5ac4d	Use positive dimensions for text chunks in the text layer (issue #14415 ).	2022-01-05 10:49:56 +01:00
Jonas Jenwald	b0e774d9c5	Convert `Catalog.getAllPageDicts` to an `async` method The patch in PR 14335 essentially re-introduced the old code from before PR 3848, however looking at this code a bit closer it should be possible to simplify it by making the method asynchronous. While this method is currently only used as a fallback in corrupt documents, the way that `MissingDataException`s are handled is less than ideal. Note that if a `MissingDataException` is thrown, we're forced to re-parse the entire /Pages tree[1]. With this method now being asynchronous, we're able to handle fetching of References in a much easier/nicer way than before without having to throw `MissingDataException`s and re-parse anything. These changes also let us simplify the call-site slightly, by calling the method directly instead of using the `PDFManager`-instance (since again it will no longer throw `MissingDataException`s). Furthermore, this patch contains the following other changes: - Reduce unnecessary duplication in the various `catch` handlers throughout the method, by simply moving the `XRefEntryException` handling into the `addPageError` helper function instead. - Move the "circular references"-check to occur slightly earlier, since there's obviously no point in asynchronously fetching data just to then throw an Error immediately afterwards. --- [1] Imagine e.g. a thousand page document, where there's a `MissingDataException` thrown when fetching/parsing page 900.	2021-12-31 22:03:10 +01:00
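
Commits 76444888fb and 12d8f0b64d describe detecting a byte-order mark and handing the remaining bytes to `TextDecoder`. The sketch below illustrates that idea only; `decodeWithBom` is a hypothetical name, not the actual `stringToPDFString` implementation, and it assumes the input is a "binary string" holding one byte per character. The real helper would fall back to PDFDocEncoding when no BOM is present, which is omitted here.

```javascript
// Hypothetical helper illustrating BOM-based decoding (not pdf.js code).
function decodeWithBom(str) {
  let encoding = null;
  let bomLength = 0;
  if (str.startsWith("\xFE\xFF")) {
    encoding = "utf-16be";
    bomLength = 2;
  } else if (str.startsWith("\xFF\xFE")) {
    encoding = "utf-16le";
    bomLength = 2;
  } else if (str.startsWith("\xEF\xBB\xBF")) {
    encoding = "utf-8";
    bomLength = 3;
  }
  if (encoding === null) {
    // Assumption: no BOM means the string is returned unchanged in this sketch.
    return str;
  }
  // Convert the byte-per-character string (minus the BOM) into real bytes.
  const bytes = Uint8Array.from(str.slice(bomLength), ch => ch.charCodeAt(0) & 0xff);
  // `fatal: true` makes malformed input throw instead of silently decoding.
  return new TextDecoder(encoding, { fatal: true }).decode(bytes);
}
```

For example, `decodeWithBom("\xEF\xBB\xBFh\xC3\xA9")` decodes the two bytes `C3 A9` after the BOM as a single "é".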
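
Commit 64f3dbeb48 makes a lone minus sign parse as zero, the same way a lone decimal point was already handled according to that message. A minimal sketch of the behaviour follows, assuming a whole token is available as a string; `parseNumberToken` is a hypothetical name, and the real `Lexer.getNumber` reads characters from a stream rather than using a regular expression.

```javascript
// Hypothetical token-level illustration (not the pdf.js Lexer).
function parseNumberToken(token) {
  // A single "-" or "." carries no digits; treat it as zero.
  if (token === "-" || token === ".") {
    return 0;
  }
  // Accept an optional sign followed by digits with an optional fraction.
  if (!/^[+-]?(\d+\.?\d*|\.\d+)$/.test(token)) {
    throw new Error(`Invalid number token: ${token}`);
  }
  return parseFloat(token);
}
```

Treating the degenerate token as zero keeps parsing going in slightly malformed files instead of aborting on the first bad number.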
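
Commit 74f25d2755 hinges on the difference between reading a byte as unsigned and as signed. The sketch below shows that difference with two hypothetical helpers, `readUint8` and `readInt8`; the names and signatures are assumptions for illustration and are not taken from the font renderer.

```javascript
// Hypothetical buffer helpers (illustrative only, not pdf.js code).
function readUint8(data, offset) {
  // Bytes in a Uint8Array are already in the range 0..255.
  return data[offset];
}

function readInt8(data, offset) {
  // Shift the byte to the top of a 32-bit integer and back down again,
  // so the sign bit is extended: 0xF6 becomes -10 rather than 246.
  return (data[offset] << 24) >> 24;
}
```

With `new Uint8Array([0xf6])`, `readUint8` yields 246 while `readInt8` yields -10, which is why a signed translation offset read as unsigned ends up displaced far in the wrong direction.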

5147 Commits