pdf.js

Author	SHA1	Message	Date
Tim van der Meij	a57a4bc6c2	Merge pull request #15018 from Snuffleupagus/issue-15016 Expose `TextLayerRenderTask` in the TypeScript definitions (issue 15016, PR 14013 follow-up)	2022-06-10 22:18:35 +02:00
Jonas Jenwald	e046b811b7	Expose `TextLayerRenderTask` in the TypeScript definitions (issue 15016, PR 14013 follow-up) While `TextLayerRenderTask` apparently makes sense in TypeScript environments, given that it's being returned by the `renderTextLayer`-function in the API, we really don't want to extend the public API by simply exporting the class directly in `src/pdf.js` since it should never be called/initialized manually. Hence we follow the same pattern as in PR 14013, and add some very basic unit-tests to ensure that `renderTextLayer` always returns a `TextLayerRenderTask`-instance as expected.	2022-06-10 22:12:32 +02:00
Jonas Jenwald	9ac4536693	Enable the `unicorn/prefer-at` ESLint plugin rule (PR 15008 follow-up) Please find additional information here: - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/at - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-at.md	2022-06-09 21:21:19 +02:00
Calixte Denizet	36aae436bf	[editor] Add support for saving newly added Ink	2022-06-08 22:16:01 +02:00
Calixte Denizet	7773b3f5be	[edition] Add support for saving a newly added FreeText	2022-06-08 14:34:09 +02:00
Calixte Denizet	c7afce4210	Support Hangul syllables when searching some text (bug 1771477) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1771477; - hangul contains some syllables which are decomposed when using NFD, hence the text must be correctly shifted in case it contains some of them.	2022-05-28 16:50:03 +02:00
Calixte Denizet	60498c67e4	Display background when printing or saving a text widget (issue #14928)	2022-05-19 16:41:54 +02:00
Jonas Jenwald	6bcc5b615d	[api-minor] Include line endings in Line/Polyline Annotation-data (issue 14896) Please refer to: - https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2109792 - https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096489 - https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096447 Note that we still won't attempt to use the /LE-data when creating fallback appearance streams, as mentioned in PR 13448, since custom line endings aren't common enough to warrant the added complexity. Finally, note that according to the PDF specification we should potentially also take the line endings into account for FreeText Annotations. However, in that case their use is conditional on other parameters that we currently don't support.	2022-05-12 11:08:30 +02:00
Jonas Jenwald	8267fd8a52	Replace the `AnnotationStorage.lastModified`-getter with a proper hash-method The current `lastModified`-getter, which only contains a time-stamp, is a fairly crude way of detecting if the stored data has actually been changed. In particular, when the `getRawValue`-method is used, the `lastModified`-getter doesn't cope with data being modified from the "outside". To fix these issues[1], and to prevent any future bugs in this code, this patch introduces a new `AnnotationStorage.hash`-getter which computes a hash of the currently stored data. To simplify things this re-uses the existing `MurmurHash3_64`-implementation, which required moving that file into the `src/shared/`-folder, since its performance should be good enough here. --- [1] Given how the `AnnotationStorage.lastModified`-getter was used, this would have been limited to printing of forms.	2022-05-04 15:21:30 +02:00
Jonas Jenwald	8135d7ccf6	Merge pull request #14869 from calixteman/14862 [JS] Fix few bugs present in the pdf for issue #14862	2022-05-03 18:31:31 +02:00
Calixte Denizet	094ff38da0	[JS] Fix few bugs present in the pdf for issue #14862 - since the resetForm function resets a field value, a calculateNow is consequently triggered. But the calculate callback can itself call resetForm, hence an infinite recursive loop. So basically, prevent calculateNow from being triggered by itself. - in Firefox, the letters entered in some fields were duplicated: "AaBb" instead of "AB". It was mainly because beforeInput was triggering a Keystroke which was itself triggering an input value update and then the input event was triggered. So in order to avoid that, beforeInput calls preventDefault and then it's up to the JS to handle the event. - fields have a property valueAsString which returns the value as a string. In the implementation it was wrongly used to store the formatted value of a field (2€ when the user entered 2). So this patch implements valueAsString correctly. - non-rendered fields can be updated using JS, but when they are, some properties must be put in the annotationStorage. It was implemented for field values, but it wasn't for display, colors, ... - it fixes #14862 and #14705.	2022-05-03 15:48:44 +02:00
Jonas Jenwald	df5a4fd0a7	Support encoded dest-strings in /GoTo destination dictionaries (issue 14864) Interestingly enough this appears to be the very first case of encoded dest-strings, in /GoTo destination dictionaries, that we've actually come across. What's really fascinating is that it's less than a week after issue 14847, given that these issues are somewhat similar.	2022-05-02 10:14:32 +02:00
Jonas Jenwald	fbf6dee8ee	[api-minor] Remove the `forceClamped`-functionality in the Streams (issue 14849) As it turns out, most of the code-paths in the `PDFImage`-class won't actually pass the TypedArray (containing the image-data) to the `ColorSpace`-code. Hence we generally don't need to force the image-data to be a `Uint8ClampedArray`, and can just as well directly use a `Uint8Array` instead. In the following cases we're returning the data without any `ColorSpace`-parsing, and the exact TypedArray used shouldn't matter: - `b72a448327/src/core/image.js (L714)` - `b72a448327/src/core/image.js (L751)` In the following cases the image-data is only used internally, and again the exact TypedArray used shouldn't matter: - `b72a448327/src/core/image.js (L762)` with the actual image-data being defined (as `Uint8ClampedArray`) further below - `b72a448327/src/core/image.js (L837)` Please note: This is tagged `api-minor` because it's API-observable, given that some image/mask-data will now be returned as `Uint8Array` rather than using `Uint8ClampedArray` unconditionally. However, that seems like a small price to pay to (slightly) reduce memory usage during image-conversion.	2022-04-29 14:46:30 +02:00
Jonas Jenwald	71370d012b	Support destinations in NameTrees with encoded keys (issue 14847) Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees use string-keys. Hence the easiest solution, as far as I'm concerned, was to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fallback to fetch all destinations, which should be fine since this is the very first case of encoded keys that we've seen. Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error, although we've so far not seen such a case, for any non-Array Kids-entries found in a NameTree/NumberTree. Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)	2022-04-27 11:19:55 +02:00
Calixte Denizet	040fcae5ab	Improve performance with image masks (bug 857031) - it aims to partially fix performance issue reported: https://bugzilla.mozilla.org/show_bug.cgi?id=857031; - the idea is to avoid using byte arrays and to use ImageBitmap instead, which are way faster to draw: * an ImageBitmap is Transferable which means that it can be built in the worker instead of in the main thread: - this is achieved by using an OffscreenCanvas when it's available, there is a bug to enable them for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330; - or by using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap so it's slightly slower than using an OffscreenCanvas. * it's transferred from the worker to the main thread by "reference"; * the byte buffers used to create the image data have a very short lifetime and ergo the memory used is globally less than before. - Use the localImageCache for the mask; - Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image; - Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set as defined in operator_list.	2022-04-09 18:26:26 +02:00
Jonas Jenwald	addb4cb12b	Use `String.prototype.repeat()` in a couple of spots Rather than using a temporary Array to manually create repeated strings, we can use `String.prototype.repeat()` instead. The reason that we didn't use this from the start is most likely because some browsers, notably IE, didn't support this; note https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat#browser_compatibility	2022-03-30 15:42:40 +02:00
Calixte Denizet	ad3fb71a02	[Annotations] Add support for printing/saving choice list with multiple selections - it aims to fix issue #12189.	2022-03-29 18:59:44 +02:00
Calixte Denizet	18e79e3c0b	[text selection] Add the whitespaces present in the pdf in the text chunk - it aims to fix issue #14627; - the basic idea of the recent text refactoring was to only consider the rendered visible whitespaces. But sometimes, the heuristics aren't correct and although some whitespaces are in the text stream they weren't in the text chunks because they were too small. Hence we added some exceptions, for example, we always add a whitespace when it is between two non-whitespace chars but only when in the same Tj. So basically, this patch removes the constraint to have the chars in the same Tj (by using a circular buffer to save the two last chars) but doesn't add a space when the visible space is really too small (hence `NOT_A_SPACE_FACTOR`).	2022-03-27 14:34:56 +02:00
Jonas Jenwald	849de5a508	Slightly improve validation of (some) parameters in `getDocument` There's a couple of `getDocument` parameters that should be numbers, but which are currently not fully validated to prevent issues elsewhere in the code-base. Also, improves validation of the `ownerDocument` parameter since we currently accept more-or-less anything here.	2022-03-21 13:32:17 +01:00
Calixte Denizet	f0b549c2a2	[JS] - Parse a date in using the given format first and then try the default date parser - it aims to fix #14672.	2022-03-19 16:07:43 +01:00
Jonas Jenwald	c0736647f9	Add general iteration support in the `RefSet` and `RefSetCache` classes This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.	2022-03-18 14:27:34 +01:00
Jonas Jenwald	fb345ee184	Enable the "gets fieldObjects" unit-test in Node.js (PR 14409 follow-up) Apparently this unit-test works in Node.js now, hence it's possible that the reason it didn't work previously is that there were bugs in our old `structuredClone` polyfill.	2022-03-13 10:40:57 +01:00
Tim van der Meij	bcf453cf14	Merge pull request #14656 from Snuffleupagus/mv-isSameOrigin Move the `isSameOrigin` helper function	2022-03-11 21:08:49 +01:00
Jonas Jenwald	537ed37835	Move the `isSameOrigin` helper function This function is currently placed in the `src/shared/util.js` file, which means that the code is duplicated in both of the built `pdf.js` and `pdf.worker.js` files. Furthermore, it only has a single call-site which is also specific to the `GENERIC`-build of the PDF.js library. Hence this helper function is instead moved into the `src/display/api.js` file, in such a way that it's conditionally defined but still can be unit-tested.	2022-03-10 13:51:09 +01:00
Jonas Jenwald	e08e3f4d37	Replace XMLHttpRequest usage with the Fetch API in `send` (in `test/unit/testreporter.js`) Besides converting the `send` function to use the Fetch API, this patch also changes the method to return a `Promise` to get rid of the callback function. (Although, currently there's no call-site passing in a callback function.)	2022-03-10 12:55:08 +01:00
Jonas Jenwald	99cd24ce3e	Remove the `isString` helper function The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.	2022-02-26 16:33:41 +01:00
Tim van der Meij	cf7ce0aa7e	Merge pull request #14600 from Snuffleupagus/getPageIndex-more-validation [api-minor] Add validation for the `PDFDocumentProxy.getPageIndex` method	2022-02-26 15:30:00 +01:00
Jonas Jenwald	172d007598	[api-minor] Add validation for the `PDFDocumentProxy.getPageIndex` method Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation. That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately. In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.	2022-02-24 12:01:51 +01:00
Jonas Jenwald	2be8036eb7	[api-minor] Reduce duplication in the "gets non-existent page" unit-test	2022-02-24 11:25:21 +01:00
Jonas Jenwald	ec87995050	Ensure that `Cmd`/`Name` is only initialized with string arguments Trying to use a non-string argument in either a `Cmd` or a `Name` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, similar to the existing one used e.g. in the `Dict.set` method.	2022-02-23 22:39:12 +01:00
Tim van der Meij	2bb96a708c	Merge pull request #14598 from Snuffleupagus/rm-isBool Re-factor the `Catalog.viewerPreferences` method and remove the `isBool` helper function	2022-02-23 20:36:56 +01:00
Jonas Jenwald	3704283f5b	Remove the `isBool` helper function The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls.	2022-02-23 13:31:03 +01:00
Jonas Jenwald	a2f9031e9a	Ensure that `Dict.set` only accepts string `key`s Trying to use a non-string `key` in a `Dict` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, complementing the existing `value` check added in PR 11672.	2022-02-22 16:35:20 +01:00
Jonas Jenwald	05edd91bdb	Remove the `isNum` helper function The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isNum`-calls. These changes were mostly done using regular expression search-and-replace, with two exceptions: - In `Font._charToGlyph` we no longer unconditionally update the `width`, since that seems completely unnecessary. - In `PDFDocument.documentInfo`, when parsing custom entries, we now do the `typeof`-check once.	2022-02-22 11:55:34 +01:00
Jonas Jenwald	b282814e38	Prefer `instanceof Name` rather than calling `isName()` with one argument Unless you actually need to check that something is both a `Name` and also of the correct type, using `instanceof Name` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check. This patch uses ESLint to enforce this, since we obviously still want to keep the `isName` helper function for where it makes sense.	2022-02-21 12:45:00 +01:00
Jonas Jenwald	4df82ad31e	Prefer `instanceof Dict` rather than calling `isDict()` with one argument Unless you actually need to check that something is both a `Dict` and also of the correct type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check. This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.	2022-02-21 12:44:56 +01:00
Jonas Jenwald	67b658e8d5	Prefer `instanceof Cmd` rather than calling `isCmd()` with one argument Unless you actually need to check that something is both a `Cmd` and also of the correct type, using `instanceof Cmd` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check. This patch uses ESLint to enforce this, since we obviously still want to keep the `isCmd` helper function for where it makes sense.	2022-02-21 12:44:51 +01:00
Jonas Jenwald	2cb2f633ac	Remove the `isRef` helper function This helper function is not really needed, since it's just a wrapper around a simple `instanceof` check, and it only adds unnecessary indirection in the code.	2022-02-19 15:33:42 +01:00
Jonas Jenwald	1a31855977	Remove the `isStream` helper function At this point all the various Stream-classes extend an abstract base-class, hence this helper function is no longer necessary and only adds unnecessary indirection in the code.	2022-02-17 13:51:36 +01:00
Calixte Denizet	18f4e560ae	[Search] Some matches were incorrectly shifted because of some '-\n' - it aims to fix #14562; - 'X-\n' were not correctly positioned; - when X is a diacritic (e.g. in "sä-\n", which is decomposed into "sa¨-\n") we must handle both things: - diacritics on the one hand; - "-\n" on the other hand.	2022-02-14 10:12:33 +01:00
Calixte Denizet	18e3a98c2b	[api-minor] Don't add in the text content the chars which are out-of-page (bug 1755201) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1755201; - if the glyph position is not within the view then skip it.	2022-02-13 21:07:11 +01:00
Jonas Jenwald	0daab88a48	Update two `display_utils` unit-tests to use native functionality rather than the `createObjectURL` helper function Given that most of the code-base is already using native functionality, we can update these unit-tests similarly as well. - For the `blob:`-URL test, we simply use `URL.createObjectURL(...)` and `Blob` directly instead. - For the `data:`-URL test, we simply use `btoa` to do the Base64 encoding and then build the final URL-string.	2022-02-10 12:01:29 +01:00
Brendan Dahl	f8b2a99ddc	Merge pull request #14543 from Snuffleupagus/bug-1753983 Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983)	2022-02-09 14:06:35 -08:00
Jonas Jenwald	1f0fb270b1	[api-minor] Ensure that the `PDFDocumentLoadingTask`-promise is rejected when cancelling the PasswordPrompt (bug 1754421) This is essentially a continuation of PR 7926, where we added support for rejecting the current `PDFDocumentLoadingTask`-promise by throwing inside of the `onPassword`-callback. Hence the naive way to address [bug 1754421](https://bugzilla.mozilla.org/show_bug.cgi?id=1754421) would be to simply throw in the `onPassword`-callback used in the default viewer. However it unfortunately turns out to not work, since the password input/validation is asynchronous, and we thus need another approach. The simplest solution that I can come up with here, is thus to extend the `onPassword`-callback to also reject the current `PDFDocumentLoadingTask`-instance if an `Error` is explicitly passed as the input to the callback function. (This doesn't feel great, but I cannot see a better solution that isn't really complicated.)	2022-02-09 15:09:20 +01:00
Jonas Jenwald	64f3dbeb48	Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.	2022-02-07 17:09:47 +01:00
Calixte Denizet	1f41028fcb	Support search with or without diacritics (bug 1508345, bug 916883, bug 1651113) - get original index by using a dichotomic search instead of a linear one; - normalize the text using NFD; - convert the query string into a RegExp; - replace whitespaces in the query with \s+; - handle hyphens at eol used to break a word; - add some \s* around punctuation signs	2022-02-03 15:42:55 +01:00
Jonas Jenwald	403baa7bba	[api-minor] Remove the `normalizeWhitespace` option in the `PDFPageProxy.{getTextContent, streamTextContent}` methods (issue 14519, PR 14428 follow-up) With these changes, we'll now always replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.	2022-02-03 09:17:22 +01:00
Calixte Denizet	ae842e1c3a	[api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335) - it aims to fix #14502 and bug 1721335; - Acrobat and Pdfium do the same; - it'll avoid having truncated data when printed; - change the factor used to compute the font size from the field height: lineHeight = 1.35*fontSize - this is the value used by Acrobat. - in order to not have truncated strings at the bottom, add a few basic metrics for standard fonts.	2022-01-30 15:53:31 +01:00
calixteman	9367d54009	Merge pull request #14483 from calixteman/200B Remove the invisible format marks from the text chunks	2022-01-24 17:52:06 +01:00
Calixte Denizet	880ac6037c	Fix scripting test related to keystroke event	2022-01-24 17:04:50 +01:00
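
The c0736647f9 entry above (2022-03-18) describes replacing `forEach` methods with general iteration support, on the grounds that the classes wrap a `Set` and a `Map` internally. A minimal sketch of that idea follows. `SimpleRefSet` and `SimpleRefSetCache` are hypothetical, simplified stand-ins written for illustration, not the actual PDF.js classes, and the string keys are an assumption made to keep the example short.

```javascript
// Hypothetical stand-in for a set of references, backed by a native Set.
class SimpleRefSet {
  constructor() {
    this._set = new Set();
  }

  put(key) {
    this._set.add(key);
  }

  has(key) {
    return this._set.has(key);
  }

  // Delegating to the internal Set makes instances usable with
  // for...of loops and the spread syntax, so no forEach method is needed.
  [Symbol.iterator]() {
    return this._set.values();
  }
}

// Hypothetical stand-in for a reference-keyed cache, backed by a native Map.
class SimpleRefSetCache {
  constructor() {
    this._map = new Map();
  }

  put(key, value) {
    this._map.set(key, value);
  }

  get(key) {
    return this._map.get(key);
  }

  // In this sketch, iteration yields the cached values (an assumption).
  [Symbol.iterator]() {
    return this._map.values();
  }
}
```

With this in place a caller can write `for (const key of refSet) { ... }` or `[...cache]` directly, which is the kind of simplification the commit message refers to.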

968 Commits