pdf.js

Author	SHA1	Message	Date
Calixte Denizet	19ca41896e	Correctly clip the text in the text layer (fixes #16316 )	2023-04-18 17:00:42 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized when creating the search query. So to avoid to duplicate the normalization code, everything is moved in the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized some chars can be replaced by several ones and it induced to have some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, a copied string is normalized (NFKC) before being put in the clipboard (it works like this in either Acrobat or Chrome).	2023-04-17 14:31:23 +02:00
Calixte Denizet	8e5f4c0622	[Editor] Take into account the initial rotation (issue #16278 )	2023-04-16 21:36:26 +02:00
Jonas Jenwald	3a36a9d337	Merge pull request #16268 from Snuffleupagus/RegionalImageCache Attempt to also cache images at the "page"-level (issue 16263)	2023-04-11 12:06:29 +02:00
Jonas Jenwald	9881dbf927	Attempt to also cache images at the "page"-level (issue 16263) Currently we have two separate image-caches on the worker-thread: - A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names and references, since image-resources may be accessed in either way. - A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work. This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it only caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects. --- [1] For lack of a better word, since naming things is hard...	2023-04-10 11:34:41 +02:00
Calixte Denizet	4b7eb1436d	Thin whitespaces must have their own span	2023-03-29 11:23:58 +02:00
Calixte Denizet	a96f10e55d	Create a new chunk when the char is too rised compared to the previouse one	2023-03-28 13:56:46 +02:00
Jonas Jenwald	9321758d91	Merge pull request #16186 from Snuffleupagus/issue-16176 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)	2023-03-21 22:17:18 +01:00
Jonas Jenwald	d4bcfe8c16	Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176) Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).	2023-03-21 21:35:57 +01:00
Calixte Denizet	2d0f30a67c	Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 19:27:24 +01:00
calixteman	b2a86350fc	Merge pull request #16096 from bungeman/fix_trig_functions Correct PostScript trigonometric operators	2023-03-11 14:32:23 +01:00
Calixte Denizet	07b094729e	Fix search in pdf a containing some UTF-32 characters (bug 1820909) Some chars were supposed to have a length equals to 1 but UTF-32 chars can be longuer.	2023-03-09 15:03:01 +01:00
calixteman	a0ef5a4ae1	Merge pull request #16115 from calixteman/issue16114 Apply transfer filters to any graphic commands	2023-03-08 14:53:41 +01:00
Jonas Jenwald	471aef5fc6	Support (rare) Type3 fonts with Pattern resources (issue 16127) This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).	2023-03-08 09:20:52 +01:00
Calixte Denizet	8304df2520	Apply transfer filters to any graphic commands	2023-03-07 22:17:19 +01:00
Calixte Denizet	b8dda089e2	Slightly modify the max width of a tracking space	2023-03-07 19:38:49 +01:00
Calixte Denizet	8db77cc361	Use appearance stream to render locked annotations (bug 1723568)	2023-03-07 15:01:31 +01:00
Calixte Denizet	05b0c9d7e6	Render large images even if they're larger than the canvas limits (bug 1720282) The idea is to encode large image in BMP format (which is very simple and doesn't require to compute any checksums) and then use createImageBitmap with a BMP blob (which doesn't suffer of the Canvas/ImageData limits). From a performance point of view, it isn't crazy (generating a large blob + decoding it on the main thread is really not ideal) but at least we've something to display which is a way better than a blank page (and one can notice that most of the time is spent in decoding the image from the pdf stream).	2023-03-05 14:07:07 +01:00
Ben Wagner	158c836e26	Correct PostScript trigonometric operators PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions" defers to the PostScript Language Reference for the description of these functions. The PostScript Language Reference, third edition chapter 8 "Operators" defines the `angle` type as a "number of degrees". Section 8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan` angle". The documentation for `atan` further states that it will return an angle in degrees between 0 and 360. Handle these operators correctly in `PostScriptEvaluator.execute`. Convert the inputs to `sin` and `cos` from degrees to radians for use with `Math.sin` and `Math.cos`. Correctly pop two values from the stack for `atan`, use `Math.atan2`, and convert from radians to (positive) degrees.	2023-03-03 17:25:11 -05:00
Calixte Denizet	3a21423386	[Acroform] Use the full path to find the node in the XFA datasets where to store the value I noticed several 'Path not found' errors because of a field called #subform[2]. From the XFA specs, the hash is used for a class of elements in the template tree. When we're looking for a node in the datasets tree, it doesn't make sense to search for a class. Hence the path element starting with a hash are just skipped.	2023-02-23 12:09:39 +01:00
Calixte Denizet	dca54c8f8a	[JS] Send a Validate action on change on Choice widget	2023-02-19 16:33:05 +01:00
Calixte Denizet	fc7d74385f	Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic	2023-02-16 11:31:58 +01:00
Calixte Denizet	58e4d92884	[Annotation] For choice widget, use the I entry instead of the V one (bug 1770750) It isn't really conform to the specifications but Acrobat is working like that...	2023-02-09 17:26:13 +01:00
Calixte Denizet	a25895bf72	[Annotation] Take into account the stroke alpha for a FreeText without appearance	2023-02-07 22:15:27 +01:00
Calixte Denizet	ea7b4b4d6c	[Annotation] Avoid to encrypt the appearance stream two times (bug 1815476)	2023-02-07 19:26:46 +01:00
Jonas Jenwald	808ca828f1	Extend `getGlyphMapForStandardFonts` with additional entries (issue 15977)	2023-01-30 12:13:21 +01:00
Calixte Denizet	6f4d037a8e	[JS] Correctly format field with numbers (bug 1811694, bug 1811510) In PR #15757, a value is automatically converted into a number when it's possible but the case of numbers like "000123" has been overlooked and their format must be preserved. When a script is doing something like "foo.value + bar.value" and the values are numbers then "foo.value" must return a number but the displayed value must be what the user entered or what a script set, so this patch is just adding a a field _orginalValue in order to track the value has it has defined. Some people are used to use a comma as decimal separator, hence it must be considered when a value is parsed into a number. This patch is fixing a regression introduced by #15757.	2023-01-26 14:57:02 +01:00
Jonas Jenwald	40a46e4397	Tweak `adjustType1ToUnicode` for fonts with a predefined named encoding (bug 1811668, PR 14050 follow-up) Please note: I cannot reproduce the problem reported in bug 1811668, regarding the context menu, and in any case it's not clear that that part is even a PDF Viewer bug. Looking at bug 1811668 I couldn't help but noticing that the textLayer isn't correct, and it's unfortunately once again a problem with the `adjustType1ToUnicode` function. That's intended to help improve text-selection for fonts without a /ToUnicode-entry, and in many cases it does help (the original PR fixed lots of issues) however it's also caused some problems. In order to improve text-selection in bug 1811668, we'll now properly ignore fonts that have a predefined named encoding specified since that's really the intention with PR 14050.	2023-01-21 12:21:21 +01:00
Jonas Jenwald	f2fce93826	[JBIG2] Ensure that the `decodeInteger` function returns valid integers (issue 15942) The JBIG2 images in this PDF document are corrupt enough that even Adobe Reader warns about it when opening the file. Please note: I don't really know the JBIG2 image format at all, however from a very brief look at the specification it seems that integers should be 32-bit.	2023-01-19 17:14:17 +01:00
Jonas Jenwald	d6be5141e9	Fallback to using the `name` table to infer the encoding for TrueType fonts missing such data (issue 15910) The relevant TrueType font is missing both /ToUnicode and /Encoding entires, either of which would have prevented the (current) broken textLayer rendering. My first idea was that we could use the `post` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6post.html, to get the actual glyphNames and amend the fallback ToUnicode-map that way. Unfortunately that didn't work, since the `post` table only contained ".notdef" and "" (i.e. empty string) entries. Instead we try to use the `name` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html, to determine if the platform is Windows and thus fallback to generate a ToUnicode-map from the `WinAnsiEncoding`.	2023-01-17 16:04:51 +01:00
Jonas Jenwald	cefaecc2e8	Ensure that Annotation `appearance`-entries are actually Streams Note how all over the `src/core/annotation.js`-code we're assuming that if an `appearance`-entry exists it's also a Stream. However, we're not actually checking that thoroughly enough which causes issues in some badly generated PDF documents.	2023-01-16 13:02:53 +01:00
Jonas Jenwald	7d94fdeb48	Support parsing encrypted documents in `XRef.indexObjects` (issue 15893) Please note: The reduced test-case is not a perfect reproduction of the original PDF document, since this one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here. For corrupt and encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we'd could easily miss that, since we generally pick the first not obviously corrupt trailer dictionary, and the solution implemented here is to simply pre-parse all trailer dictionaries to see if there's any /Encrypt-entries.	2023-01-06 13:09:37 +01:00
Calixte Denizet	dea2471e96	[JS] UserActivation must be enabled before running document actions else auto-print is broken (it's a regression from patch #15822).	2023-01-04 21:26:36 +01:00
Jonas Jenwald	2fcf8bb5be	Re-factor searching for incomplete objects in `XRef.indexObjects` (issue 15803) When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work. Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for multiple potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.	2022-12-19 23:01:09 +01:00
Calixte Denizet	f80880ccaa	Strip out a reserved operator (9) from CFF char strings (fixes issue #15784 )	2022-12-16 15:17:46 +01:00
Jonas Jenwald	26135b0313	Always parse the entire `startXRefQueue` in `XRef.readXRef` (issue 15833) Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually be able to find usable data. Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.	2022-12-15 13:46:28 +01:00
Calixte Denizet	2ebf8745a2	[JS] Run the named actions before running the format when the file is open (issue #15818 ) It's a follow-up of #14950: some format actions are ran when the document is open but we must be sure we've everything ready for that, hence we have to run some named actions before runnig the global format. In playing with the form, I discovered that the blur event wasn't triggered when JS called `setFocus` (because in such a case the mouse was never down). So I removed the mouseState thing to just use the correct commitKey when blur is triggered by a TAB key.	2022-12-13 21:12:32 +01:00
Calixte Denizet	0c1ec946aa	[JS] Handle correctly choice widgets where the display and the export values are different (issue #15815 )	2022-12-13 19:08:26 +01:00
Calixte Denizet	1a397681fe	The annotation layer dimensions must be set before adding some elements (follow-up of #15770 ) In order to move the annotations in the DOM to have something which corresponds to the visual order, we need to have their dimensions/positions which means that the parent must have some dimensions.	2022-12-13 14:54:45 +01:00
Calixte Denizet	4f0bfabe7a	Take all the viewBox into account when computing the coordinates of an annotation in the page (fixes #15789 )	2022-12-08 15:02:20 +01:00
Calixte Denizet	20fd9099f8	[Annotation] Send correctly the updated values to the JS sandbox	2022-11-29 17:34:06 +01:00
Calixte Denizet	ea1995991b	Don't add an extra space after a Katakana or a Hiragana at the eol when searching	2022-11-29 10:46:48 +01:00
Calixte Denizet	ae7da6ae48	[JS] By default, a text field value must be treated as a number (bug 1802888)	2022-11-28 16:24:01 +01:00
Jonas Jenwald	aa5b678f94	Add default icons for FileAttachment annotations (bug 1230933) Please note: This "borrows" the icons from Thunderbird. According to the PDF specification, see https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096626, we should be providing default icons for FileAttachment annotations without appearances.	2022-11-26 11:24:59 +01:00
Jonas Jenwald	8fda3f04fe	Merge pull request #15732 from Snuffleupagus/issue-15719 Add a fallback for non-embedded composite Tahoma fonts (issue 15719)	2022-11-24 19:09:12 +01:00
Jonas Jenwald	d1c01b3164	Add a fallback for non-embedded composite Tahoma fonts (issue 15719)	2022-11-23 15:51:18 +01:00
Jonas Jenwald	47682985d3	Add support for Optional Content in TilingPatterns (issue 15716) This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.	2022-11-23 12:58:00 +01:00
Calixte Denizet	2be64d63e1	Normalize fullwidth, halfwidth and circled chars when searching	2022-11-14 19:27:51 +01:00
Jonas Jenwald	a1d48e3651	Add a linked test-case for issue 2618 Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available. Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a linked test-case was choosen here.	2022-11-12 16:31:01 +01:00
Jonas Jenwald	595711bd7c	Merge pull request #15679 from Snuffleupagus/bug-1799927-2 Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)	2022-11-10 22:54:48 +01:00

1 2 3 4 5 ...

1310 Commits