pdf.js

Author	SHA1	Message	Date
Calixte Denizet	3091e70aad	Flush the current chunk when the font changed because of a restore op (issue #14755)	2023-05-18 19:37:16 +02:00
calixteman	839be801a0	Merge pull request #16433 from calixteman/bug1825002 For text widgets, get the text from the AP stream instead of from the format callback (bug 1825002)	2023-05-17 16:48:59 +02:00
Calixte Denizet	177036e6ae	For text widgets, get the text from the AP stream instead of from the format callback (bug 1825002) When fixing bug 1766987, I thought the field formatted value came from the result of the format callback: I was wrong. The format callback is run but the value is unused (maybe it's useful to set some global vars... or it's just a bug in Acrobat). Anyway the value to display is the one rendered in the AP stream. The field value setter has been simplified and that fixes issue #16409.	2023-05-17 14:07:28 +02:00
Jonas Jenwald	bfb374dbf6	Attempt to fallback to a default font, for non-available ones, in more cases (issue 16432) This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available. This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.	2023-05-17 11:41:08 +02:00
Calixte Denizet	2486536843	Compress the data when saving annotations The CompressionStream API has been added in Firefox 113 (see https://bugzilla.mozilla.org/show_bug.cgi?id=1823619) hence we can use it to compress the streams with added/modified annotations.	2023-05-09 14:46:50 +02:00
Calixte Denizet	6c0fdc6ec2	Make something similar to Acrobat when Underline annotation has no appearance	2023-05-06 21:19:25 +02:00
Jonas Jenwald	722e5910e1	Improve handling of JPEG images with non-standard /Decode-entries (issue 16395) The /Decode-implementation in our JPEG decoder, i.e. `src/core/jpg.js`, seems to only handle inverting of images properly. To support arbitrary /Decode-entries correctly we'll always use the `PDFImage.decodeBuffer` method, even for "simple" JPEG images, which should be fine since non-default /Decode-entries aren't a very common occurrence. Please note: This patch will lead to a little bit of movement in some existing test-cases, however it should be virtually imperceptible to the naked eye.	2023-05-06 13:55:39 +02:00
calixteman	f151a39d14	Merge pull request #16387 from calixteman/issue16384 [Annotations] Draw readonly annotations on their own canvas and show the HTML elements when there is a JS interaction (issue #16384)	2023-05-04 21:49:08 +02:00
Calixte Denizet	72da14f005	[Annotations] Draw readonly annotations on their own canvas and show the HTML elements when there is a JS interaction (issue #16384)	2023-05-04 20:08:32 +02:00
calixteman	a24e11a91c	Merge pull request #16106 from bungeman/improve_color_stop_detection Better approximate gradient color stops	2023-05-04 19:48:57 +02:00
Calixte Denizet	19ca41896e	Correctly clip the text in the text layer (fixes #16316)	2023-04-18 17:00:42 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some Arabic chars like \ufe94 could be searched in a pdf, hence normalization must happen when creating the search query. To avoid duplicating the normalization code, everything is moved into the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). While playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping for this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized, some chars can be replaced by several ones, which led to some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, the copied string is normalized (NFKC) before being put in the clipboard (it works like this in both Acrobat and Chrome).	2023-04-17 14:31:23 +02:00
Calixte Denizet	8e5f4c0622	[Editor] Take into account the initial rotation (issue #16278)	2023-04-16 21:36:26 +02:00
Jonas Jenwald	3a36a9d337	Merge pull request #16268 from Snuffleupagus/RegionalImageCache Attempt to also cache images at the "page"-level (issue 16263)	2023-04-11 12:06:29 +02:00
Jonas Jenwald	9881dbf927	Attempt to also cache images at the "page"-level (issue 16263) Currently we have two separate image-caches on the worker-thread: - A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names and references, since image-resources may be accessed in either way. - A global one, which applies to the entire PDF document and all its pages. This one only caches references, since nothing else would work. This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it only caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects. --- [1] For lack of a better word, since naming things is hard...	2023-04-10 11:34:41 +02:00
Calixte Denizet	4b7eb1436d	Thin whitespaces must have their own span	2023-03-29 11:23:58 +02:00
Calixte Denizet	a96f10e55d	Create a new chunk when the char is too raised compared to the previous one	2023-03-28 13:56:46 +02:00
Jonas Jenwald	9321758d91	Merge pull request #16186 from Snuffleupagus/issue-16176 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)	2023-03-21 22:17:18 +01:00
Jonas Jenwald	d4bcfe8c16	Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176) Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).	2023-03-21 21:35:57 +01:00
Calixte Denizet	2d0f30a67c	Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 19:27:24 +01:00
calixteman	b2a86350fc	Merge pull request #16096 from bungeman/fix_trig_functions Correct PostScript trigonometric operators	2023-03-11 14:32:23 +01:00
Calixte Denizet	07b094729e	Fix search in a pdf containing some UTF-32 characters (bug 1820909) Some chars were supposed to have a length equal to 1 but UTF-32 chars can be longer.	2023-03-09 15:03:01 +01:00
Ben Wagner	5fad91a680	Better approximate gradient color stops PDF gradients do not have color stops but an arbitrary PDF function of the type f(t) -> color. CSS gradients are only based on color stops. Most PDF gradient functions are produced from color stop oriented gradients. Take advantage of this by sampling the PDF function at a higher frequency but not converting any samples which could be interpolated to color stops. The sampling frequency is chosen to be the least common multiple of as many values as practical to exactly re-create the common case of the PDF function implementing equally spaced linearly interpolated stops in RGB color space. This also allows for better approximation of other smooth PDF functions (non-linear, or non-equally spaced, or in different color space). Fixes: #10572, #14165	2023-03-09 08:49:50 -05:00
calixteman	a0ef5a4ae1	Merge pull request #16115 from calixteman/issue16114 Apply transfer filters to any graphic commands	2023-03-08 14:53:41 +01:00
Jonas Jenwald	471aef5fc6	Support (rare) Type3 fonts with Pattern resources (issue 16127) This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).	2023-03-08 09:20:52 +01:00
Calixte Denizet	8304df2520	Apply transfer filters to any graphic commands	2023-03-07 22:17:19 +01:00
Calixte Denizet	b8dda089e2	Slightly modify the max width of a tracking space	2023-03-07 19:38:49 +01:00
Calixte Denizet	8db77cc361	Use appearance stream to render locked annotations (bug 1723568)	2023-03-07 15:01:31 +01:00
Calixte Denizet	05b0c9d7e6	Render large images even if they're larger than the canvas limits (bug 1720282) The idea is to encode large images in BMP format (which is very simple and doesn't require computing any checksums) and then use createImageBitmap with a BMP blob (which doesn't suffer from the Canvas/ImageData limits). From a performance point of view, it isn't great (generating a large blob + decoding it on the main thread is really not ideal) but at least we have something to display, which is way better than a blank page (and one can notice that most of the time is spent in decoding the image from the pdf stream).	2023-03-05 14:07:07 +01:00
Ben Wagner	158c836e26	Correct PostScript trigonometric operators PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions" defers to the PostScript Language Reference for the description of these functions. The PostScript Language Reference, third edition chapter 8 "Operators" defines the `angle` type as a "number of degrees". Section 8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan` angle". The documentation for `atan` further states that it will return an angle in degrees between 0 and 360. Handle these operators correctly in `PostScriptEvaluator.execute`. Convert the inputs to `sin` and `cos` from degrees to radians for use with `Math.sin` and `Math.cos`. Correctly pop two values from the stack for `atan`, use `Math.atan2`, and convert from radians to (positive) degrees.	2023-03-03 17:25:11 -05:00
Calixte Denizet	3a21423386	[Acroform] Use the full path to find the node in the XFA datasets where to store the value I noticed several 'Path not found' errors because of a field called #subform[2]. From the XFA specs, the hash is used for a class of elements in the template tree. When we're looking for a node in the datasets tree, it doesn't make sense to search for a class. Hence path elements starting with a hash are just skipped.	2023-02-23 12:09:39 +01:00
Calixte Denizet	dca54c8f8a	[JS] Send a Validate action on change on Choice widget	2023-02-19 16:33:05 +01:00
Calixte Denizet	fc7d74385f	Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic	2023-02-16 11:31:58 +01:00
Calixte Denizet	58e4d92884	[Annotation] For choice widget, use the I entry instead of the V one (bug 1770750) It doesn't really conform to the specifications but Acrobat works like that...	2023-02-09 17:26:13 +01:00
Calixte Denizet	a25895bf72	[Annotation] Take into account the stroke alpha for a FreeText without appearance	2023-02-07 22:15:27 +01:00
Calixte Denizet	ea7b4b4d6c	[Annotation] Avoid encrypting the appearance stream twice (bug 1815476)	2023-02-07 19:26:46 +01:00
Jonas Jenwald	808ca828f1	Extend `getGlyphMapForStandardFonts` with additional entries (issue 15977)	2023-01-30 12:13:21 +01:00
Calixte Denizet	6f4d037a8e	[JS] Correctly format field with numbers (bug 1811694, bug 1811510) In PR #15757, a value is automatically converted into a number when it's possible but the case of numbers like "000123" has been overlooked and their format must be preserved. When a script is doing something like "foo.value + bar.value" and the values are numbers then "foo.value" must return a number but the displayed value must be what the user entered or what a script set, so this patch is just adding a field _orginalValue in order to track the value as it has been defined. Some people are used to using a comma as decimal separator, hence it must be considered when a value is parsed into a number. This patch is fixing a regression introduced by #15757.	2023-01-26 14:57:02 +01:00
Jonas Jenwald	40a46e4397	Tweak `adjustType1ToUnicode` for fonts with a predefined named encoding (bug 1811668, PR 14050 follow-up) Please note: I cannot reproduce the problem reported in bug 1811668, regarding the context menu, and in any case it's not clear that that part is even a PDF Viewer bug. Looking at bug 1811668 I couldn't help but notice that the textLayer isn't correct, and it's unfortunately once again a problem with the `adjustType1ToUnicode` function. That's intended to help improve text-selection for fonts without a /ToUnicode-entry, and in many cases it does help (the original PR fixed lots of issues) however it's also caused some problems. In order to improve text-selection in bug 1811668, we'll now properly ignore fonts that have a predefined named encoding specified since that's really the intention with PR 14050.	2023-01-21 12:21:21 +01:00
Jonas Jenwald	f2fce93826	[JBIG2] Ensure that the `decodeInteger` function returns valid integers (issue 15942) The JBIG2 images in this PDF document are corrupt enough that even Adobe Reader warns about it when opening the file. Please note: I don't really know the JBIG2 image format at all, however from a very brief look at the specification it seems that integers should be 32-bit.	2023-01-19 17:14:17 +01:00
Jonas Jenwald	d6be5141e9	Fallback to using the `name` table to infer the encoding for TrueType fonts missing such data (issue 15910) The relevant TrueType font is missing both /ToUnicode and /Encoding entries, either of which would have prevented the (current) broken textLayer rendering. My first idea was that we could use the `post` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6post.html, to get the actual glyphNames and amend the fallback ToUnicode-map that way. Unfortunately that didn't work, since the `post` table only contained ".notdef" and "" (i.e. empty string) entries. Instead we try to use the `name` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html, to determine if the platform is Windows and thus fallback to generate a ToUnicode-map from the `WinAnsiEncoding`.	2023-01-17 16:04:51 +01:00
Jonas Jenwald	cefaecc2e8	Ensure that Annotation `appearance`-entries are actually Streams Note how all over the `src/core/annotation.js`-code we're assuming that if an `appearance`-entry exists it's also a Stream. However, we're not actually checking that thoroughly enough which causes issues in some badly generated PDF documents.	2023-01-16 13:02:53 +01:00
Jonas Jenwald	7d94fdeb48	Support parsing encrypted documents in `XRef.indexObjects` (issue 15893) Please note: The reduced test-case is not a perfect reproduction of the original PDF document, since this one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here. For corrupt and encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we could easily miss that, since we generally pick the first not obviously corrupt trailer dictionary, and the solution implemented here is to simply pre-parse all trailer dictionaries to see if there are any /Encrypt-entries.	2023-01-06 13:09:37 +01:00
Calixte Denizet	dea2471e96	[JS] UserActivation must be enabled before running document actions else auto-print is broken (it's a regression from patch #15822).	2023-01-04 21:26:36 +01:00
Jonas Jenwald	2fcf8bb5be	Re-factor searching for incomplete objects in `XRef.indexObjects` (issue 15803) When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work. Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for multiple potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.	2022-12-19 23:01:09 +01:00
Calixte Denizet	f80880ccaa	Strip out a reserved operator (9) from CFF char strings (fixes issue #15784)	2022-12-16 15:17:46 +01:00
Jonas Jenwald	26135b0313	Always parse the entire `startXRefQueue` in `XRef.readXRef` (issue 15833) Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually be able to find usable data. Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.	2022-12-15 13:46:28 +01:00
Calixte Denizet	2ebf8745a2	[JS] Run the named actions before running the format when the file is open (issue #15818) It's a follow-up of #14950: some format actions are run when the document is open but we must be sure we have everything ready for that, hence we have to run some named actions before running the global format. While playing with the form, I discovered that the blur event wasn't triggered when JS called `setFocus` (because in such a case the mouse was never down). So I removed the mouseState thing to just use the correct commitKey when blur is triggered by a TAB key.	2022-12-13 21:12:32 +01:00
Calixte Denizet	0c1ec946aa	[JS] Handle correctly choice widgets where the display and the export values are different (issue #15815)	2022-12-13 19:08:26 +01:00
Calixte Denizet	1a397681fe	The annotation layer dimensions must be set before adding some elements (follow-up of #15770) In order to move the annotations in the DOM to have something which corresponds to the visual order, we need to have their dimensions/positions which means that the parent must have some dimensions.	2022-12-13 14:54:45 +01:00
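The trigonometry fix in commit 158c836e26 ("Correct PostScript trigonometric operators") can be illustrated with a minimal JavaScript sketch. It is not the pdf.js `PostScriptEvaluator.execute` code: the helper names `psSin`, `psCos` and `psAtan` are hypothetical, and the operand stack is assumed to be a plain array. It only follows the semantics described in that commit message: `sin` and `cos` take an angle in degrees, and `num den atan` pops two values and returns an angle in degrees between 0 and 360.

```javascript
// Hypothetical helpers illustrating PostScript Type 4 trig semantics
// (angles in degrees); a sketch, not the pdf.js implementation.
const RAD_TO_DEG = 180 / Math.PI;
const DEG_TO_RAD = Math.PI / 180;

// `angle sin real`: the input angle is in degrees.
function psSin(angle) {
  return Math.sin(angle * DEG_TO_RAD);
}

// `angle cos real`: the input angle is in degrees.
function psCos(angle) {
  return Math.cos(angle * DEG_TO_RAD);
}

// `num den atan angle`: den is on top of the stack, so it is popped first.
// Math.atan2 returns radians in (-PI, PI]; convert to degrees and shift
// negative results into the [0, 360) range.
function psAtan(stack) {
  const den = stack.pop();
  const num = stack.pop();
  let angle = Math.atan2(num, den) * RAD_TO_DEG;
  if (angle < 0) {
    angle += 360;
  }
  stack.push(angle);
}

// Usage: evaluate `-1 0 atan` on a small operand stack.
const stack = [-1, 0];
psAtan(stack);
console.log(stack); // one value, approximately 270
```

Using `Math.atan2` instead of `Math.atan(num / den)` keeps the quadrant information and avoids dividing by zero when `den` is 0. The sketch does not model the PostScript error raised for `0 0 atan`.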


1321 Commits