Note how, throughout the `src/core/annotation.js` code, we assume that if an `appearance`-entry exists it's also a Stream. However, we don't actually check that thoroughly enough, which causes issues in some badly generated PDF documents.
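As a rough sketch of the kind of validation intended here (not the actual pdf.js code, and assuming a pdf.js-style `BaseStream` class and `warn` helper):

```js
// Only accept the appearance-entry if it's actually a Stream.
let appearance = appearanceDict.get("N");
if (!(appearance instanceof BaseStream)) {
  warn("Bad appearance entry; falling back to rendering without it.");
  appearance = null;
}
```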
*Please note:* The reduced test-case is *not* a perfect reproduction of the original PDF document, since this one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here.
For corrupt *and* encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we could easily miss that, since we generally pick the first trailer dictionary that isn't obviously corrupt. The solution implemented here is to simply pre-parse all trailer dictionaries, to see if any of them contain an /Encrypt-entry.
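A hedged sketch of that pre-parsing step (helper and variable names are illustrative):

```js
// Check every trailer candidate for an /Encrypt-entry before settling on one,
// instead of simply using the first candidate that isn't obviously corrupt.
let encrypt = null;
for (const trailerDict of trailerCandidates) {
  if (trailerDict.has("Encrypt")) {
    encrypt = trailerDict.get("Encrypt");
    break;
  }
}
```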
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there are unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.
Currently we do all searching on the "raw" bytes of the PDF document, for efficiency; however, this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property, which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility
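A sketch of the scanning approach (the pattern itself is illustrative): setting `lastIndex` before calling `exec` resumes the search at a given position, without slicing the underlying string.

```js
// Find the next command-string at, or after, `position` without substrings.
const CMD_RE = /\b(?:obj|endobj|trailer|startxref)\b/g; // illustrative pattern
function findNextCommand(str, position) {
  CMD_RE.lastIndex = position;
  const match = CMD_RE.exec(str);
  return match ? match.index : -1;
}
```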
Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code; however, this method is only invoked for corrupt PDF documents.
Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually find usable data.
Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.
It's a follow-up of #14950: some format actions are run when the document is opened,
but we must be sure we have everything ready for that, hence we have to run some
named actions before running the global format.
While playing with the form, I discovered that the blur event wasn't triggered when
JS called `setFocus` (because in such a case the mouse was never down). So I removed
the mouseState thing and just use the correct commitKey when blur is triggered by the
TAB key.
In order to move the annotations in the DOM so that they match the visual order,
we need to have their dimensions/positions, which means that the parent must have
some dimensions.
This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.
Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a *linked* test-case was chosen here.
- For text fields
* when printing, we generate a fake font which contains some widths computed thanks to
an OffscreenCanvas and its measureText method (see the sketch after this list).
In order to avoid having to lay out the glyphs ourselves, we just render all of them
in one call in the showText method, using the system sans-serif/monospace fonts.
* when saving, we continue to create the appearance streams when the fonts contain the chars,
but when a char is missing, we just set the /NeedAppearances flag to true in the AcroForm
dict and remove the appearance stream. This way, we let the different readers handle the
rendering of the strings.
- For FreeText annotations
* when printing, we use the same trick as for text fields.
* there is no need to save an appearance since Acrobat is able to infer one from the
/Contents entry.
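As a rough sketch of the width-measurement trick mentioned above (assuming OffscreenCanvas support; names are illustrative):

```js
// Measure approximate glyph widths using a system font, for the fake font.
const ctx = new OffscreenCanvas(1, 1).getContext("2d");
ctx.font = "100px sans-serif";
const widths = new Map();
for (const char of "abcABC123") {
  // Divide by the font size to get a width relative to the em-size.
  widths.set(char, ctx.measureText(char).width / 100);
}
```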
*Please note:* This only fixes the "wrong letter" part of bug 1799927.
It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
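For reference, Adler-32 only maintains two running sums modulo 65521, so for very short inputs most of the output bits carry no entropy, which makes such collisions plausible; a minimal sketch (not necessarily the exact pdf.js helper):

```js
function computeAdler32(bytes) {
  let a = 1;
  let b = 0;
  for (let i = 0; i < bytes.length; i++) {
    a = (a + (bytes[i] & 0xff)) % 65521;
    b = (b + a) % 65521;
  }
  return (b << 16) | a;
}
```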
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it's limited to the current `Parser`-instance, which means that it's not persistent.
One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.
This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
When a form isn't changed, we use the appearances we had in the file, but when
/NeedAppearances is true, all the appearances have to be regenerated whatever they are.
In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid, Adobe Reader seems to handle it just fine.
Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax the validation here.
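A sketch of the relaxed behaviour (illustrative, not the exact lexer code): degenerate tokens such as `-.` simply become zero, rather than triggering an error.

```js
function parseNumberLeniently(token) {
  const value = parseFloat(token);
  return Number.isNaN(value) ? 0 : value; // e.g. parseFloat("-.") is NaN
}
```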
After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document.
Hence it seems strange to keep this requirement for *corrupt* PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.
When we fail to find a usable PDF document `trailer` *and* there were errors during parsing, try and fallback to a *previous* generation as a last resort during fetching of uncompressed references.
*Please note:* This will not affect "normal" PDF documents, with valid /XRef data, and even most *corrupt* documents should be completely unaffected by these changes.
*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).
*Fixes a regression from PR 15246, sorry about that!*
The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that reference the /EmbeddedFiles-dict in the PDF document.
See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909
---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
Note that this PR only adds the "underscore"-variant of *actually existing* ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries).
Given that it's clearly the PDF document, and its fonts, that are the culprit here, it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming that's even generally possible. Especially when this patch alone already improves our copy-paste behaviour compared to both Adobe Reader and PDFium, and this is only the *second* time this sort of bug has been reported.
OperatorList.addOp can trigger a flush if it's required, hence the values passed to it must
be correctly initialized in order to avoid wrong values in the renderer.
Because of that, a clip path was considered empty, nothing was clipped, and hence the wrong
rendering in bug 1791583.
*Please note:* This is only a, hopefully generally helpful, work-around rather than a proper solution to issue 15292.
There's something that's "special" about the Type1 fonts in the referenced PDF document, since we don't manage to find any actual font programs and thus cannot render anything.
Given that it shouldn't make sense for a Type1 font program to ever be empty, since that means that there's no glyph-data to render, we simply fallback to a standard font to at least try and render *something* in these rare cases.
Given that the change in PR 13393 was slightly speculative, given the lack of test-cases, let's just revert part of that to fix the referenced issue.
Based on a quick look at old issues and existing test-cases, it seems that most (if not all) PDF documents that benefit from using the font-data in this way lack any /ToUnicode maps which should mean that they're unaffected by these changes.
Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents.
The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`.
---
[1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.
Apparently this is implemented in e.g. Adobe Reader, and the specification does support it; however, it can't be a commonly used feature in real-world PDF documents, since it took over ten years for it to be requested.
This patch doesn't structurally change the text layer: it just adds some aria-owns
attributes to some spans.
The aria-owns attribute expects an element id, which is why this patch adds back an
id on the element rendering an annotation; this id is built using crypto.randomUUID
to avoid any potential issues with the hash in the URL.
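A minimal sketch of the id/aria-owns wiring (element names are illustrative):

```js
// Give the annotation element a collision-free id, then point the text-layer
// span at it, so screen readers encounter the annotation at the right spot.
const id = `pdfjs_internal_id_${crypto.randomUUID()}`;
annotationElement.id = id;
textLayerSpan.setAttribute("aria-owns", id);
```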
The elements in the annotation layer are moved within the DOM in order to have them in the
same "order" as they appear visually.
The overall goal is to help screen readers present the annotations to the user in the
order they appear visually and in the text flow.
It is clearly not perfect, but it should improve readability for some people with visual
disabilities.
In the referenced PDF document the fonts have /CIDToGIDMap-entries that cannot be loaded. Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /CIDToGIDMap-entries and fall back to simply assuming that no such data is available.
Given that this is *clearly* a case of a corrupt PDF document, there's no guarantee that this will "fix" things in the general case since a /CIDToGIDMap may be *required* in order for some composite fonts to render correctly. However, attempting to render *something* is surely better than skipping a font altogether.
This extends PR 13461, by also building a fallback bounding box for Type3 fonts that contain a much too small /FontBBox-entry.
*Please note:* While this patch improves things overall, copy-and-pasting still doesn't work perfectly for this document. In particular the lowercase letter "c" cannot be selected/copied, however this can be reproduced in both Adobe Reader and PDFium (in Google Chrome) too, which is caused by a lack of proper /ToUnicode-data in the PDF document.
- Since the border belongs to the section containing the HTML
counterpart of an annotation, this section must be hidden when
a JS action requires it;
- it wasn't possible to hide a button using JS.
This appears to be a Microsoft-specific version of the regular Arial font, hence we simply map this to Helvetica in the same way that we treat many other Arial-named fonts.
This only adds the minimum entries required in order to render the referenced document correctly, rather than trying to support "all" Hebrew glyphs, to ensure that all lines in `getGlyphMapForStandardFonts` are covered by tests.
This only applies to *corrupt* PDF documents, where Annotations are missing the required /Rect-entry. Rendering PopupAnnotations unconditionally shouldn't be a problem, since we're not using a `BaseSVGFactory`-instance in that case.
- each annotation must be rendered independently of the others. So,
after having rendered each annotation, the canvas state is reset
in order to have something clean to render the next one.
*This fixes a regression from PR 14754.*
We didn't lookup the image-data correctly, with the result that we tried to render some ImageMasks using a string rather than the intended TypedArray. To make matters worse, this code-path was apparently not *properly* covered by existing test-cases.
- it's a regression from PR #14247:
- before the PR, the button was rendered on the canvas whatever its status was;
- after the PR, the button image was moved to another canvas, so when the button is not renderable
(because it has no actions) the image is not added to the HTML element.
- the buttons in the pdf in bug 1737260 or in the pdf in #14308 were not visible
- make the button always renderable but don't add the link element if it's useless.
- right now we're using the font size from the pdf itself, but we use another font
in the annotation layer. So this size doesn't really make sense and leads to bad
rendering (see pdf in #14928);
- use a sans-serif font for the fields containing text (fix issue #14736);
- remove useless padding in text-based fields (fix issue #14301);
- text fields allow/disallow scrollbars (see bit 24 in the /Ff entry), so use this
value to hide/show scrollbars in the annotation layer (see the sketch below).
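A sketch of that last point (the flag value follows the specification's 1-based bit numbering; the rest is illustrative):

```js
const DO_NOT_SCROLL = 1 << 23; // bit 24 of /Ff, i.e. the DoNotScroll flag
element.style.overflow = fieldFlags & DO_NOT_SCROLL ? "hidden" : "auto";
```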
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1771477;
- Hangul contains some syllables which are decomposed when using NFD, hence
the text must be correctly shifted in case it contains some of them.
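For example (illustrative), a single precomposed Hangul syllable expands to several code points under NFD, so offsets computed on the normalized text must be shifted back:

```js
const syllable = "\uAC01"; // 각
syllable.length; // 1
syllable.normalize("NFD").length; // 3 (leading consonant, vowel, final consonant)
```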
In the `src/display/canvas.js` code the `d1` operator will be used to set the clipping region, and it obviously cannot be empty since that prevents the Type3-glyph from rendering.
Also, the patch removes an outdated comment; refer to PR 12718.
This limits the heuristics for handling of incomplete path operators, see PR 9838, to only apply to *sequences* of such operators. In practice a couple of invalid path operators are (hopefully) unlikely to completely break rendering, whereas a sequence of them will easily lead to fairly chaotic rendering artifacts.
- since the resetForm function resets a field value, a calculateNow is consequently triggered.
But the calculate callback can itself call resetForm, hence an infinite recursive loop.
So basically, prevent calculateNow from being triggered by itself.
- in Firefox, the letters entered in some fields were duplicated: "AaBb" instead of "AB".
It was mainly because beforeInput was triggering a Keystroke which was itself triggering
an input value update, and then the input event was triggered.
So in order to avoid that, beforeInput calls preventDefault and then it's up to the JS to
handle the event (see the sketch after this list).
- fields have a property valueAsString which returns the value as a string. In the
implementation it was wrongly used to store the formatted value of a field (2€ when the user
entered 2). So this patch implements valueAsString correctly.
- non-rendered fields can be updated using JS, but when they are, they must take some properties
from the annotationStorage. It was implemented for field values, but it wasn't for
display, colors, ...
- it fixes #14862 and #14705.
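A sketch of the beforeInput handling mentioned above (the dispatch helper is hypothetical):

```js
element.addEventListener("beforeinput", event => {
  // Stop the default text insertion; the Keystroke action decides the result.
  event.preventDefault();
  dispatchKeystrokeAction(event.data, event.target); // hypothetical helper
});
```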
Interestingly enough this appears to be the very first case of *encoded* dest-strings, in /GoTo destination dictionaries, that we've actually come across. What's really fascinating is that it's less than a week after issue 14847, given that these issues are *somewhat* similar.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1264608;
- it's only a partial fix for #3351;
- some tiled images have some spurious white lines between the tiles.
When the current transform is applied, the corners of an image can have
some non-integer coordinates, leading to some extra transparency being added
to handle that. So with this patch the current transform is applied to the
point and to the dimensions, in order to end up with only integer values (see the sketch below).
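A minimal sketch of the rounding idea (not the actual pdf.js code): transform the image origin and dimensions with the current matrix [a, b, c, d, e, f], then round, so that adjacent tiles share exact pixel edges.

```js
function snapToPixels([a, b, c, d, e, f], width, height) {
  const x = Math.round(e); // the transformed (0, 0) corner
  const y = Math.round(f);
  const w = Math.round(a * width + c * height); // the transformed dimensions
  const h = Math.round(b * width + d * height);
  return [x, y, w, h];
}
```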
Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees use string-keys.
Hence the easiest solution, as far as I'm concerned, was thus to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fall back to fetching all destinations, which should be fine since this is the very first case of encoded keys that we've seen.
Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error for any non-Array Kids-entries found in a NameTree/NumberTree, although we've so far not seen such a case.
Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)
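A sketch of the key handling (assuming a `stringToPDFString`-style helper that decodes e.g. UTF-16BE strings; the surrounding names are illustrative):

```js
for (const [rawKey, value] of nameTreeEntries) {
  // Encoded keys must be decoded before being used as destination names.
  const key = typeof rawKey === "string" ? stringToPDFString(rawKey) : rawKey;
  destinations[key] = parseDestination(value);
}
```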
Currently we only insert optionalContent-data into the operatorList the first time that an image is parsed, which will (in hindsight) obviously cause problems for cached images.
Hence we also need to insert the optionalContent-data in the various worker-thread image caches, such that it can be accessed in the fast-paths that are used to skip re-parsing of images.
In order to reduce the amount of repeated code, this patch also adds a new `OperatorList`-method that takes care of inserting the necessary data in the operatorList.
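A sketch of such a helper, as a method on `OperatorList` (modelled on, but not necessarily identical to, the added method):

```js
addImageOps(fn, args, optionalContent) {
  // Wrap cached and non-cached image operators alike in marked-content ops,
  // so that optional-content visibility also applies to cached images.
  if (optionalContent !== undefined) {
    this.addOp(OPS.beginMarkedContentProps, ["OC", optionalContent]);
  }
  this.addOp(fn, args);
  if (optionalContent !== undefined) {
    this.addOp(OPS.endMarkedContent, []);
  }
}
```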
In the referenced PDF document the fonts have /Encoding-entries that are Streams (containing completely bogus data), which are thus obviously not valid here.
Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /Encoding-entries and fall back to the existing code to try and infer a usable encoding.
Given that this is *clearly* a case of corrupt PDF documents, there's no guarantee that this will "fix" all such cases, however it's the best that we can do here and shouldn't really be worse than ignoring an entire font.
- it's the second part of the fix for https://bugzilla.mozilla.org/show_bug.cgi?id=857031;
- some image masks can be used several times but at different positions;
- an image needs to be pre-processed before being rendered:
* rescale it;
* use the fill color/pattern.
- the two operations above are time consuming, so we can cache the generated canvas;
- the cache key is based on the current transform matrix (without the translation part)
and the current fill color when it isn't a pattern (see the sketch below);
- the rendering of the pdf in the above bug is much faster than without this patch.
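A sketch of the cache key construction (illustrative):

```js
// Key on the scale/skew part of the transform plus the fill style; the
// translation components (e, f) are deliberately excluded.
const { a, b, c, d } = ctx.getTransform();
const cacheKey = `${a}_${b}_${c}_${d}_${isPatternFill ? "PATTERN" : fillColor}`;
```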
- it aims to partially fix the performance issue reported in https://bugzilla.mozilla.org/show_bug.cgi?id=857031;
- the idea is to avoid using byte arrays and instead use ImageBitmaps, which are way faster to draw:
* an ImageBitmap is Transferable, which means that it can be built in the worker instead of in the main thread:
- this is achieved using an OffscreenCanvas when it's available; there is a bug to enable them
for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330;
- or using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap, so
it's slightly slower than using an OffscreenCanvas.
* it's transferred from the worker to the main thread by "reference";
* the byte buffers used to create the image data have a very short lifetime, and hence the overall
memory use is lower than before.
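A sketch of the bitmap creation on the worker (names illustrative): prefer an OffscreenCanvas when available, otherwise fall back to createImageBitmap.

```js
async function buildTransferableBitmap(rgbaData, width, height) {
  const imageData = new ImageData(rgbaData, width, height);
  if (typeof OffscreenCanvas !== "undefined") {
    const canvas = new OffscreenCanvas(width, height);
    canvas.getContext("2d").putImageData(imageData, 0, 0);
    return canvas.transferToImageBitmap();
  }
  return createImageBitmap(imageData);
}
```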
- Use the localImageCache for the mask;
- Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image;
- Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set
as defined in operator_list.
- it aims to fix #14685;
- add a basic object to get values from the parsed datasets;
- these annotations don't have an appearance so we must create one when printing or saving.
- it aims to fix issue #14627;
- the basic idea of the recent text refactoring was to only consider the rendered visible whitespace.
But sometimes the heuristics aren't correct: although some whitespaces are in the text stream,
they weren't in the text chunks because they were too small. Hence we added some exceptions; for example,
we always add a whitespace when it is between two non-whitespace chars, but only within the same Tj.
So basically, this patch removes the constraint of having the chars in the same Tj
(by using a circular buffer to save the last two chars) but doesn't add a space when the visible space is really
too small (hence `NOT_A_SPACE_FACTOR`); see the sketch after this list.
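A sketch of the cross-Tj lookback (the threshold value is hypothetical):

```js
const NOT_A_SPACE_FACTOR = 0.1; // hypothetical fraction of the font size
let prevChar = ""; // survives Tj boundaries, unlike the old per-Tj state
function shouldAddWhitespace(currChar, visibleGap, fontSize) {
  const result =
    prevChar !== "" &&
    !/\s/.test(prevChar) &&
    !/\s/.test(currChar) &&
    visibleGap > NOT_A_SPACE_FACTOR * fontSize;
  prevChar = currChar;
  return result;
}
```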
When there are *multiple* empty glyphs at the start of the data, ensure that the "first" glyph gets a correct `endOffset` to avoid skipping it during parsing in the `sanitizeGlyph` function.
- it aims to fix:
- https://bugzilla.mozilla.org/show_bug.cgi?id=1753075;
- https://bugzilla.mozilla.org/show_bug.cgi?id=1743245;
- https://bugzilla.mozilla.org/show_bug.cgi?id=1710019;
- issue #13211;
- issue #14521.
- previously we were trying to adjust lineWidth to have something correct after the current transform was applied, but this approach wasn't correct because, in the end, the pixel is rescaled with the same factors in both directions.
And sometimes those factors must be different (see bug 1753075).
- So the idea of this patch is to apply a scale matrix to the current transform just before setting lineWidth and stroking. This scale matrix is computed in order to ensure that, after the transform, a pixel will have both of its thicknesses greater than 1 (see the sketch below).
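A hedged sketch of that computation (not the actual pdf.js code): pick per-axis factors so the stroked line ends up at least one device pixel thick in both directions.

```js
const { a, b, c, d } = ctx.getTransform();
const scaleX = Math.hypot(a, b); // how an x-unit vector is stretched
const scaleY = Math.hypot(c, d); // how a y-unit vector is stretched
// After scaling, the stroked thickness is at least 1 pixel along each axis.
const sx = Math.max(1, 1 / (lineWidth * scaleX));
const sy = Math.max(1, 1 / (lineWidth * scaleY));
ctx.save();
ctx.scale(sx, sy);
ctx.lineWidth = lineWidth;
ctx.stroke();
ctx.restore();
```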
Soft masks can be enabled/disabled at any time and at different
points in the save/restore stack. This can lead to
the amount of saves/restores becoming unbalanced across the
two canvases. Instead of saving/restoring on the temporary canvas,
change it so we only track state on the main (suspended) canvas.
I was also getting an out-of-balance stack from patterns, so I've also
fixed that and added a warning that will at least show up in Chrome.
It would be nice to add this to Firefox at some point too.
Fixes #11328, #14297 and bug 1755507.
- it aims to fix #14562;
- 'X-\n' was not correctly positioned;
- when X is a diacritic (e.g. in "sä-\n", which is decomposed into "sa¨-\n") we must handle both things:
- diacritics on the one hand;
- "-\n" on the other hand.
The file used in this test-case is *identical* to, i.e. the md5 entry perfectly matches, the file used with the `xfa_bug1716380` test-case.
While it's obviously fine to use the same PDF document in different reference-tests, note how we e.g. have both `eq` and `text` tests for one document, we should always avoid adding *duplicate* files in the `test/pdfs/` folder.