pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	595711bd7c	Merge pull request #15679 from Snuffleupagus/bug-1799927-2 Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927)	2022-11-10 22:54:48 +01:00
Calixte Denizet	3ca03603c2	[Annotation] Fix printing/saving for annotations containing some non-ASCII chars and with no fonts to handle them (bug 1666824) - For text fields * when printing, we generate a fake font which contains some widths computed thanks to an OffscreenCanvas and its method measureText. In order to avoid having to lay out the glyphs ourselves, we just render all of them in one call in the showText method, using the system sans-serif/monospace fonts. * when saving, we continue to create the appearance streams if the fonts contain the char but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances to true and remove the appearance stream. This way, we let the different readers handle the rendering of the strings. - For FreeText annotations * when printing, we use the same trick as for text fields. * there is no need to save an appearance since Acrobat is able to infer one from the Content entry.	2022-11-10 19:05:39 +01:00
Jonas Jenwald	b46e0d61cf	Use the full inline image as the cacheKey in `Parser.makeInlineImage` (bug 1799927) Please note: This only fixes the "wrong letter" part of bug 1799927. It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly. Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent. One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.	2022-11-10 18:27:26 +01:00
Jonas Jenwald	2516ffa78e	Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590) Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead. This partially fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.	2022-11-03 12:46:30 +01:00
calixteman	e42e1cde61	Merge pull request #15615 from calixteman/bug1796741 [Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741)	2022-10-31 09:58:27 +01:00
Jonas Jenwald	980acddbfa	Prevent textLayer errors in documents with unbalanced beginMarkedContent/endMarkedContent operators (issue 15629)	2022-10-26 18:35:48 +02:00
Calixte Denizet	9f95a14e91	[Form] Don't use field appearances when /NeedAppearances is set to true (bug 1796741) When a form isn't changed, we used the appearances we had in the file, but when /NeedAppearances is true, all the appearances have to be regenerated, whatever they are.	2022-10-26 12:10:51 +02:00
Calixte Denizet	6db9cefaaf	[Annotation] Replace use of id by data-element-id to have the correct id	2022-10-19 23:36:28 +02:00
Jonas Jenwald	3c046c0a21	Extend `getSupplementalGlyphMapForCalibri` with some umlauts (issue 15594)	2022-10-19 17:49:40 +02:00
Jonas Jenwald	bc13a277ce	Relax the /Pages dictionary /Count check for corrupt documents (issue 9105) After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document. Hence it seems strange to keep this requirement for corrupt PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.	2022-10-19 12:28:25 +02:00
Calixte Denizet	556513a6e7	Use all the current transform as key when caching some image for masks used with pattern fill (bug 1795263, #15573 )	2022-10-14 14:37:58 +02:00
Jonas Jenwald	081e897588	Ensure that `Page.getOperatorList` handles Annotation parsing errors correctly (issue 15557) Fixes a regression from PR 15246, sorry about that! The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.	2022-10-10 09:48:01 +02:00
Jonas Jenwald	ce66fefbff	[api-minor] Add partial support for the "GoToE" action (issue 8844) Please note: The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions. Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document. See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909 --- [1] Usually I always prefer having real-world test-cases to work with, whenever I'm implementing new features.	2022-10-06 10:33:07 +02:00
Jonas Jenwald	c87f90102c	Add more non-standard ligatures in the `glyphlist.js` file (issue 15516) Note that this PR only adds the "underscore"-variant of actually existing ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries). Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the second time this sort of bug has been reported.	2022-09-27 16:31:51 +02:00
Jonas Jenwald	7a19def34c	Extend `getSupplementalGlyphMapForCalibri` with more entries (issue 15443)	2022-09-15 22:19:16 +02:00
Jonas Jenwald	2f2ecad8fd	Extend `getGlyphMapForStandardFonts` with some quote-entries (issue 15441)	2022-09-15 11:37:20 +02:00
Calixte Denizet	6c6f6fb2b8	Don't replace cr by a white space when the last char on the line is an ideographic char	2022-09-04 14:21:05 +02:00
Jonas Jenwald	cc4baa2fe9	[api-minor] Add basic support for the `SetOCGState` action (issue 15372) Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents. The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`. --- [1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.	2022-09-01 17:34:24 +02:00
Jonas Jenwald	216b86a082	[api-minor] Support Named-actions in the outline (issue 15367) Apparently this is implemented in e.g. Adobe Reader, and the specification does support it, however it cannot be commonly used in real-world PDF documents since it took over ten years for this feature to be requested.	2022-08-30 18:47:45 +02:00
Calixte Denizet	c06c5f7cbd	[Annotations] charLimit === 0 means unlimited (bug 1782564) Changing the charLimit in JS had no impact, so this patch aims to fix that and add an integration test for it.	2022-08-19 11:28:28 +02:00
Calixte Denizet	f316300113	[Annotations] Add some aria-owns in the text layer to link to annotations (bug 1780375) This patch doesn't structurally change the text layer: it just adds some aria-owns attributes to some spans. The aria-owns attribute expects an element id, which is why this patch adds back an id on the element rendering an annotation, but this id is built using crypto.randomUUID to avoid any potential issues with the hash in the url. The elements in the annotation layer are moved into the DOM in order to have them in the same "order" as they visually are. The overall goal is to help screen readers present the annotations to the user as they visually are and as they come in the text flow. It is clearly not perfect, but it should improve readability for some people with visual disabilities.	2022-08-12 14:35:26 +02:00
Jonas Jenwald	899fc29eef	Always set a border-radius for RadioButton annotations (issue 15262)	2022-08-02 13:58:20 +02:00
Calixte Denizet	d092a85b6c	Fix wrong order of arguments when calling the CipherTransform ctor (bug 1782186)	2022-07-29 12:46:45 +02:00
Jonas Jenwald	c2f7942aea	Ensure that the /Resources-entry is actually a dictionary (issue 15150) Prevent issues in corrupt PDF documents, if the /Resources-entry is not of the correct and expected type.	2022-07-08 12:43:43 +02:00
Jonas Jenwald	79cfc548fc	Improve text-selection for Type3 fonts with bogus /FontBBox-entries (issue 14999) This extends PR 13461, by also building a fallback bounding box for Type3 fonts that contain a much too small /FontBBox-entry. Please note: While this patch improves things overall, copy-and-pasting still doesn't work perfectly for this document. In particular the lowercase letter "c" cannot be selected/copied, however this can be reproduced in both Adobe Reader and PDFium (in Google Chrome) too, which is caused by a lack of proper /ToUnicode-data in the PDF document.	2022-07-05 14:27:14 +02:00
Calixte Denizet	a334a21a1d	[JS] Update siblings when a field is updated after a calculation (#15092 )	2022-06-24 14:23:06 +02:00
Calixte Denizet	cdc58b7a52	Rotate annotations based on the MK::R value (bug 1675139) - it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1675139; - An annotation can be rotated (counterclockwise); - the rotation can be set in using JS.	2022-06-21 17:57:26 +02:00
Calixte Denizet	7e3941da9d	[JS] Hide field borders and buttons (#15053 ) - Since the border belongs to the section containing the HTML counterpart of an annotation, this section must be hidden when a JS action requires it; - it wasn't possible to hide a button in using JS.	2022-06-17 17:36:38 +02:00
Jonas Jenwald	3d244cb6a8	Render PopupAnnotations even if they have missing or empty /Rect-entries (issue 15012, PR 14439 follow-up) This only applies to corrupt PDF documents, where Annotations are missing the required /Rect-entry. Rendering PopupAnnotations unconditionally shouldn't be a problem, since we're not using a `BaseSVGFactory`-instance in that case.	2022-06-09 15:10:54 +02:00
Calixte Denizet	2dd0c861bf	Outline fields which are required (bug 1724918) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1724918; - it applies for both Acroform and XFA.	2022-06-07 17:02:11 +02:00
Calixte Denizet	c7afce4210	Support Hangul syllables when searching some text (bug 1771477) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1771477; - hangul contains some syllables which are decomposed when using NFD, hence the text must be correctly shifted in case it contains some of them.	2022-05-28 16:50:03 +02:00
Jonas Jenwald	5a2899c57e	Skip bogus `d1` operators in Type3-glyphs (issue 14953) In the `src/display/canvas.js` code the `d1` operator will be used to set the clipping region, and it obviously cannot be empty since that prevents the Type3-glyph from rendering. Also, the patch removes an outdated comment; refer to PR 12718.	2022-05-24 12:20:31 +02:00
Jonas Jenwald	6e7e9d83d8	Add support for TrueType format 12 `cmap`s (issue 14881) This is, as far as I can tell, the first case we've seen of a format 12 `cmap`. Please see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html	2022-05-06 11:11:38 +02:00
Calixte Denizet	094ff38da0	[JS] Fix a few bugs present in the pdf for issue #14862 - since the resetForm function resets a field value, a calculateNow is consequently triggered. But the calculate callback can itself call resetForm, hence an infinite recursive loop. So basically, prevent calculateNow from being triggered by itself. - in Firefox, the letters entered in some fields were duplicated: "AaBb" instead of "AB". It was mainly because beforeInput was triggering a Keystroke which was itself triggering an input value update and then the input event was triggered. So in order to avoid that, beforeInput calls preventDefault and then it's up to the JS to handle the event. - fields have a property valueAsString which returns the value as a string. In the implementation it was wrongly used to store the formatted value of a field (2€ when the user entered 2). So this patch implements valueAsString correctly. - non-rendered fields can be updated using JS, but when they are, they must take some properties in the annotationStorage. It was implemented for field values, but it wasn't for display, colors, ... - it fixes #14862 and #14705.	2022-05-03 15:48:44 +02:00
Jonas Jenwald	71370d012b	Support destinations in NameTrees with encoded keys (issue 14847) Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees use string-keys. Hence the easiest solution, as far as I'm concerned, was thus to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fall back to fetching all destinations, which should be fine since this is the very first case of encoded keys that we've seen. Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error, although we've so far not seen such a case, for any non-Array Kids-entries found in a NameTree/NumberTree. Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)	2022-04-27 11:19:55 +02:00
Tim van der Meij	f9e54d9226	Merge pull request #14823 from Snuffleupagus/issue-14821 Ignore invalid /Encoding-entries when parsing fonts (issue 14821)	2022-04-23 13:19:26 +02:00
Jonas Jenwald	e723da7261	Ignore invalid /Encoding-entries when parsing fonts (issue 14821) In the referenced PDF document the fonts have /Encoding-entries that are Streams (containing completely bogus data), which are thus obviously not valid here. Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /Encoding-entries and fall back to the existing code to try and infer a usable encoding. Given that this is clearly a case of corrupt PDF documents, there's no guarantee that this will "fix" all such cases, however it's the best that we can do here and shouldn't really be worse than ignoring an entire font.	2022-04-22 11:49:03 +02:00
Jonas Jenwald	39d1bdde09	Ignore non-Stream /SMask-entries when parsing images (issue 14814) This is similar to the pre-existing check used in the /Mask-case below, to handle corrupt PDF documents that include non-Stream /SMask-entries in images; please refer to the PDF specification: https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=216 Please note: Adobe Reader also fails to render the image on the second page, and displays an error message.	2022-04-21 12:14:08 +02:00
Jonas Jenwald	5bc7339c1b	Add support for the /Catalog Base-URI when resolving URLs (issue 14802) As far as I can tell, this is actually the very first time that we've seen a PDF document with a Base-URI specified in the /Catalog; please refer to the specification: https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2097122 To simplify the overall implementation, this new parameter is accessed via the existing `BasePdfManager.docBaseUrl`-getter and will thus override any user-specified `docBaseUrl` API-parameter.	2022-04-19 17:14:52 +02:00
Calixte Denizet	18e79e3c0b	[text selection] Add the whitespaces present in the pdf in the text chunk - it aims to fix issue #14627; - the basic idea of the recent text refactoring was to only consider the rendered visible whitespaces. But sometimes, the heuristics aren't correct and although some whitespaces are in the text stream they weren't in the text chunks because they were too small. Hence we added some exceptions, for example, we always add a whitespace when it is between two non-whitespace chars but only when in the same Tj. So basically, this patch removes the constraint to have the chars in the same Tj (using a circular buffer to save the last two chars) but doesn't add a space when the visible space is really too small (hence `NOT_A_SPACE_FACTOR`).	2022-03-27 14:34:56 +02:00
Jonas Jenwald	1a7921dbf0	Compute the loca table `endOffset`, of the "first" glyph, correctly (issue 14618) When there are multiple empty glyphs at the start of the data, ensure that the "first" glyph gets a correct `endOffset` to avoid skipping it during parsing in the `sanitizeGlyph` function.	2022-03-03 14:22:45 +01:00
Brendan Dahl	85ff7b117e	Merge pull request #14536 from calixteman/thin_line Fix some issues with lineWidth < 1 after transform (bug 1753075, bug 1743245, bug 1710019)	2022-03-02 09:46:15 -08:00
Jeff Muizelaar	9b9609a6d8	[JPEG 2000] Add support for resetContextProbabilities (bug 1731483)	2022-02-26 13:05:23 +01:00
Calixte Denizet	46369e4aa5	Fix some issues with lineWidth < 1 after transform (bug 1753075, bug 1743245, bug 1710019) - it aims to fix: - https://bugzilla.mozilla.org/show_bug.cgi?id=1753075; - https://bugzilla.mozilla.org/show_bug.cgi?id=1743245; - https://bugzilla.mozilla.org/show_bug.cgi?id=1710019; - issue #13211; - issue #14521. - previously we were trying to adjust lineWidth to have something correct after the current transform is applied but this approach was not correct because finally the pixel is rescaled with the same factors in both directions. And sometimes those factors must be different (see bug 1753075). - So the idea of this patch is to apply a scale matrix to the current transform just before setting lineWidth and stroking. This scale matrix is computed in order to ensure that after transform, a pixel will have its two thickness greater than 1.	2022-02-25 18:37:34 +01:00
Brendan Dahl	7def6d12c8	Fix canvas state getting out of sync from smasks. (bug 1755507) Soft masks can be enabled/disabled at any time and at different points in the save/restore stack. This can lead to the number of save/restores becoming unbalanced across the two canvases. Instead of save/restoring on the temporary canvas, change it so we only track state on the main (suspended) canvas. I was also getting an out-of-balance stack from patterns, so I've also fixed that and added a warning that will at least show up on Chrome. It would be nice to add this to Firefox at some point too. Fixes #11328, #14297 and bug 1755507	2022-02-17 17:38:32 -08:00
Calixte Denizet	1f41028fcb	Support search with or without diacritics (bug 1508345, bug 916883, bug 1651113) - get the original index using a dichotomic search instead of a linear one; - normalize the text using NFD; - convert the query string into a RegExp; - replace whitespaces in the query with \s+; - handle hyphens at eol used to break a word; - add some \s* around punctuation signs	2022-02-03 15:42:55 +01:00
Calixte Denizet	ae842e1c3a	[api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335) - it aims to fix #14502 and bug 1721335; - Acrobat and Pdfium do the same; - it'll avoid to have truncated data when printed; - change the factor to compute font size in using field height: lineHeight = 1.35*fontSize - this is the value used by Acrobat. - in order to not have truncated strings on the bottom, add few basic metrics for standard fonts.	2022-01-30 15:53:31 +01:00
calixteman	838909f8c1	Merge pull request #14491 from quaoaris/lines-rendered-too-thick fix for lines (stroke) are rendered too thick (Bug 1743245)	2022-01-27 18:46:26 +01:00
Calixte Denizet	3a7004ca25	Take into account all rotations before comparing glyph positions - it aims to fix #14497; - previously, only rotations with an angle 0, 90, 180 or 270 were taken into account; - so generalize to any angle but keep the fast path for 0, 90, ... because they're likely more common than anything else.	2022-01-26 17:19:00 +01:00
quaoaris	3f77d80f31	fix for lines (stroke) are rendered too thick (Bug 1743245) This commit fixes Bug 1743245 (Grided PDF file lines rendered too thick) which was created by a fix for #12868. The lineWidth was set to round(1 * this._combinedScaleFactor) when the pixel is drawn as a parallelogram with a height < 1. This fix changes this to floor(1 * this._combinedScaleFactor). This change shows a visual result comparable to Chrome and Acrobat. Regarding the last PR, 3 statements in canvas.js are affected and will change with this commit (stroke and paintChar). Renaming the reference files to match the naming convention.	2022-01-25 10:27:30 +01:00
calixteman	88236e1163	Merge pull request #14430 from calixteman/beforeinput [JS] Use beforeinput event to trigger a keystroke event in the sandbox	2022-01-23 20:42:33 +01:00
Calixte Denizet	6ac296e48e	[JS] Use beforeinput event to trigger a keystroke event in the sandbox - it aims to fix issue #14307; - this event has been added recently in Firefox and we can now use it; - fix few bugs in aform.js or in annotation_layer.js; - add some integration tests to test keystroke events (see `AFSpecial_Keystroke`); - make dispatchEvent in the quickjs sandbox async.	2022-01-23 19:53:01 +01:00
Jonas Jenwald	a13ae5d97d	Support Type1 font files with incomplete /CharStrings definitions (issue 14462) Please refer to https://www.pdfa.org/norm-refs/Type1Fonts.pdf#page=15 for the expected format for the /CharStrings entries. In the referenced PDF document the /CharStrings are missing the expected end-token, which causes us to swallow the start of the next glyph name.	2022-01-17 18:55:22 +01:00
Tim van der Meij	922dac035c	Merge pull request #14448 from Snuffleupagus/Type3-circular-refs Prevent circular references in Type3 fonts	2022-01-15 14:11:47 +01:00
Jonas Jenwald	4c55563574	Add an additional test-case for circular references in Type3 fonts The PDF document in this patch already worked without the previous patch, but I wanted to improve our test-coverage for the Type3-parsing. The attached PDF document was also found in https://github.com/pdf-association/safedocs/tree/main/Miscellaneous%20Targeted%20Test%20PDFs	2022-01-13 17:59:57 +01:00
Jonas Jenwald	53d4ee7990	Prevent circular references in Type3 fonts In corrupt PDF documents Type3 fonts may introduce circular dependencies, thus resulting in the affected font(s) never loading and parsing/rendering never completing. Note that I've not seen any real-world examples of this kind of font corruption, but the attached PDF document was rather found in https://github.com/pdf-association/safedocs/tree/main/Miscellaneous%20Targeted%20Test%20PDFs Please note: That repository contains a number of reduced test-cases that are specifically intended to test interoperability (between PDF viewer) and parsing/rendering for various kinds of strange/corrupt PDF documents. Some of the test-cases found there may thus not make sense to try and "fix" upfront, in my opinion, unless the problems are also found in real-world PDF documents.	2022-01-13 17:58:37 +01:00
Jonas Jenwald	08d88a0235	Ignore Annotations with empty /Rect-entries in the display-layer (issue 14438) This prevents the `BaseSVGFactory.create`-method from throwing, and thus preventing any remaining Annotations (on the page) from rendering in corrupt documents.	2022-01-11 13:54:35 +01:00
Calixte Denizet	6cdae5ac4d	Use positive dimensions for text chunks in the text layer (issue #14415 ).	2022-01-05 10:49:56 +01:00
Jonas Jenwald	e8562173b8	Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305) Fixes one of the documents in issue 14305.	2021-12-07 13:57:25 +01:00
Jonas Jenwald	40291d1943	Handle errors when fetching the raw /Metadata (issue 14305) Currently the `Catalog.metadata` getter only handles errors during parsing, however in a corrupt PDF document fetching of the raw /Metadata can obviously fail as well. Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail which it never should and this will cause the viewer to not initialize all state as expected. Fixes one of the documents in issue 14305.	2021-12-04 09:41:42 +01:00
Jonas Jenwald	1fac6371d3	[Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up) Please note: This is similar to the method that existed prior to PR 3848, but the new method will only be used as a fallback when parsing of corrupt PDF documents. The implementation in PR 14311 unfortunately turned out to be way too simplistic, as evidenced by the recently added test-files in issue 14303, since it may cause infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1] To avoid this, the easiest solution that I could come up with was to fall back to eagerly parsing the entire /Pages-tree when the /Count-entry validation fails during document initialization. Fixes at least two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents. --- [1] The whole point of PR 14311 was obviously to get rid of infinite loops during document initialization, not to introduce any more of those.	2021-12-02 14:31:04 +01:00
Jonas Jenwald	63be23f05b	Handle errors correctly when data lookup fails during /Pages-tree parsing (issue 14303) This only applies to severely corrupt documents, where it's possible that the `Parser` throws when we try to access e.g. a /Kids-entry in the /Pages-tree. Fixes two of the issues listed in issue 14303, namely the `poppler-742-0.pdf...` and `poppler-937-0.pdf...` documents.	2021-12-02 10:54:40 +01:00
Jonas Jenwald	a807ffe907	Prevent circular references in XRef tables from hanging the worker-thread (issue 14303) Please note: This patch on its own is sufficient to prevent the worker-thread from hanging, and in combination with PR 14311 these PDF documents will both load and render correctly. Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table. To avoid a solution that required tracking the references manually everywhere, the implementation settled on here instead handles that internally in the `XRef.fetch`-method. This should work, since that method and the `Parser`/`Lexer`-implementations are completely synchronous. Note also that the existing `XRef`-caching, used for all data-types except Streams, should hopefully help to lessen the performance impact of these changes. One potential problem with these changes could be certain browser exceptions, since those are generally not catchable in JavaScript code, however those would most likely "stop" worker-thread parsing anyway (at least I hope so). Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing, for the rest of the document, to continue such that one bad reference doesn't prevent an entire document from loading. Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.	2021-11-27 23:50:26 +01:00
Jonas Jenwald	d0c4bbd828	[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) This patch basically extends the approach from PR 10392, by also checking the last page. Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an integer /Count entry it must also be correct/valid. As can be seen in the referenced PDF documents, that entry may be completely bogus which causes general parsing to break down elsewhere in the worker-thread (and hangs the browser). Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the entire /Pages-tree and essentially count the pages. To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the entire /Pages-tree to determine the number of pages. Unfortunately these changes will have a number of somewhat negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug. - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the last page of the PDF documents. - For poorly generated PDF documents, where the entire /Pages-tree only has one level, we'll unfortunately need to fetch/parse the entire /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of some long PDF documents. - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost. As one small additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value). Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.	2021-11-27 21:57:35 +01:00
Calixte Denizet	31e13515f5	XFA - Draw arcs correctly - it aims to fix #14315; - take into account the startAngle to compute the coordinates of the final point.	2021-11-27 19:30:12 +01:00
Jonas Jenwald	ca8d2bdce4	Abort parsing when the XRef /W-array contains bogus entries (issue 14303) For this particular PDF document, we have `/W [1 2 166666666666666666666666666]` which obviously makes no sense. While this patch makes no attempt at actually validating the entries in the /W-array, we'll now simply abort all processing when the end of the PDF document has been reached (thus preventing hanging the browser). Please note that this patch doesn't enable the PDF document to be loaded/rendered, but at least it fails "correctly" now. Fixes one of the issues listed in issue 14303, namely the `REDHAT-1531897-0.pdf` document.	2021-11-25 18:35:08 +01:00
Jonas Jenwald	ae4f1ae3e7	Ensure that `ChunkedStream` won't attempt to request data beyond the document size (issue 14303) This bug was surprisingly difficult to track down, since it didn't just depend on range-requests being used but also on how quickly the document was loaded. To even be able to reproduce this locally, I had to use a very small `rangeChunkSize`-value (note the unit-test). The cause of this bug is a bogus entry in the XRef-table, causing us to attempt to request data from beyond the actual document size and thus getting into an infinite loop. Fixes one of the issues listed in issue 14303, namely the `PDFBOX-4352-0.pdf` document.	2021-11-24 19:19:43 +01:00
Calixte Denizet	7041c62ccf	Remove non-displayable chars from outline title (#14267) - it aims to fix #14267; - there is nothing about chars in range [0-1F] in the specs but Acrobat doesn't display them in any way.	2021-11-13 16:56:08 +01:00
Jonas Jenwald	ea1c348c67	Always prefer abbreviated keys, over full ones, when doing any dictionary lookups (issue 14256) Note that issue 14256 was specifically about inline images, please refer to: - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/ - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7 However, during review of the initial PR in https://github.com/mozilla/pdf.js/pull/14257#issuecomment-964469710, it was suggested that we instead do this unconditionally for all dictionary lookups. In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.	2021-11-10 11:56:18 +01:00
calixteman	e136afbabc	Merge pull request #14218 from janekotovich/subform_min_0 XFA subform with occur min=0 and no bound data displaying.	2021-11-05 04:12:34 -07:00
Jonas Jenwald	8222d6530b	Merge pull request #14232 from brendandahl/show-text-pattern Use correct matrix for patterns with showText.	2021-11-05 10:04:56 +01:00
Brendan Dahl	1c7048399b	Use correct matrix for patterns with showText. We were incorrectly using the transform in the pattern before it had been adjusted causing the pattern to be misplaced relative to the page. Fixes: ShowText-ShadingPattern.pdf (already in corpus) Fixes: #8111 Fixes: #9243	2021-11-04 16:57:36 -07:00
Jane-Kotovich	56b502391c	XFA subform with occur min=0 and no bound data displaying Subform nomin displays even though its subform is set to <occur max=-1 min=0> If we look through specs of XFA 3.3: https://www.pdfa.org/norm-refs/XFA-3_3.pdf - The min attribute is used when processing a form that contains data. Regardless of the data at least this number of instances is included. It is permissible to set this value to zero, in which case the container is entirely excluded if there is no data for it. However, in our case it doesn't happen, because we let our empty dataNode get through. Though by setting a clause: - eliminate unmatched data with occur min=0 we are checking our empty data and sending it to uselessNode array where at the end it gets removed;	2021-11-04 20:22:05 +10:00
Jonas Jenwald	5f77d3719b	Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656) Very short strings can narrowly miss the existing Bidi-detection threshold, leading to incorrect text-selection and copying behaviour. In my testing, neither Adobe Reader nor PDFium seems to handle copying "correctly" for this document. Hence it's not entirely clear to me that we actually want to fix this, since tweaking these heuristics can obviously cause regressions elsewhere (and our test coverage for RTL-text isn't exactly great).	2021-11-03 20:31:57 +01:00
Jonas Jenwald	8edec018fe	Add a RTL-text reference test (issue 10301) It seems that issue 10301 was fixed by PR 13424, by combining the spans, however given that we don't have a lot of test coverage for RTL-text I figured that adding a simple reference test wouldn't hurt (rather than just closing the issue as WORKSFORME).	2021-10-31 16:55:11 +01:00
Calixte Denizet	cf8dc750d6	Support rich content in markup annotation - use the xfa parser but in the xhtml namespace.	2021-10-31 13:44:51 +01:00
Tim van der Meij	0e7614df7f	Merge pull request #14180 from Snuffleupagus/bug-1627427 Handle ranges that "overflow" the last byte in `CMap.mapBfRange` (bug 1627427)	2021-10-27 20:06:09 +02:00
Jane-Kotovich	91fc643ff9	[api-minor] Implement securityHandler in the scripting API (bug 1731578)	2021-10-26 23:42:04 +10:00
Jonas Jenwald	aa1b78684f	Handle ranges that "overflow" the last byte in `CMap.mapBfRange` (bug 1627427)	2021-10-24 13:48:38 +02:00
Jonas Jenwald	52372b9378	Merge pull request #14175 from brendandahl/smask-v2 Use a new method for handling soft masks.	2021-10-23 09:27:18 +02:00
Brendan Dahl	2d1f9ff7a3	Use a new method for handling soft masks. The old method of handling soft masks had a number of issues where the temporary drawing canvas and the suspended main canvas could get out of sync (e.g. mismatched save/restores or clip state) or we could end up compositing at the wrong time. A good example of things getting out of sync is the reduced test case in #9017. To fix this I've changed two big things: 1) Duplicate all the needed graphics state from the temporary canvas to the suspended main canvas. This ensures the canvases stay in sync so that when we switch back to the main canvas the graphics state stack is the same (e.g. transforms, clip paths). 2) Immediately composite after each drawing operation. This ensures that if there's an active clip region that we'll still be able to composite the correct portions of the canvas. Note: This solution could be avoided by using getImageData and putImageData since those ignore the clipping region, but this is very very slow. Note2: I also think the old way of only compositing at the end of the soft mask is incorrect and can lead to wrong colors if drawing over the same region, but in practice this doesn't seem to matter much. Fixes: #5781 Fixes: #5853 Fixes: #7267 Fixes: #7891 Fixes: #8403 Fixes: #8624 Fixes: #12798 Fixes: #13891 Fixes: #9017 (reduced test case) Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1703683	2021-10-22 13:41:21 -07:00
calixteman	bbb64369f1	Merge pull request #13424 from calixteman/chunks2 [api-minor] Fix issues in text selection	2021-10-18 06:14:15 -07:00
Calixte Denizet	61d1063276	Fix issues in text selection - PR #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues. - the idea in this new patch is to compare the position of a new glyph with the last position where a glyph has been drawn; - no spaces are "drawn": they just move the cursor but aren't added in the chunk; - so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one. - to distinguish between real spaces and tracking ones, we used a factor of the space width (from the font) - it was a pretty good idea in general but it fails with some fonts where space was too big: - in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).	2021-10-17 16:27:05 +02:00
Jay Berkenbilt	586295fad6	Implement TrueType character map "format 2" (fixes #14117) If a PDF included an embedded TrueType font whose preferred character map (cmap) was in "format 2", the code would select that character map and then refuse to read it because of an unsupported format, thus causing the characters not to be rendered. This commit implements support for format 2 as described at the link below. https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html	2021-10-13 07:37:14 -04:00
Jonas Jenwald	284d259054	Merge pull request #14057 from Snuffleupagus/bug-920426 Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426)	2021-10-01 23:22:25 +02:00
Calixte Denizet	aecbd7cd89	AcroForm: Add support for ResetForm action - it aims to fix #12721. - Thanks to PR #14023, we've now the fieldObjects in the annotation layer so we can easily map fields names on their id if needed. - Reset values in the storage, in the JS sandbox and in the visible html elements.	2021-09-30 22:02:33 +02:00
Jonas Jenwald	d3ca28bc34	Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426) In the referenced bug, the embedded fonts contain custom CMap-data that only include strings. Note how for embedded composite TrueType fonts we're using the CMap-data when building the glyph mapping, and currently we end up with a completely empty map because the code expects only CID numbers. Furthermore, just fixing the glyph mapping alone isn't sufficient to fully address the bug, since we also need to consider this "special" kind of CMap-data when looking up glyph widths.	2021-09-30 18:10:47 +02:00
Calixte Denizet	748ab4983c	Add the missing pdf file for the test in the PR #14049	2021-09-29 22:07:07 +02:00
Jonas Jenwald	1dcd2f0cd3	[api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046) In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases. To avoid having to add two new properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`). Please note: In order to avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.	2021-09-25 09:18:58 +02:00
Calixte Denizet	386acf5bdd	Integration test for PR #14023	2021-09-23 13:05:18 +02:00
Jonas Jenwald	8ea27ce157	Tweak how fonts with an /Encoding are handled in `adjustToUnicode` (issue 14048, PR 13277 follow-up) Currently we only exclude /Encoding entries that also contain a /Differences array, which is the cause of the text-selection problem in the referenced issue. In order to address this we'll now also exclude /Encoding entries that contain one of the predefined named encodings, and no longer require that it also contains a /Differences array. Please note: This patch causes a small "regression" in the `bug1130815-text` test-case, however this is actually an improvement when compared with Adobe Reader and PDFium (in Google Chrome).	2021-09-18 22:44:25 +02:00
Jonas Jenwald	a11343e9af	Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915) Please note: All of this feels very handwavy, but at least it passes all tests locally. Hopefully we have enough tests for this part of the font code. For non-embedded composite standard fonts with an "incomplete" /CIDToGIDMap, we'll now fallback to an explicitly defined /ToUnicode map even when that one happens to be an /Identity-H or /Identity-V map. The `Font.fallbackToSystemFont` method is unfortunately getting more and more special-cases, however that might be unavoidable given all the weird non-embedded fonts found in the wild :-(	2021-09-15 11:30:40 +02:00
Brendan Dahl	f38fb42b42	Enable/disable image smoothing based on image interpolate value. (bug 1722191) While some of the output looks worse to my eye, this behavior more closely matches what I see when I open the PDFs in Adobe Acrobat. Fixes: #4706, #9713, #8245, #1344	2021-09-10 14:23:35 -07:00
Jonas Jenwald	3ccf277f58	Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316) In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail. My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong). To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it hopefully shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens). --- [1] As can be seen below, some of the entries are completely normal while others are non-standard: ``` Differences (array) 0 = 65 1 = /g5167 2 = /space 3 = /g11927 4 = /g17737 5 = /g11540 6 = /g2180 7 = /K 8 = /P 9 = /two 10 = /zero 11 = /one 12 = /five 13 = /four 14 = /g6932 15 = /g7246 16 = /g1691 17 = /g2343 18 = /g14792 19 = /g3325 20 = /g4280 21 = /g20383 22 = /g18166 23 = /g16988 24 = /g17943 25 = /g19223 26 = /g10830 27 = 97 28 = /g982 29 = /g1226 30 = /g5059 31 = /g2677 32 = /g1042 33 = /g11568 34 = /L 35 = /three 36 = /seven 37 = /g2364 38 = /g12063 39 = /g5356 40 = /g2173 41 = /g17877 42 = /g7273 43 = /g7647 44 = /g7224 45 = /g19327 46 = /g5054 47 = /g2342 48 = /g10136 49 = /g6856 50 = /g13381 51 = /g7257 52 = /g12093 53 = /g2359 ```	2021-09-04 07:38:22 +02:00
Brendan Dahl	a7f807b059	Only use base encoding if it's populated. (bug 1727053) The font dict in this file has an encoding entry, but only specifies a differences map. The base encoding is empty in this case and shouldn't be used.	2021-08-30 12:51:59 -07:00
Jonas Jenwald	853b1172a1	Support Optional Content in Image-/XObjects (issue 13931) Currently, in the `PartialEvaluator`, we only support Optional Content in Form-/XObjects. Hence this patch adds support for Image-/XObjects as well, which looks like a simple oversight in PR 12095 since the canvas-implementation already contains the necessary code to support this.	2021-08-26 16:54:15 +02:00
Jonas Jenwald	ac27f96987	Extend the glyph maps for standard respectively Calibri fonts (issue 13916)	2021-08-21 00:48:38 +02:00
Brendan Dahl	da1af02ac8	Improve performance of reused patterns. Bug 1721218 has a shading pattern that was used thousands of times. To improve performance of this PDF: - add a cache for patterns in the evaluator and only send the IR form once to the main thread (this also makes caching in canvas easier) - cache the created canvas radial/axial patterns - for shading fill radial/axial use the pattern directly instead of creating temporary canvas	2021-07-22 16:47:40 -07:00
Brendan Dahl	a52c0c6988	Fix transformations when painting image masks and tiling patterns. Previously, when we filled image masks we didn't copy over the current transformation, this caused patterns to be misaligned when painted. Now we create a temporary canvas with the mask and have the transform copied over and offset it relative to where the mask would be painted. We also weren't properly offsetting tiling patterns. This isn't usually noticeable since patterns repeat, but in the case of #13561 the pattern is only drawn once and has to be in the correct position to line up with the mask image. These fixes broke #11473, but highlighted that we were drawing that correctly by accident and not correctly handling negative bounding boxes on tiling patterns. Fixes #6297, #13561, #13441 Partially fixes #1344 (still blurry but boxes are in correct position now)	2021-07-06 17:29:32 -07:00
Calixte Denizet	429ffdcd2f	XFA - Save filled data in the pdf when downloading the file (Bug 1716288) - when binding (after parsing) we get a map between some template nodes and some data nodes; - so set user data in input handlers in using data node uids in the annotation storage; - to save the form, just put the value we have in the storage in the correct data nodes, serialize the xml as a string and then write the string at the end of the pdf using src/core/writer.js; - fix few bugs around data bindings: - the "Off" issue in Bug 1716980.	2021-06-25 18:57:01 +02:00

677 Commits